MySQL chunked batch insert (INSERT): performance analysis


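I am 彪壮大地, a blogger on 靠谱客. I collected this article during recent development work. It analyzes how fast MySQL inserts data when the rows are sent in chunked batches. I found it useful and am sharing it here as a reference.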

Overview

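To insert many rows at once, you can use a single multi-row statement: insert into table values (v1), (v2) ... (vn). The client then sends one statement per chunk instead of one per row. This saves client-server round trips and per-statement processing on the server, which reduces the total run time. The program still uses a single connection throughout, so the saving is in statements, not connections.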
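Below is a minimal sketch of building such a multi-row statement with query placeholders instead of inlined values. It assumes PyMySQL's %s placeholder style. build_multirow_insert is a hypothetical helper, not part of any library. You would pass the returned sql and params together to cursor.execute(sql, params).

```python
def build_multirow_insert(table, columns, rows):
    """Build one INSERT ... VALUES (...), (...) statement for many rows.

    Returns the SQL text with %s placeholders (PyMySQL's paramstyle) and
    the flattened parameter list, so the driver handles quoting/escaping.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    width = len(columns)
    if any(len(row) != width for row in rows):
        raise ValueError("every row must have one value per column")
    # Backtick-quote identifiers; KEY is a reserved word in MySQL
    col_list = ",".join(f"`{c}`" for c in columns)
    one_row = "(" + ",".join(["%s"] * width) + ")"
    sql = f"INSERT INTO `{table}` ({col_list}) VALUES " + ",".join([one_row] * len(rows))
    # Flatten [(a, b, c), (d, e, f)] into [a, b, c, d, e, f]
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_multirow_insert("test_insert", ["name", "key", "value"],
                                    [("n1", "k1", "v1"), ("n2", "k2", "v2")])
print(sql)
print(params)
```

Keeping the values out of the SQL text lets the driver do the escaping. PyMySQL's Cursor.executemany also batches simple INSERT ... VALUES statements into multi-row statements by itself, so it is an alternative to building them by hand.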

Experimental setup:

Database	MySQL
Database host	local machine
Total rows	100,000
Language	Python

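Table design: the original table definition is not reproduced here. From the code, test_insert has an AUTO_INCREMENT column plus the columns name, key and value.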

Code:

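#encoding=utf8
import random
import string
import time

import pymysql


def get_ran_str():
    # 16 random characters drawn without repetition from letters and digits
    return ''.join(random.sample(string.ascii_letters + string.digits, 16))


def get_values_exp(count):
    # Build the VALUES list "(...),(...),..." for `count` random rows.
    # The values are alphanumeric only, so inlining them is safe here;
    # for untrusted data use query parameters instead.
    exp = []
    for _ in range(count):
        name = get_ran_str()
        key = get_ran_str()
        value = get_ran_str()
        exp.append(f"('{name}', '{key}', '{value}')")
    return ",".join(exp)


if __name__ == '__main__':
    # PyMySQL 1.0+ only accepts keyword arguments; fill in your own credentials
    conn = pymysql.connect(host='localhost', user='', password='', database='db')
    cursor = conn.cursor()
    num = 100000  # total rows inserted per run
    # Chunk sizes to test; each one divides num evenly
    arr_count = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000]
    try:
        for count in arr_count:
            # Empty the table and reset the auto-increment counter before each run
            cursor.execute("delete from test_insert;")
            cursor.execute("alter table test_insert auto_increment=1;")
            conn.commit()
            begin_time = time.perf_counter()
            for _ in range(num // count):
                sql = "insert into test_insert(`name`,`key`,`value`) values " + get_values_exp(count)
                cursor.execute(sql)
                conn.commit()  # one commit (transaction) per statement
            run_time = time.perf_counter() - begin_time
            print(f'run time with {count} rows per chunk: {run_time} s')
    finally:
        cursor.close()
        conn.close()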

Results (one commit per statement):

Rows per chunk	Run time (s)
1	625.0080001354218
2	324.8550000190735
5	146.0849997997284
10	85.9079999923706
20	53.58800005912781
50	32.60899996757507
100	25.304999828338623
200	21.836999893188477
500	17.63699984550476
1000	11.73800015449524
2000	14.864000082015991
5000	14.25499963760376
10000	11.13699984550476
20000	10.92199969291687
50000	10.728999853134155
100000	exception

Analysis:

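Run time falls steeply as the chunk size grows. It drops from about 625 s with one row per statement to about 25 s with 100 rows per statement. From about 1,000 rows per chunk onward it levels off at roughly 11 s. The times at 2,000 and 5,000 rows are slightly higher than at 1,000, which is most likely run-to-run noise.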
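One way to read this curve is a simple cost model. It is offered as a rough interpretation, not the author's analysis. Total time ≈ (number of statements) × fixed cost per statement + (number of rows) × cost per row. The sketch below fits the two hypothetical constants c and r from the first and last measured points in the table above. It then compares the model with a few other measured chunk sizes.

```python
N = 100_000  # total rows inserted in every run

# Measured run times (seconds) from the table above: chunk size -> time
measured = {1: 625.0080001354218, 100: 25.304999828338623,
            1000: 11.73800015449524, 50000: 10.728999853134155}


def fit_cost_model(n_small, t_small, n_large, t_large, total=N):
    """Solve t = (total / n) * c + total * r for c (s/statement) and r (s/row)."""
    s_small, s_large = total / n_small, total / n_large  # statement counts
    c = (t_small - t_large) / (s_small - s_large)
    r = (t_large - s_large * c) / total
    return c, r


c, r = fit_cost_model(1, measured[1], 50000, measured[50000])


def predict(n):
    # Model run time for chunk size n
    return (N / n) * c + N * r


for n, t in measured.items():
    print(f"chunk={n:>6}  measured={t:7.2f}s  model={predict(n):7.2f}s")
```

The fit puts the fixed cost near 6 ms per statement (each statement here also has its own commit) and the per-row cost near 0.1 ms. It reproduces the overall shape of the curve but underestimates the middle of the range: about 17 s predicted versus 25 s measured at 100 rows. So treat it as a qualitative explanation only.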

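The chunk size cannot grow without limit. With 100,000 rows per chunk, the program raises an exception; the original screenshot of the error is not reproduced here. A plausible cause is that the single statement, about 6 MB of SQL text, exceeds the server's max_allowed_packet limit.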
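The exception itself is not shown, so the sketch below only checks the packet-size hypothesis. It rebuilds the benchmark statement and measures its size. The 4 MB limit is an assumption: it is the MySQL 5.7 default for max_allowed_packet, while MySQL 8.0 defaults to 64 MB. max_rows_under is a hypothetical helper.

```python
import random
import string

# Same statement prefix as in the benchmark script
PREFIX = "insert into test_insert(`name`,`key`,`value`) values "


def get_ran_str():
    return ''.join(random.sample(string.ascii_letters + string.digits, 16))


def build_insert_sql(count):
    """Rebuild the benchmark's statement for `count` rows."""
    rows = [f"('{get_ran_str()}', '{get_ran_str()}', '{get_ran_str()}')"
            for _ in range(count)]
    return PREFIX + ",".join(rows)


def statement_bytes(count):
    return len(build_insert_sql(count).encode("utf-8"))


def max_rows_under(limit_bytes):
    """Largest chunk whose statement text fits in limit_bytes.

    Each row is 60 bytes plus a 1-byte comma; the prefix is 53 bytes and
    n rows need only n - 1 commas, so size(n) = 52 + 61 * n.
    Protocol header bytes are ignored, so the result is approximate.
    """
    fixed = len(PREFIX) - 1
    return (limit_bytes - fixed) // 61


LIMIT = 4 * 1024 * 1024  # assumed max_allowed_packet (MySQL 5.7 default)
for n in (50_000, 100_000):
    size = statement_bytes(n)
    print(n, size, size <= LIMIT)
print("largest chunk that fits:", max_rows_under(LIMIT))
```

Under that assumption, 50,000 rows (about 3.05 MB) fit and 100,000 rows (about 6.1 MB) do not. This matches the observed behavior, though it does not prove the cause. Raising max_allowed_packet on the server, or keeping chunks well below the limit, would avoid the error in that case.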

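Running all the inserts inside a single transaction improves insert speed further. That means executing every chunk first and calling conn.commit() only once at the end, instead of after each statement: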

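# Variant of the inner loop above: commit once, after all chunks are inserted
begin_time = time.perf_counter()
for _ in range(num // count):
    sql = "insert into test_insert(`name`,`key`,`value`) values " + get_values_exp(count)
    cursor.execute(sql)
conn.commit()  # a single commit for all 100,000 rows
run_time = time.perf_counter() - begin_time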
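Both commit strategies can be expressed as one function. insert_in_chunks is a hypothetical helper, and it uses query placeholders rather than string formatting. The demo runs against Python's built-in sqlite3 in-memory database only so that it works without a MySQL server; SQLite will not reproduce the MySQL timings. With PyMySQL you would pass the connection from pymysql.connect and keep placeholder="%s". Unlike num // count in the script, the helper also handles a final partial chunk.

```python
import sqlite3


def insert_in_chunks(conn, table, columns, rows, chunk_size,
                     commit_each=False, placeholder="%s"):
    """Insert rows with one multi-row INSERT per chunk.

    commit_each=True commits after every chunk (first experiment);
    commit_each=False commits once at the end (second experiment).
    Use placeholder="%s" for PyMySQL and "?" for sqlite3.
    Returns the number of INSERT statements executed.
    """
    cur = conn.cursor()
    col_list = ",".join(f"`{c}`" for c in columns)
    one_row = "(" + ",".join([placeholder] * len(columns)) + ")"
    statements = 0
    # Step through rows in slices; the last slice may be shorter
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        sql = (f"INSERT INTO `{table}` ({col_list}) VALUES "
               + ",".join([one_row] * len(chunk)))
        cur.execute(sql, [v for row in chunk for v in row])
        statements += 1
        if commit_each:
            conn.commit()
    if not commit_each:
        conn.commit()  # one commit for the whole load
    cur.close()
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_insert (id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "name TEXT, `key` TEXT, value TEXT)")
rows = [(f"n{i}", f"k{i}", f"v{i}") for i in range(10)]
n_statements = insert_in_chunks(conn, "test_insert", ["name", "key", "value"],
                                rows, chunk_size=4, placeholder="?")
count = conn.execute("SELECT COUNT(*) FROM test_insert").fetchone()[0]
print(n_statements, count)
```

Here 10 rows in chunks of 4 need three statements (4 + 4 + 2 rows). Switching commit_each between True and False is the only difference between the two experiments in this article.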

Results (a single commit at the end):

Rows per chunk	Run time (s)
1	28.40499997138977
2	14.674999713897705
5	10.174000263214111
10	8.79200005531311
20	8.255000114440918
50	7.776000022888184
100	7.33899998664856
200	7.374999761581421
500	7.142000198364258
1000	7.124000072479248
2000	7.20799994468689
5000	7.055999994277954
10000	7.039999961853027
20000	7.091000080108643
50000	7.04800009727478
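To quantify the gain, the sketch below divides the first table's times by the second's for a few chunk sizes. The values are copied from the two tables; nothing is re-measured. The benefit is largest for small chunks: roughly 22× at 1 row per statement, falling to about 1.5× at 50,000 rows. This is consistent with, though it does not prove, per-commit cost being the dominant overhead in the first experiment. With InnoDB's default innodb_flush_log_at_trx_commit=1, every commit flushes the redo log to disk.

```python
# Run times in seconds copied from the two result tables above
per_statement_commit = {1: 625.0080001354218, 10: 85.9079999923706,
                        100: 25.304999828338623, 1000: 11.73800015449524,
                        50000: 10.728999853134155}
single_commit = {1: 28.40499997138977, 10: 8.79200005531311,
                 100: 7.33899998664856, 1000: 7.124000072479248,
                 50000: 7.04800009727478}

# Speed-up from committing once instead of after every statement
speedup = {n: per_statement_commit[n] / single_commit[n] for n in single_commit}
for n, s in speedup.items():
    print(f"chunk={n:>6}: {s:5.1f}x faster with a single commit")
```

With a single commit, even one row per statement (about 28 s) is faster than any chunk size that commits after every statement at 500 rows or fewer.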