日期:2014-05-16  浏览次数:20516 次

项目大数据表分表过程
有一个项目运行了一段时间之后,数据越来越大,有几张表数据达到四千多万,这个时候就考虑对这些大数据表进行分表来加快数据的操作,OK,寻找可以作为分表的KEY,最后找到了一个deviceId码(包含IMEI、MEID和ESN)这个码有个规律就是由数字和字母组合而成,原先想去deviceId的前六位进行加法运算得到一个数字作为表的分别值,后来经过验证发现这三个码是有规律的,前面都各自有代表的意思,所以导致数据分出来之后有些表的数据多有些少,达不到预期想要的结果,只能考虑另外一种方法,最后决定用deviceId码的最后一位来作为分表的依据,因为最后一位是随机码来的,所以分布比较均匀。



200多万的测试数据,结果还是比较满意的。
找到了规律就可以开始分表了,这时候另一个问题迎面而来就是四千多万的数据怎么分到各个表里面去呢,这个可不是一件小事,而且项目在运行着。
方法1:
写了一个存储过程想一次想把一张表分离好,理论上是没有问题的
BEGIN
	
	DECLARE v,num,total INT;

	SET num=100000;
	SET total=40000000;
	SET v=1;
	WHILE v<=CEILING(total/num) DO
		CALL integral_log_analysis((v-1)*num+1,v*num);
		SET v=v+1;
	END WHILE;

END

integral_log_analysis这个存储过程是做分表操作,以为这样就可以很快完成,后来才知道这方法行不通,在本地运行两千万的数据一个晚上才执行完毕,这样是不可能让项目停这么久的时间的。

方法2:
考虑把数据导出来,然后在本地分离好再上传到服务器,这样项目就不用停那么久了。那首先就是把数据导出来,9G的数据导出来可不是一件容易的事,
mysqldump -uroot -pdbpasswd dbname test>db.sql;
这种方法可以导出一个表的结构和数据,但是导入到本地的却要很长的时间,执行了一个晚上都没执行完,最后只能放弃了。

方法3:
把数据按照预定的规则导出成文本,然后再导入到表,这种方法最后得到了验证,而且速度之快,大概半个小时就可以执行完毕。
Select userId,imei,fid,integral,0,type,insertTime From `integral_log` where lower(RIGHT(imei,1))='1' or lower(RIGHT(imei,1))='a' or lower(RIGHT(imei,1))='k' or lower(RIGHT(imei,1))='u' Into OutFile '/var/lib/mysql/integral_log_s1.txt';
Select userId,imei,fid,integral,0,type,insertTime From `integral_log` where lower(RIGHT(imei,1))='2' or lower(RIGHT(imei,1))='b' or lower(RIGHT(imei,1))='l' or lower(RIGHT(imei,1))='v' Into OutFile '/var/lib/mysql/integral_log_s2.txt';
Select userId,imei,fid,integral,0,type,insertTime From `integral_log` where lower(RIGHT(imei,1))='3' or lower(RIGHT(imei,1))='c' or lower(RIGHT(imei,1))='m' or lower(RIGHT(imei,1))='w' Into OutFile '/var/lib/mysql/integral_log_s3.txt';
Select userId,imei,fid,integral,0,type,insertTime From `integral_log` where lower(RIGHT(imei,1))='4' or lower(RIGHT(imei,1))='d' or lower(RIGHT(imei,1))='n' or lower(RIGHT(imei,1))='x' Into OutFile '/var/lib/mysql/integral_log_s4.txt';
Select userId,imei,fid,integral,0,type,insertTime From `integral_log` where lower(RIGHT(imei,1))='5' or lower(RIGHT(imei,1))='e' or lower(RIGHT(imei,1))='o' or lower(RIGHT(imei,1))='y' Into OutFile '/var/lib/mysql/integral_log_s5.txt';
Select userId,imei,fid,integral,0,type,insertTime From `integral_log` where lower(RIGHT(imei,1))='6' or lower(RIGHT(imei,1))='f' or lower(RIGHT(imei,1))='p' or lower(RIGHT(imei,1))='z' Into OutFile '/var/lib/mysql/integral_log_s6.txt';
Select userId,imei,fid,integral,0,type,insertTime From `integral_log` where lower(RIGHT(imei,1))='7' or lower(RIGHT(imei,1))='g' or lower(RIGHT(imei,1))='q' Into OutFile '/var/lib/mysql/integral_log_s7.txt';
Select userId,imei,fid,integral,0,type,insertTime From `integral_log` where lower(RIGHT(imei,1))='8' or lower(RIGHT(imei,1))='h' or lower(RIGHT(imei,1))='r' Into OutFile '/var/lib/mysql/integral_log_s8.txt';
Select userId,imei,fid,integral,0,type,insertTime From `integral_log` where lower(RIGHT(imei,1))='9' or lower(RIGHT(imei,1))='i' or lower(RIGHT(imei,1))='s' Into OutFile '/var/lib/mysql/integral_log_s9.txt';
Select userId,imei,fid,integral,0,type,insertTime From `integral_log` where lower(RIGHT(imei,1))='0' or lower(RIGHT(imei,1))='j' or lower(RIGHT(imei,1))='t' Into OutFile '/var/lib/mysql/integral_log_s0.txt';




load data local infile "/var/lib/mysql/integral_log_s0.txt" ignore into table integral_log_s0(userId,imei,fid,integral,activation,type,insertTime);
load data local infile "/var/lib/mysql/integral_log_s1.txt" ignore into table integral_log_s1(userId,imei,fid,integral,activation,type,insertTime);
load data local infile "/var/lib/mysql/integral_log_s2.txt" ignore into table integral_log_s2(userId,imei,fid,integral,activation,type,insertTime);
load data local infile "/var/lib/mysql/integral_log_s3.txt" ignore into table integral_log_s3(userId,im