日期:2014-05-16  浏览次数:20801 次

MySQL Infobright-数据仓库笔记[转]
http://www.fhand.com/blog/archives/tag/brighthouse
由于项目中的登录log一直比较大,目前是每周切分一张表,每次做月季度数据分析的时候就很痛苦,今天特定请教了一下公司DBA的同学,学到了两个解决方法。一个是把每天的登录按照位运算的形式保存,查询的时候使用bit_count(field)来计算,速度会大大提升,这个以后在项目中实践了再来分享;另一个就是使用Infobright引擎,Infobright是一个列存数据仓库软件,可以与MySQL集成,作为MySQL的一个存储引擎来使用。详细的框架结构可以参考官方的白皮书,国内也有介绍的,比如这一篇文章。

Infobright是一个与MySQL集成的开源数据仓库(Data Warehouse)软件,可作为MySQL的一个存储引擎来使用,SELECT查询与普通MySQL无区别。

一、Infobright的基本特征:
优点:
查询性能高:百万、千万、亿级记录数条件下,同等的SELECT查询语句,速度比MyISAM、InnoDB等普通的MySQL存储引擎快5~60倍
存储数据量大:TB级数据大小,几十亿条记录
高压缩比:在我们的项目中为23:1,极大地节省了数据存储空间
基于列存储:无需建索引,无需分区
适合复杂的分析性SQL查询:SUM, COUNT, AVG, GROUP BY

限制:
不支持数据更新:社区版Infobright只能使用“LOAD DATA INFILE”的方式导入数据,不支持INSERT、UPDATE、DELETE
不支持高并发:只能支持10多个并发查询。

安装可以参考这篇文章 MySQL Infobright 数据仓库快速安装笔记[原创]。不过最新版已经没有64位的源码,我们可以直接下载rpm或者deb版本进行安装。

1. Download the install package (e.g. infobright-3.4-x86_64.rpm) to the server where you are installing Infobright

2. Obtain root user access

3. To install the RPM package, run:

rpm -ivh infobright_version_name.rpm [optional: --prefix=path]

To install the DEB package, run:

dpkg -i infobright_version_name.deb

Note: Please do not install ICE in the root or home directories due to possible MySQL permission checking issues during install, start up, and/or load. [需要注意会有mysql的权限问题,所以安装的目录需要chown mysql:mysql授予访问权限]

4. To change the default install options, after installation run:

/usr/local/infobright/postconfig.sh

You can run this script at any time after installation to change the datadir, CacheFolder, socket, and port. The script must be run as root and ICE must not be running. 【需要在infobright停止运行的时候再修改目录相关,该脚本需要在安装目录下运行,所以需要承 cd /usr/local/infobright/



5. The installation determines the optimum memory settings based on the physical memory of the system. You may change these settings by editing the file brighthouse.ini within the data directory.

Important: The memory settings assume that there are no other services on the machine consuming significant memory. If this is not the case, please lower the memory settings for Infobright.

6. To start or stop ICE, run:

/etc/init.d/mysqld-ib start
/etc/init.d/mysqld-ib stop

7. To connect to ICE, use the script mysql-ib:

/usr/bin/mysql-ib [optional:db_name]

8. To uninstall ICE, run either:

rpm -e infobright
dpkg -r infobright

9. 示例:从普通的MySQL数据库(假设MySQL安装路径为/usr/local/webserver/mysql)导出数据到csv文件:

/usr/local/webserver/mysql/bin/mysql  -S /tmp/mysql3306.sock -D tongji_logs -e "select * from  log_visits_2010_05_10 into outfile '/data0/test.csv' FIELDS TERMINATED  BY ',' ENCLOSED BY '"'  ESCAPED BY '\' LINES TERMINATED BY 'n';"
10.示例:普通MySQL和Infobright建表对比

①、普通MySQL的InnoDB存储引擎建表:
CREATE TABLE IF NOT EXISTS `log_visits_2010_05_12` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`cate_id` int(11) NOT NULL,
`site_id` int(11) unsigned NOT NULL,
`visitor_localtime` char(8) NOT NULL,
`visitor_idcookie` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `cate_site_id` (`cate_id`,`site_id`),
KEY `visitor_localtime` (`visitor_localtime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
②、Infobright的BRIGHTHOUSE存储引擎建表:
CREATE TABLE IF NOT EXISTS `log_visits` (
`id` int(11) NOT NULL,
`cate_id` int(11) NOT NULL,
`site_id` int(11) NOT NULL,
`visitor_localtime` char(8) NOT NULL,
`visitor_idcookie` varchar(255) NOT NULL,
) ENGINE=BRIGHTHOUSE DEFAULT CHARSET=utf8;
注:BRIGHTHOUSE存储引擎建表时不能有AUTO_INCREMENT自增、unsigned无符号、unique唯一、主键PRIMARY KEY、索引KEY。
11、示例:从csv文件导入数据到Infobright数据仓库:

/usr/local/infobright/bin/mysql  -S /tmp/mysql3307.sock -D dw --skip-column-names -e "LOAD