日期:2014-05-16  浏览次数:21064 次

Hive导出到Mysql中中文乱码的问题

在上一篇文章《从hive将数据导出到mysql》中,虽然通过hive中转,将hbase的数据成功导出到了mysql中,但是我们遇到了中文乱码问题。

一、mysql中的编码

mysql> show variables like 'collation_%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
mysql> show variables like 'character_set_%';
+--------------------------+----------------------------+

| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

?可见原来缺省是latin1编码,会导致中文乱码。

可以在mysql中设置编码,单个设置
mysql> alter database name character set utf8;
mysql> set character_set_connection=utf8;

Query OK, 0 rows affected (0.00 sec)
mysql> set character_set_connection=utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> set character_set_results=utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> set character_set_server=utf8;
Query OK, 0 rows affected (0.00 sec)
但重启后会失效。

?

可以修改配置文件:

[root@Hadoop48 ~]# vi /etc/my.cnf
[mysql]
default-character-set=utf8
[client]
default-character-set=utf8
[mysqld]
default-character-set=utf8
character_set_server=utf8
init_connect='SET NAMES utf8'

?重启mysql,这样确保缺省编码是utf8

[root@Hadoop48 ~]# service mysqld restart

?查看是否变成utf8:

mysql> \s
--------------
mysql Ver 14.12 Di