Date: 2014-05-16

Using Hive and HBase Together

Importing data from Hive into HBase

The wiki introduction is at the following link:

http://wiki.apache.org/hadoop/Hive/HBaseIntegration

Version requirements:

hadoop-0.20.2

hbase-0.20.3

hive-0.6.0 (earlier versions do not support this integration)

zookeeper-3.3.0

Confirm that the following jar is present in the Hive lib directory:

./hive-0.6.0/lib/hive_hbase-handler.jar
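
The check above can be scripted. The following is a minimal sketch, not part of the original setup: the function name check_jars is hypothetical, and the HIVE_HOME default is an assumption that you should point at your own install directory.

```shell
#!/bin/sh
# Report every argument that is not a regular file.
# Prints each missing path to stderr and returns non-zero if any is missing.
check_jars() {
    missing=0
    for jar in "$@"; do
        if [ ! -f "$jar" ]; then
            echo "missing: $jar" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: HIVE_HOME falls back to the relative path used in this article.
HIVE_HOME="${HIVE_HOME:-./hive-0.6.0}"
if check_jars "$HIVE_HOME/lib/hive_hbase-handler.jar"; then
    echo "handler jar found"
else
    echo "handler jar not found under $HIVE_HOME/lib"
fi
```

The same function accepts several paths, so the HBase jar passed to --auxpath can be checked in the same call.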

Start Hive with this jar loaded.

For a single HBase server, the startup command is:

bin/hive --auxpath /home/hadoopuser/hive-0.6.0/lib/hive_hbase-handler.jar,/home/hadoopuser/hive-0.6.0/lib/hbase-0.20.3.jar -hiveconf hbase.master=master:60000

If ZooKeeper is used to elect a master for the HBase cluster, pass the ZooKeeper quorum instead:

bin/hive --auxpath /home/hadoopuser/hive-0.6.0/lib/hive_hbase-handler.jar,/home/hadoopuser/hive-0.6.0/lib/hbase-0.20.3.jar -hiveconf hbase.zookeeper.quorum=server1,server2,server3
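
Both commands pass the jars to --auxpath as a single comma-separated list, which is easy to mistype. As a rough sketch, the list can be assembled by a small helper. The names join_auxpath and HIVE_LIB are hypothetical, and the script only prints the resulting command rather than running it.

```shell
#!/bin/sh
# Join all arguments with commas, the separator used by --auxpath
# in the commands above. The subshell keeps the IFS change local.
join_auxpath() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Assumption: HIVE_LIB defaults to the lib directory used in this article.
HIVE_LIB="${HIVE_LIB:-/home/hadoopuser/hive-0.6.0/lib}"
AUXPATH=$(join_auxpath "$HIVE_LIB/hive_hbase-handler.jar" "$HIVE_LIB/hbase-0.20.3.jar")

# Print the command for review instead of executing it.
echo "bin/hive --auxpath $AUXPATH -hiveconf hbase.zookeeper.quorum=server1,server2,server3"
```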

Example:

Create a Hive table backed by HBase:

CREATE TABLE hbase_table_1(key int, value string)

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")

TBLPROPERTIES ("hbase.table.name" = "xyz");

The hbase.table.name property sets the name of the table on the HBase side: the table is called hbase_table_1 in Hive and xyz in HBase.
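
The hbase.columns.mapping value in the statement above pairs with the Hive columns by position: the first entry, :key, maps the Hive column key to the HBase row key, and cf1:val maps value to qualifier val in column family cf1. The sketch below only illustrates that pairing. The function show_mapping is hypothetical and does not talk to Hive or HBase.

```shell
#!/bin/sh
# Print "hive_column -> hbase_target" pairs, matching entries by position.
show_mapping() {
    hive_cols=$1      # comma-separated Hive column names
    hbase_map=$2      # value of hbase.columns.mapping
    while [ -n "$hive_cols" ]; do
        col=${hive_cols%%,*}
        target=${hbase_map%%,*}
        echo "$col -> $target"
        # Drop the entry just printed; clear the list when no comma remains.
        case $hive_cols in *,*) hive_cols=${hive_cols#*,} ;; *) hive_cols= ;; esac
        case $hbase_map in *,*) hbase_map=${hbase_map#*,} ;; *) hbase_map= ;; esac
    done
}

show_mapping "key,value" ":key,cf1:val"
# prints:
# key -> :key
# value -> cf1:val
```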

hive> describe hbase_table_1;

OK

key     int     from deserializer

value   string  from deserializer

Time taken: 0.225 seconds

hbase(main):001:0> describe 'xyz'

DESCRIPTION                                                             ENABLED

 {NAME => 'xyz', FAMILIES => [{NAME => 'cf1', COMPRESSION => 'NONE', VE true

 RSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>

  'false', BLOCKCACHE => 'true'}]}

1 row(s) in 0.1040 seconds

hbase(main):002:0>

CREATE TABLE pokes (foo INT, bar STRING);

LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=86;

Comment #1, czwq2001, 2011-08-04
When I start Hive with the HBase options I get the following exception; without them the query runs normally.
hive> select * from pokes where 1=1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
        at org.apache.hadoop.fs.Path.<init>(Path.java:90)
        at org.apache.hadoop.fs.Path.<init>(Path.java:50)
        at org.apache.hadoop.mapred.JobClient.copyRemoteFiles(JobClient.java:554)
        at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:637)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:805)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:775)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:688)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:633)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:506)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:384)
    &