Date: 2014-05-16

Problems caused by unusual characters in a Hive partition field

A production job failed with the following error:
Failed with exception javax.jdo.JDODataStoreException: Error executing JDOQL query "SELECT 'org.apache.hadoop.hive.metastore.model.MPartition' AS NUCLEUS_TYPE,`THIS`.`CREATE_TIME`,`THIS`.`LAST_ACCESS_TIME`,`THIS`.`PART_NAME`,`THIS`.`PART_ID` FROM `PARTITIONS` `THIS` LEFT OUTER JOIN `TBLS` `THIS_TABLE_TABLE_NAME` ON `THIS`.`TBL_ID` = `THIS_TABLE_TABLE_NAME`.`TBL_ID` LEFT OUTER JOIN `TBLS` `THIS_TABLE_DATABASE` ON `THIS`.`TBL_ID` = `THIS_TABLE_DATABASE`.`TBL_ID` LEFT OUTER JOIN `DBS` `THIS_TABLE_DATABASE_DATABASE_NAME` ON `THIS_TABLE_DATABASE`.`DB_ID` = `THIS_TABLE_DATABASE_DATABASE_NAME`.`DB_ID` WHERE `THIS_TABLE_TABLE_NAME`.`TBL_NAME` = ? AND `THIS_TABLE_DATABASE_DATABASE_NAME`.`NAME` = ? AND `THIS`.`PART_NAME` = ?" : Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='.

NestedThrowables:
java.sql.SQLException: Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
2012-11-22 13:23:20 [dpdw_traffic_base.sh] run failed

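After asking bupt04406 (http://bupt04406.iteye.com/), I learned that the error was caused by unusual characters in the partition field. Listing every value of the partition field turned up this one: www.dianpijg.com?photos
After filtering such characters out in the ETL, the job ran successfully.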


It seems that cleaning the partition field in the ETL is necessary.
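
As a rough illustration of that kind of cleaning, here is a minimal Python sketch. It is not the filter used in the original ETL, which is not shown. The whitelist (ASCII letters, digits, '.', '_' and '-'), the function names, and the "unknown" fallback are all assumptions made for this example; adjust them to whatever your partition values legitimately contain.

```python
import re

# Assumption: only ASCII letters, digits, '.', '_' and '-' are treated as safe.
# Everything else (including '?', '/', '=' and non-ASCII characters) is unsafe.
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def is_safe_partition_value(value: str) -> bool:
    """Return True if the value is non-empty and contains only whitelisted characters."""
    return bool(value) and _UNSAFE_CHARS.search(value) is None


def sanitize_partition_value(value: str, replacement: str = "_") -> str:
    """Replace every non-whitelisted character with `replacement`.

    An empty result falls back to the hypothetical placeholder "unknown",
    so that a row never ends up with an empty partition value.
    """
    cleaned = _UNSAFE_CHARS.sub(replacement, value)
    return cleaned or "unknown"


if __name__ == "__main__":
    raw = "www.dianpijg.com?photos"
    print(is_safe_partition_value(raw))
    print(sanitize_partition_value(raw))
```

Whether to replace unsafe characters or drop the whole row is a design choice: replacing keeps the data but can merge distinct values into one partition, while dropping loses rows but keeps partition names exact.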