Hive Join 优化通译-数据库教程-爱易网页

Hive Join 优化通译

日期：2014-05-16　浏览次数：20673 次

Hive Join 优化翻译

翻译自?https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-AutoConversiontoSMBMapJoin

目录结构

Join Optimization ----Join 调优
	Improvements to the Hive Optimizer ----Hive的优化
	Star Join Optimization ----星型结构的优化
		Star Schema Example ----例子
		Prior Support for MAPJOIN  ----MapJoin的预备知识
			Limitations of Prior Implementation  ----使用限制
		Enhancements for Star Joins  ----优化星型join
			Optimize Chains of Map Joins  ----优化一连串的Map join
				Current and Future Optimizations ----当前以及未来的优化方向
			Optimize Auto Join Conversion ----自动join转换
				Current Optimization ----当前的优化方向
				Auto Conversion to SMB Map Join ----自动转换成SMS join
			Generate Hash Tables on the Task Side ----在task side生成hash表
				Pros and Cons of Client-Side Hash Tables ----优缺点
				Task-Side Generation of Hash Tables ----在task siade 生成hash表
					Further Options for Optimization----优化方向

Improvements to the Hive Optimizer

Hive automatically recognizes various use cases and optimizes for them. Hive 0.11 improves the optimizer for these cases: ? hive可以自动优化，在0.11里面改进了一些优化用例

Joins where one side fits in memory. In the new optimization: ?join的一边适合放进内存，有新的优化方案
- that side is loaded into memory as a hash table?把表按照hash表的形式读进内存
- only the larger table needs to be scanned?只扫描大表
- fact tables have a smaller footprint in memory?fact表只使用少量内存
Star-schema joins 星型join
Hints are no longer needed for many cases. 在很多情况下不再需要HINT
Map joins are automatically picked up by the optimizer. Map join 自动优化

Star Join Optimization

Star Schema Example

Select count(*) cnt
From store_sales ss
     join household_demographics hd on (ss.ss_hdemo_sk = hd.hd_demo_sk)
     join time_dim t on (ss.ss_sold_time_sk = t.t_time_sk)
     join store s on (s.s_store_sk = ss.ss_store_sk)
Where
     t.t_hour = 8
     t.t_minute >= 30
     hd.hd_dep_count = 2
order by cnt;

DW 常用的star schema，这个属于BI 基本概念了就不解释了。

Prior Support for MAPJOIN

The default value for hive.auto.convert.join was false in Hive 0.10.0. ?Hive 0.11.0 changed the default to true (HIVE-3297).hive.auto.convert.join 在0.10默认是fales，到了0.11就是变成了true

MAPJOINs are processed by loading the smaller table into an in-memory hash map and matching keys with the larger table as they are streamed through. The prior implementation has this division of labor:

MAPJOINs ?把小表以hash map的形式读进内存，然后和大表匹配key，以下是各个阶段的分工

Local work:本地
- read records via standard ta

免责声明： 本文仅代表作者个人观点，与爱易网无关。其原创性以及文中陈述文字和内容未经本站证实，对本文以及其中全部或者部分内容、文字的真实性、完整性、及时性本站不作任何保证或承诺，请读者仅作参考，并请自行核实相关内容。

Hive Join 优化通译

相关资料更多>

推荐阅读更多>

Hive Join 优化 通译

相关资料更多>

推荐阅读更多>

Hive Join 优化通译