日期:2014-05-16  浏览次数:20580 次

Oracle表连接之哈希连接

Oracle Hash join 是一种非常高效的join 算法,主要以CPU(hash计算)和内存空间(创建hash table)为代价获得最大的效率。Hash join一般用于大表和小表之间的连接,我们将小表构建到内存中,称为Hash cluster,大表称为probe表。

?

当两个表做hash join时,oracle会选择一个表作为驱动表,先根据过滤条件排除不必要的数据,然后将结果集做成hash表,放入进程的hash area,接着扫描第二张表,将记录的join字段值做hash运算,到内存的hash表里面去探测,如果探测成功,就返回数据,否则这行就丢弃掉。

?

(miki西游 @mikixiyou 原文链接: http://mikixiyou.iteye.com/blog/1709321 )

?

?

select /*+use_nl(a b)*/ a.*,b.* 
from dba_obj a,all_obj b
where a.object_id=b.object_id
and a.object_name like 'tt%'



SELECT STATEMENT, GOAL = ALL_ROWS		
 NESTED LOOPS		
  TABLE ACCESS FULL	SCOTT	DBA_OBJ
  TABLE ACCESS BY INDEX ROWID	SCOTT	ALL_OBJ
   INDEX UNIQUE SCAN	SCOTT	PK_ALL_OBJ
?

执行计划解读

两个表使用了嵌套循环连接。首先访问dba_obj表,得到全部记录。然后按照此表记录依次去扫描all_obj表,扫描过程走索引快速得到all_obj的记录。

?

?

?

select /*+use_hash(a b)*/ a.*,b.* 
from dba_obj a,all_obj b
where a.object_id=b.object_id
and a.object_name like 'tt%'



SELECT STATEMENT, GOAL = ALL_ROWS		
 HASH JOIN		
  TABLE ACCESS FULL	SCOTT	DBA_OBJ
  TABLE ACCESS FULL	SCOTT	ALL_OBJ
?

?

执行计划解读

两个表使用了哈希连接。首先访问dba_obj表,得到全部记录,进行hash运算,放到内存hash area中形成hash table,也称为hash cluster。
然后,再腾出手来,全面扫描all_obj表,每扫描到一条记录时,将join字段进行hash运算,然后到hash area中去找与dba_obj表匹配的记录。
这个行为称为probe,中文称探测。此表也称为probe表。

?

?

hash table表是保存在hash area内存区域中,而这个区域在oracle中是分配在pga中。

PGA 包括 进程内存、UGA、sort area,bitmap merge area和hash area。UGA包含session状态信息和private sql area。

?

?

使用这个10104 event可以分析hash area的内存分配情况。

?

alter system set events '10104 trace name context forever,level 2';
?
select count(*)
? from (select /*+use_hash(i g) leading(i)*/
???????? i.*, g.*
????????? from tdividenddetail i, tproductinfo g
???????? where i.c_fundcode = g.fundcode
?????????? and i.d_cdate > sysdate - 100);

?

?

使用use_hash提示强制让两个表采用hash join关联,然后使用leading提示强制让i表作为驱动表。

?

在hash area中,默认采用8个partition,每个partition保存若干个 hash table的记录。这些记录又以bucket逻辑结构存储。

分析trc文件内容如下所示:

?

*** RowSrcId: 1 HASH JOIN STATISTICS (INITIALIZATION) ***
Join Type: INNER join
Original hash-area size: 3064559
Memory for slot table: 2826240
Calculated overhead for partitions and row/slot managers: 238319
Hash-join fanout: 8
Number of partitions: 8
Number of slots: 23
Multiblock IO: 15
Block size(KB): 8
Cluster (slot) size(KB): 120
Minimum number of bytes per block: 8160
Bit vector memory allocation(KB): 128
Per partition bit vector length(KB): 16
Maximum possible row length: 1708
Estimated build size (KB): 0
Estimated Build Row Length (includes overhead): 408
# Immutable Flags:
  Not BUFFER(execution) output of the join for PQ
  Evaluate Left Input Row Vector
  Evaluate Right Input Row Vector
# Mutable Flags:
  IO sync
kxhfSetPhase: phase=BUILD
kxhfAddChunk: add chunk 0 (sz=32) to slot table
kxhfAddChunk: chunk 0 (lbs=0x2b4b26b47b20, slotTab=0x2b4b26b47ce8) successfuly added
kxhfSetPhase: phase=PROBE_1
qerhjFetch: max build row length (mbl=390)
*** RowSrcId: 1 END OF BUILD (PHASE 1) ***
  Revised row length: 370
  Revised build size: 9KB
kxhfResize(enter): resize to 12 slots (numAlloc=7, max=23)
kxhfResize(exit): resized to 12 slots (numAlloc=7, max=12)
  Slot table resized: old=23 wanted=12 got=12 unload=0
*** RowSrcId: 1 HASH JOIN RESIZE BUILD (PHASE 1) ***
Total number of partitions: 8
Number of partitions which could fit in memory: 8
Number of partitions left in memory: 8
Total number of slots in in-memory partitions: 7
kxhfResize(enter): resize to 13 slots (numAlloc=7, max=12)
kxhfResize(exit): resized to 13 slots (numAlloc=7, max=13)
  set work area size to: 1753K (13 slots)
*** RowSrcId: 1 HASH JOIN BUILD HASH TABLE (PHASE 1) ***
Total number of partitions: 8
Number of partitions left in memory: 8
Total number