日期:2014-05-16 浏览次数:20580 次
Oracle Hash join 是一种非常高效的join 算法,主要以CPU(hash计算)和内存空间(创建hash table)为代价获得最大的效率。Hash join一般用于大表和小表之间的连接,我们将小表构建到内存中,称为Hash cluster,大表称为probe表。
?
当两个表做hash join时,oracle会选择一个表作为驱动表,先根据过滤条件排除不必要的数据,然后将结果集做成hash表,放入进程的hash area,接着扫描第二张表,将记录的join字段值做hash运算,到内存的hash表里面去探测,如果探测成功,就返回数据,否则这行就丢弃掉。
?
(miki西游 @mikixiyou 原文链接: http://mikixiyou.iteye.com/blog/1709321 )
?
?
select /*+use_nl(a b)*/ a.*,b.* from dba_obj a,all_obj b where a.object_id=b.object_id and a.object_name like 'tt%' SELECT STATEMENT, GOAL = ALL_ROWS NESTED LOOPS TABLE ACCESS FULL SCOTT DBA_OBJ TABLE ACCESS BY INDEX ROWID SCOTT ALL_OBJ INDEX UNIQUE SCAN SCOTT PK_ALL_OBJ?
执行计划解读
两个表使用了嵌套循环连接。首先访问dba_obj表,得到全部记录。然后按照此表记录依次去扫描all_obj表,扫描过程走索引快速得到all_obj的记录。
?
?
?
select /*+use_hash(a b)*/ a.*,b.* from dba_obj a,all_obj b where a.object_id=b.object_id and a.object_name like 'tt%' SELECT STATEMENT, GOAL = ALL_ROWS HASH JOIN TABLE ACCESS FULL SCOTT DBA_OBJ TABLE ACCESS FULL SCOTT ALL_OBJ?
?
执行计划解读
两个表使用了哈希连接。首先访问dba_obj表,得到全部记录,进行hash运算,放到内存hash area中形成hash table,也称为hash cluster。
然后,再腾出手来,全面扫描all_obj表,每扫描到一条记录时,将join字段进行hash运算,然后到hash area中去找与dba_obj表匹配的记录。
这个行为称为probe,中文称探测。此表也称为probe表。
?
?
hash table表是保存在hash area内存区域中,而这个区域在oracle中是分配在pga中。
PGA 包括 进程内存、UGA、sort area,bitmap merge area和hash area。UGA包含session状态信息和private sql area。
?
?
使用这个10104 event可以分析hash area的内存分配情况。
?
alter system set events '10104 trace name context forever,level 2';
?
select count(*)
? from (select /*+use_hash(i g) leading(i)*/
???????? i.*, g.*
????????? from tdividenddetail i, tproductinfo g
???????? where i.c_fundcode = g.fundcode
?????????? and i.d_cdate > sysdate - 100);
?
?
使用use_hash提示强制让两个表采用hash join关联,然后使用leading提示强制让i表作为驱动表。
?
在hash area中,默认采用8个partition,每个partition保存若干个 hash table的记录。这些记录又以bucket逻辑结构存储。
分析trc文件内容如下所示:
?
*** RowSrcId: 1 HASH JOIN STATISTICS (INITIALIZATION) *** Join Type: INNER join Original hash-area size: 3064559 Memory for slot table: 2826240 Calculated overhead for partitions and row/slot managers: 238319 Hash-join fanout: 8 Number of partitions: 8 Number of slots: 23 Multiblock IO: 15 Block size(KB): 8 Cluster (slot) size(KB): 120 Minimum number of bytes per block: 8160 Bit vector memory allocation(KB): 128 Per partition bit vector length(KB): 16 Maximum possible row length: 1708 Estimated build size (KB): 0 Estimated Build Row Length (includes overhead): 408 # Immutable Flags: Not BUFFER(execution) output of the join for PQ Evaluate Left Input Row Vector Evaluate Right Input Row Vector # Mutable Flags: IO sync kxhfSetPhase: phase=BUILD kxhfAddChunk: add chunk 0 (sz=32) to slot table kxhfAddChunk: chunk 0 (lbs=0x2b4b26b47b20, slotTab=0x2b4b26b47ce8) successfuly added kxhfSetPhase: phase=PROBE_1 qerhjFetch: max build row length (mbl=390) *** RowSrcId: 1 END OF BUILD (PHASE 1) *** Revised row length: 370 Revised build size: 9KB kxhfResize(enter): resize to 12 slots (numAlloc=7, max=23) kxhfResize(exit): resized to 12 slots (numAlloc=7, max=12) Slot table resized: old=23 wanted=12 got=12 unload=0 *** RowSrcId: 1 HASH JOIN RESIZE BUILD (PHASE 1) *** Total number of partitions: 8 Number of partitions which could fit in memory: 8 Number of partitions left in memory: 8 Total number of slots in in-memory partitions: 7 kxhfResize(enter): resize to 13 slots (numAlloc=7, max=12) kxhfResize(exit): resized to 13 slots (numAlloc=7, max=13) set work area size to: 1753K (13 slots) *** RowSrcId: 1 HASH JOIN BUILD HASH TABLE (PHASE 1) *** Total number of partitions: 8 Number of partitions left in memory: 8 Total number