Sort merge joinsql
排序合併總結:session
1 一般狀況下,排序合併鏈接的效率遠不如hash join,前者適用範圍更廣,hj只使用於等值鏈接,smj範圍更廣(<,>,>=,<=)oracle
2 一般狀況下,smj並不適合OLTP系統,排序操做是很是昂貴的操做,ide
優化器會根據必定的規則來肯定表T1,T2誰是驅動表,誰是被驅動表,驅動表用於外層循環,被驅動表用於內存循環,這裏假設驅動表時T1,被驅動表時T2oop
目標sql中指定的謂詞條件去訪問T1,獲得的結果集爲1性能
而後遍歷驅動結果集1同時遍歷被驅動表T2,即先取出1中的第一條記錄,接着遍歷T2並按照條件去判斷T2中是否存在配匹的記錄,而後在取出1中的第二條記錄。。。。優化
嵌套循環總結:this
1 若是t1對應的驅動結果集較少,同時t2的鏈接列上又有惟一性索引,則效率會很高spa
2 只要驅動結果集不多就具有嵌套循環的前提條件3d
3 嵌套循環能夠實現快速響應,便可以第一時間返回通過鏈接且知足條件的記錄,而沒必要等待全部的鏈接操做所有作完才返回鏈接結果
若是使用了nl鏈接,而且t2的鏈接列上index,那麼oracle訪問該index是一般會使用單塊讀,則t1的返回n條結果,就會是t2訪問該index n次,若是要回表,
則會回表n次,這就使得不在index 或者data buffer cache中的數據,發生物理I/O,
Oracle 11g使用了向量I/O,提升nl的鏈接效率
nested loop
outer table --驅動表
inner table
The second picture, shown in Figure 11-2, includes a representation of working through
an index on the second table, because an index is usually involved in this way when there is a
nested loop around.
例 create table t1 (col1 number, col2 varchar2(1)); create table t2 (col2 varchar2(1), col3 varchar2(2)); insert into t1 values(1,'A'); insert into t1 values(2,'B'); insert into t1 values(3,'C'); insert into t2 values('A','A1'); insert into t2 values('B','B1'); insert into t2 values('D','D1'); Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Data Mining options SQL> set linesize 1000 SQL> set pagesize 1000 SQL> set timing on SQL> set autot trace only SP2-0158: unknown SET option "only" SQL> set autotrace traceonly; SQL> select t1.col1,t1.col2,t2.col3 2 from t1,t2 3 where t1.col2=t2.col2; Elapsed: 00:00:00.04 Execution Plan ---------------------------------------------------------- Plan hash value: 2253255382 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 60 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 5 | 1 (0)| 00:00:01 | | 2 | NESTED LOOPS | | 3 | 60 | 4 (0)| 00:00:01 | | 3 | TABLE ACCESS FULL | T1 | 3 | 45 | 3 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | IDX_T2 | 1 | | 0 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T1"."COL2"="T2"."COL2") Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 13 consistent gets 0 physical reads 0 redo size 469 bytes sent via SQL*Net to client 337 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2 rows processed
哈希鏈接是兩個表在作鏈接時只要依靠哈希運算來獲得結果集(僅適合CBO),在解析目標sql 時是否考慮哈希鏈接受限制與隱含參數(_HASH_JOIN_ENABLED),
默認值TRUE,若是值爲false,強制使用hint,也是會走hj的
1 oracle會根據參數HASH_AREA_SIZE,DB_BLOCK_SIZE,_HASH_MULTIBLOCK_IO_COUNT來決定hash partition的數量,全部hash partition的集合稱爲Hash table,
2 表t1,t2在目標sql中的謂詞條件後,獲得結果集中的數據量較少的那個結果集會被oracle選爲哈希鏈接的驅動結果集,假設t1的結果集1較少(驅動結果集),t2的結果2(被驅動結果集)
3 oracle會遍歷結果集1,讀取1中的每一條記錄,並對每一條記錄按照該記錄t1中的鏈接列作哈希運算,
--小表在數據在指定謂詞後作哈希運算放入pga中(超過放入temp),大表數據按照鏈接列作哈希運算,而後大表去配匹pga中的值,遍歷完爲止
哈希鏈接的優缺點:
1 哈希鏈接不必定會排序,大多數狀況下不須要排序
2 哈希鏈接的驅動表所對應的鏈接列的可選擇性儘量的好,會影響hash bucket中的記錄數,哈希鏈接中,遍歷hash bucket的動做發生在pga工做區中,不消耗邏輯讀,
3 哈希鏈接適用於CBO,等值鏈接
4 哈希鏈接適合大表跟小表的鏈接,2個表作哈希鏈接,在指定了謂詞後的sql中獲得的數量較少的結果集所對應的hash table能徹底容納在pga中,則效率會很高。
SQL> select /*+ leading (t1) use_hash(t2) */ 2 t1.col1,t1.col2,t2.col3 3 from t1,t2 4 where t1.col2=t2.col2; Elapsed: 00:00:00.25 Execution Plan ---------------------------------------------------------- Plan hash value: 1838229974 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 60 | 7 (15)| 00:00:01 | |* 1 | HASH JOIN | | 3 | 60 | 7 (15)| 00:00:01 | | 2 | TABLE ACCESS FULL| T1 | 3 | 45 | 3 (0)| 00:00:01 | | 3 | TABLE ACCESS FULL| T2 | 3 | 15 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("T1"."COL2"="T2"."COL2") Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 7 recursive calls 0 db block gets 32 consistent gets 0 physical reads 0 redo size 469 bytes sent via SQL*Net to client 337 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 2 sorts (memory) 0 sorts (disk) 2 rows processed
2個表在作鏈接是,沒有指定任何鏈接條件的鏈接
SQL> select 2 t1.col1,t1.col2,t2.col3 3 from t1,t2; 9 rows selected. Elapsed: 00:00:00.03 Execution Plan ---------------------------------------------------------- Plan hash value: 787647388 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9 | 162 | 8 (0)| 00:00:01 | | 1 | MERGE JOIN CARTESIAN| | 9 | 162 | 8 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL | T1 | 3 | 45 | 3 (0)| 00:00:01 | | 3 | BUFFER SORT | | 3 | 9 | 5 (0)| 00:00:01 | | 4 | TABLE ACCESS FULL | T2 | 3 | 9 | 2 (0)| 00:00:01 | -----------------------------------------------------------------------------
作子查詢展開時,oracle會常常把那些外部where條件爲 no exists,not in ,<>all的子查詢轉換成對應的反鏈接
SQL> select * from t1 2 where t1.col2 not in (select col2 from t2); Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 895956251 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 15 | 5 (0)| 00:00:01 | |* 1 | FILTER | | | | | | | 2 | TABLE ACCESS FULL| T1 | 3 | 45 | 3 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL| T2 | 3 | 6 | 2 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "T2" "T2" WHERE LNNVL("COL2"<>:B1))) 3 - filter(LNNVL("COL2"<>:B1)) SQL> select * from t1 2 where not exists (select 1 from t2 where t1.col2=t2.col2); Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 1534930707 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 17 | 3 (0)| 00:00:01 | | 1 | NESTED LOOPS ANTI | | 1 | 17 | 3 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL| T1 | 3 | 45 | 3 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | IDX_T2 | 3 | 6 | 0 (0)| 00:00:01 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("T1"."COL2"="T2"."COL2") alter session set "_optimizer_null_aware_antijoin"=false
半鏈接跟普通的鏈接不一樣,半鏈接會去重?
對子查詢展開,exists,in等
SQL> select * from t1 2 where t1.col2 in (select col2 from t2); Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 3783859632 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 51 | 3 (0)| 00:00:01 | | 1 | NESTED LOOPS SEMI | | 3 | 51 | 3 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL| T1 | 3 | 45 | 3 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | IDX_T2 | 3 | 6 | 0 (0)| 00:00:01 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("T1"."COL2"="COL2") SQL> select * from t1 2 where exists (select 1 from t2 where t1.col2=t2.col2); Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 3783859632 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 51 | 3 (0)| 00:00:01 | | 1 | NESTED LOOPS SEMI | | 3 | 51 | 3 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL| T1 | 3 | 45 | 3 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | IDX_T2 | 3 | 6 | 0 (0)| 00:00:01 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("T1"."COL2"="T2"."COL2")
A)排序合併鏈接(Sort Merge Join, SMJ):
a) 對於非等值鏈接,這種鏈接方式的效率是比較高的。
b) 若是在關聯的列上都有索引,效果更好。
c) 對於將2個較大的表源作鏈接,該鏈接方法比NL鏈接要好一些。
B)嵌套循環(Nested Loops, NL):
a) 若是驅動表(外部表)比較小,而且在被驅動表(內部表)上有惟一索引,或有高選擇性非惟一索引時,使用這種方法能夠獲得較好的效率。
b)嵌套循環鏈接有其它鏈接方法沒有的的一個優勢是:能夠先返回已經鏈接的行,而沒必要等待全部的鏈接操做處理完才返回數據,這能夠實現快速的響應時間。
C)哈希鏈接(Hash Join, HJ):
a) 這種方法是在oracle7後來引入的,使用了比較先進的鏈接理論,通常來講,其效率應該好於其它2種鏈接,可是這種鏈接只能用在CBO優化器中,
並且須要設置合適的hash_area_size參數,才能取得較好的性能。
b) 在2個較大的表源之間鏈接時會取得相對較好的效率,在一個表源較小時則能取得更好的效率。
c) 只能用於等值鏈接中