Overblog
Editer l'article Suivre ce blog Administration + Créer mon blog
18 août 2009 2 18 /08 /août /2009 16:19
Suppose you simply LEFT OUTER JOIN a master table to its detail table, and then aggregate the result on the master's primary key.
You would expect the output cardinality to be exactly the number of lines in the master table, wouldn't you ?
Of course, the result of such a query in Oracle 9iR2 (and i hope in the other versions too) is correct... but the cardinality estimated by the CBO is not !
In fact, it is not the only strange behaviour : if you materialize the query using the WITH clause, you get two different computations of the cardinality.
This article's aim is to illustrate it (without solving the CBO's mysteries...)

First create a master table as the 10 first integers, define the primary key, and compute the statistics.
CREATE TABLE tmp1 AS
SELECT level as id
FROM DUAL
CONNECT BY level <= 10
/
ALTER TABLE tmp1 ADD CONSTRAINT tmp1_pk PRIMARY KEY (id);

exec dbms_stats.gather_table_stats(ownname => 'PACMAN', tabname => 'TMP1', cascade => true)


Then, create detail table as the association of 10 numbers to each of the 10 lines of the master table, define PK and FK :
CREATE TABLE tmp2 AS
SELECT (a.id - 1) * 10 + b.id as id, a.id as fk
FROM tmp1 a CROSS JOIN tmp1 b
/

ALTER TABLE tmp2 ADD CONSTRAINT tmp2_pk PRIMARY KEY (id)
/

ALTER TABLE tmp2 ADD CONSTRAINT tmp2_ifk FOREIGN KEY (fk) REFERENCES tmp1(id)
/


Reassign some lines to another master line and compute statistics :
UPDATE tmp2
SET fk = 10
WHERE fk BETWEEN 7 AND 9;

exec dbms_stats.gather_table_stats(ownname => 'PACMAN', tabname => 'TMP2', cascade => true)


Now explain the query that counts for each master line (in tmp1) the associated detail lines (in tmp2) :

EXPLAIN PLAN FOR
SELECT a.id, count(b.fk) as cnt
FROM tmp1 a
  LEFT OUTER JOIN tmp2 b ON a.id = b.fk
GROUP BY a.id
/

SELECT * FROM TABLE(dbms_xplan.display)
/
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |     7 |    42 |     4 |
|   1 |  SORT GROUP BY NOSORT|             |     7 |    42 |     4 |
|   2 |   MERGE JOIN OUTER   |             |   100 |   600 |     4 |
|   3 |    INDEX FULL SCAN   | TMP1_PK     |    10 |    30 |     1 |
|*  4 |    SORT JOIN         |             |   100 |   300 |     3 |
|   5 |     TABLE ACCESS FULL| TMP2        |   100 |   300 |     1 |
--------------------------------------------------------------------


=> The cardinality of tmp1 is estimated to 10 (which is the num_distinct value for the column ID in dba_tab_cols
=> The estimated cardinality of the whole query is 7, which is the number of distinct values of fk in tmp2
This result would have been accurate if we had performed an INNER JOIN, since only the matching entries would have been outputted...

I ran the 10053 event trace to analyze the CBO's computation... and as usual, i could not understand much of it !
Though, i noticed the following statement (that appears several times) in the Join sections :
"Grouping column cardinality [ ID] 7"

Is it a CBO "bug" ?
Anyway, i can understand that the cardinality of outer joins is not quite a simple matter...

At this step, i have to tell why and how i got into this :
In a rather big query, i had to count this kind of master / detail relation entries, and reuse the result several times.
That's why i factored the subquery using the WITH clause, in order to let the CBO materialize the result (and as a consequence compute it only once).
But as i displayed the EXPLAIN PLAN, the cardinalities computed for the materialized temporary table was not the same as the one computed for the query itself !

Fix little EXPLAIN PLAN on WITH clause bug :
alter session set events='22829 trace name context forever'

Check the previous predicate ! (I've used the undocumented /*+materialize*/ hint to force materialization rather than joining the result with itself for example...)
explain plan for
with t as (
select /*+materialize*/a.id, count(
b.fk) as cnt
from tmp1 a
  left outer join tmp2 b on a.id = b.fk
GROUP BY a.id
)
select * from t
/

----------------------------------------------------------------------------------
| Id  | Operation                  |  Name                        | Rows  | Bytes
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                              |    10 |   260
|   2 |  TEMP TABLE TRANSFORMATION |                              |       |      
|   1 |   RECURSIVE EXECUTION      | SYS_LE_2_0                   |       |      
|   0 |    INSERT STATEMENT        |                              |     7 |    42
|   1 |     LOAD AS SELECT         |                              |       |      
|   2 |      SORT GROUP BY NOSORT  |                              |     7 |    42
|   3 |       MERGE JOIN OUTER     |                              |   100 |   600
|   4 |        INDEX FULL SCAN     | TMP1_PK                      |    10 |    30
|*  5 |        SORT JOIN           |                              |   100 |   300
|   6 |         TABLE ACCESS FULL  | TMP2                         |   100 |   300
|   3 |   VIEW                     |                              |    10 |   260
|   4 |    TABLE ACCESS FULL       | SYS_TEMP_0FD9D6644_24ADF79B  |    10 |   160
----------------------------------------------------------------------------------


=> cardinality for the temp table transformation : 7
=> cardinality of SYS_TEMP_0FD9D6644_24ADF79B  : 10

Once more (after flushing the shared pool), i generated the 10053 trace... and got nothing else headaches !
... and as usual i'll try to interprete it anyway :
for some obscure reason, the CBO seems to alias the TMP2 selection with from$subquery$_005.
After evaluation the NL join outer with the master table TMP1, it aliases the result as from$subquery$_003, and considers it as not analyzed.
And eventually outputs :
Grouping column cardinality [ID] 10 !

Ok, i admit it : i do not have the slightest clue of how i should read this d**n 10053...

(If you know anything about what i described, please leave a comment ;))
Partager cet article
Repost0

commentaires