Recently i was asked a question, what would be the sample size percentage for indexes when tables stats are gathered with estimate percent of 5 and cascade set to true.My answer to it was 5 within a second.I had this believe that the sample size percentage for an index is same to the estimate_percent given while gathering the stats.So it was time for some test –
SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production PL/SQL Release 11.2.0.2.0 - Production CORE 11.2.0.2.0 Production TNS for 32-bit Windows: Version 11.2.0.2.0 - Production NLSRTL Version 11.2.0.2.0 - Production SQL> drop table STAT_TEST purge; Table dropped. SQL> create table stat_test ( 2 c1 number not null, 3 c2 number not null, 4 c3 varchar2(300) not null); Table created. SQL> create index stat_c2_idx on stat_test(c2); Index created. SQL> SQL> create unique index stat_c1_indx on stat_test(c1); Index created. SQL> create index stat_c3_idx on stat_test(c3); Index created. SQL> insert into stat_test 2 select 3 rownum as c1, 4 1 as c2, 5 rpad('A',300,'A') as c3 6 from 7 dual 8 connect by 9 level / Enter value for tblown: ANAND Enter value for tblnm: STAT_TEST SE SORT1 SORT2 S INDEX_NAME LAST_ANALYZED NUM_ROWS SAMPLE_PCT DISTINCT_KEYS CLUSTERING_FACTOR FLAGS -- ------------------------------ ------------------------------ - ------------------------------ ------------- ---------- ---------- ------------- ----------------- ---------- I ANAND STAT_C2_IDX STAT_C2_IDX N I ANAND STAT_C1_INDX STAT_C1_INDX U I ANAND STAT_C3_IDX STAT_C3_IDX N
Gathered stats on the index
SQL> exec dbms_stats.gather_index_stats(ownname =>'ANAND',indname =>'STAT_C1_INDX',estimate_percent=>5,degree => DBMS_STATS.AUTO_DEGREE); PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_index_stats(ownname =>'ANAND',indname =>'STAT_C2_IDX',estimate_percent=>5,degree => DBMS_STATS.AUTO_DEGREE); PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_index_stats(ownname =>'ANAND',indname =>'STAT_C3_IDX',estimate_percent=>5,degree => DBMS_STATS.AUTO_DEGREE); PL/SQL procedure successfully completed.
As i have given estimate_percent i would expect sample_percentage to be 5%. So lets check –
SQL> / Enter value for tblown: ANAND Enter value for tblnm: STAT_TEST SE SORT1 SORT2 S INDEX_NAME LAST_ANALYZED NUM_ROWS SAMPLE_PCT DISTINCT_KEYS CLUSTERING_FACTOR FLAGS -- ------------------------------ ------------------------------ - ------------------------------ ------------- ---------- ---------- ------------- ----------------- ---------- I ANAND STAT_C2_IDX STAT_C2_IDX 20120518 0035 1011697 40 1 46109 N I ANAND STAT_C1_INDX STAT_C1_INDX 20120518 0031 1016203 61 1016203 47680 U I ANAND STAT_C3_IDX STAT_C3_IDX 20120518 0035 998800 5 1 68460 N
Deleted the stats on the index and gathered with dbms_stats.gather_table_stats with estimate
_percentage of 5% to check what happens in such case
SQL> exec dbms_stats.gather_table_stats(ownname =>'ANAND',tabname =>'STAT_TEST',estimate_percent=>5,degree => DBMS_STATS.AUTO_DEGREE); PL/SQL procedure successfully completed. SQL> / Enter value for tblown: ANAND Enter value for tblnm: STAT_TEST SE SORT1 SORT2 S INDEX_NAME LAST_ANALYZED NUM_ROWS SAMPLE_PCT DISTINCT_KEYS CLUSTERING_FACTOR FLAGS -- ------------------------------ ------------------------------ - ------------------------------ ------------- ---------- ---------- ------------- ----------------- ---------- I ANAND STAT_C2_IDX STAT_C2_IDX 20120518 0037 1008329 40 1 45928 N I ANAND STAT_C1_INDX STAT_C1_INDX 20120518 0037 1007530 61 1007530 47296 U I ANAND STAT_C3_IDX STAT_C3_IDX 20120518 0037 1008480 5 1 68880 N SQL> select 'T ' seg_type, 2 t.owner sort1, 3 t.table_name sort2, 4 null sort3, 5 t.table_name, 6 to_char(t.last_analyzed,'yyyymmdd hh24mi') last_analyzed, 7 t.num_rows, 8 -- t.sample_size, 9 decode(t.num_rows, 10 null,to_number(null), 11 0,100, 12 round((t.sample_size*100)/t.num_rows,0)) sample_pct, 13 from dba_tables t 14 where t.owner = upper('&tblown') 15 and t.table_name like upper('&tblnm') 16 / Enter value for tblown: ANAND Enter value for tblnm: STAT_TEST SE SORT1 SORT2 S TABLE_NAME LAST_ANALYZED NUM_ROWS SAMPLE_PCT -- ------------------------------ ------------------------------ - ------------------------- ------------- ---------- ---------- - T ANAND STAT_TEST STAT_TEST 20120518 0037 1001020 5
So the table shows sample_pct as 5% but for indexes is different.
I posted the same on OTN forum and Sir Jonathan Lewis replied —
According to a comment I wrote a couple of years ago ( https://forums.oracle.com/forums/thread.jspa?threadID=1116700 ) Oracle used to have a note on MOS stating that because the data in an index was sorted, and therefore easily able to produce a distorted picture on a small sample, the index sample was adjusted if necessary to aim for a minimum number of leaf blocks which is about 900.
The two indexes where you’ve got large samples very short keys, so the 5% was probably too few blocks. The index where you got a 5% sample was a long key, so maybe the 5% was comfortably over 900.
After which i did few checks and observed that is if an index is having upto ~1100 leaf blocks, then stats with 100% sample is being generated on the index and as the leaf block increase oracle internally determines the estimate percentage for sampling.