Recently i read the below link and came to know that,while creating histogram on a VARCHAR2 column, Oracle will only consider the first 32 characters in the column. In other words, if the first 32 bytes of the column values are identical all of the values end up in a single bucket of the histogram even though in reality the values are different and are suppose to be in different histogram buckets.
http://blogs.oracle.com/optimizer/entry/how_do_i_drop_an_existing_histogram_on_a_column_and_stop_the_auto_stats_gathering_job_from_creating
1. Create table with column VARCHAR2(35) and insert 35 characters with first 32 characters identical.
SQL> create table hist_test_var (c1 VARCHAR2(35)); Table created. SQL> SQL> insert into hist_test_var select 'abcdefghijklmnopqrstuvwzyzabcdefghi' from dual connect by level SQL> insert into hist_test_var select 'abcdefghijklmnopqrstuvwzyzabcdefabc' from dual connect by level SQL> insert into hist_test_var select 'abcdefghijklmnopqrstuvwzyzabcdefdef' from dual connect by level SQL> insert into hist_test_var select 'sdhskjhdwehdwehdhwdnskdnkwsndkwdhwj' from dual connect by level SQL> insert into hist_test_var select 'agjhgsdsjgduywdhsdlksdjlksjdwdjliwj' from dual connect by level SQL> commit; Commit complete.
For the first 3 insert statements the first 32 characters are the same “abcdefghijklmnopqrstuvwzyzabcdef”.Rest 2 are different.
Gather the stats
SQL> exec dbms_stats.gather_table_stats (user,'hist_test_var') PL/SQL procedure successfully completed. SQL> SQL> col low_val for a8 SQL> col high_val for a8 SQL> col data_type for a9 SQL> col column_name for a11 SQL> SQL> select 2 a.column_name, 3 display_raw(a.low_value,b.data_type) as low_val, 4 display_raw(a.high_value,b.data_type) as high_val, 5 b.data_type, 6 a.density, 7 a.histogram, 8 a.num_buckets 9 from 10 user_tab_col_statistics a, user_tab_cols b 11 where 12 a.table_name='HIST_TEST_VAR' and 13 a.table_name=b.table_name and 14 a.column_name=b.column_name 15 / COLUMN_NAME LOW_VAL HIGH_VAL DATA_TYPE DENSITY HISTOGRAM NUM_BUCKETS ----------- -------- -------- --------- ---------- --------------- ----------- C1 abcdefgh sdhskjhd VARCHAR2 .2 NONE 1 ijklmnop wehdwehd qrstuvwz hwdnskdn yzabcdef kwsndkwd
The above shows lack of histogram on C1. Ran few select statements on the table and gathered the stats back
SQL> exec dbms_stats.gather_table_stats(user,'hist_test_var'); PL/SQL procedure successfully completed. SQL> select 2 a.column_name, 3 display_raw(a.low_value,b.data_type) as low_val, 4 display_raw(a.high_value,b.data_type) as high_val, 5 b.data_type, 6 a.density, 7 a.histogram, 8 a.num_buckets 9 from 10 user_tab_col_statistics a, user_tab_cols b 11 where 12 a.table_name='HIST_TEST_VAR' and 13 a.table_name=b.table_name and 14 a.column_name=b.column_name 15 / COLUMN_NAME LOW_VAL HIGH_VAL DATA_TYPE DENSITY HISTOGRAM NUM_BUCKETS ----------- -------- -------- --------- ---------- --------------- ----------- C1 abcdefgh sdhskjhd VARCHAR2 .000010092 FREQUENCY 3 ijklmnop wehdwehd qrstuvwz hwdnskdn yzabcdef kwsndkwd
The above shows NUM_BUCKET as 3, which means while creating the histogram oracle considered “abcdefghijklmnopqrstuvwzyzabcdefghi” , “abcdefghijklmnopqrstuvwzyzabcdefabc” and “abcdefghijklmnopqrstuvwzyzabcdefdef” as same and end up in a single bucket of the histogram even though in reality the values are different and are suppose to be in different histogram buckets.This is because Oracle considers only the first 32 characters in the column.
Lets test with exact 32 characters, keeping 31 characters same.
SQL> create table hist_test_var_31 (c1 VARCHAR2(32)); Table created. SQL> SQL> insert into hist_test_var_31 select 'abcdefghijklmnopqrstuvwzyzabcdeF' from dual connect by level SQL> insert into hist_test_var_31 select 'abcdefghijklmnopqrstuvwzyzabcdeG' from dual connect by level SQL> insert into hist_test_var_31 select 'abcdefghijklmnopqrstuvwzyzabcdeH' from dual connect by level SQL> insert into hist_test_var_31 select 'sdhskjhdwehdwehdhwdnskdnkwsndkwd' from dual connect by level SQL> insert into hist_test_var_31 select 'agjhgsdsjgduywdhsdlksdjlksjdwdjl' from dual connect by level SQL> commit; Commit complete. SQL> SQL> exec dbms_stats.gather_table_stats (user,'hist_test_var_31') PL/SQL procedure successfully completed. SQL> SQL> SQL> col low_val for a8 SQL> col high_val for a8 SQL> col data_type for a9 SQL> col column_name for a11 SQL> SQL> select 2 a.column_name, 3 display_raw(a.low_value,b.data_type) as low_val, 4 display_raw(a.high_value,b.data_type) as high_val, 5 b.data_type, 6 a.density, 7 a.histogram, 8 a.num_buckets 9 from 10 user_tab_col_statistics a, user_tab_cols b 11 where 12 a.table_name='HIST_TEST_VAR_31' and 13 a.table_name=b.table_name and 14 a.column_name=b.column_name 15 / COLUMN_NAME LOW_VAL HIGH_VAL DATA_TYPE DENSITY HISTOGRAM NUM_BUCKETS ----------- -------- -------- --------- ---------- --------------- ----------- C1 abcdefgh sdhskjhd VARCHAR2 .2 NONE 1 ijklmnop wehdwehd qrstuvwz hwdnskdn yzabcdeG kwsndkwd REM #### RAN FEW SELECT QUERIES #### SQL> exec dbms_stats.gather_table_stats(user,'hist_test_var_31'); PL/SQL procedure successfully completed. SQL> select 2 a.column_name, 3 display_raw(a.low_value,b.data_type) as low_val, 4 display_raw(a.high_value,b.data_type) as high_val, 5 b.data_type, 6 a.density, 7 a.histogram, 8 a.num_buckets 9 from 10 user_tab_col_statistics a, user_tab_cols b 11 where 12 a.table_name='HIST_TEST_VAR_31' and 13 a.table_name=b.table_name and 14 a.column_name=b.column_name 15 / COLUMN_NAME LOW_VAL HIGH_VAL DATA_TYPE DENSITY HISTOGRAM NUM_BUCKETS ----------- -------- -------- --------- ---------- --------------- ----------- C1 abcdefgh sdhskjhd VARCHAR2 .000010121 FREQUENCY 5 ijklmnop wehdwehd qrstuvwz hwdnskdn yzabcdeG kwsndkwd SQL>
Above shows NUM_BUCKETS 5. The 32nd character of all the distinct values were different, hence making the value different and putting it in a separate bucket.
Does the same happen for column with NUMBER data type.
SQL> create table hist_test_num (c1 NUMBER(35)); Table created. SQL> SQL> SQL> insert into hist_test_num select '01234567891011121314151617181910212' from dual connect by level SQL> insert into hist_test_num select '01234567891011121314151617181910213' from dual connect by level SQL> insert into hist_test_num select '01234567891011121314151617181910214' from dual connect by level SQL> insert into hist_test_num select '11111111111111111111111111111111214' from dual connect by level SQL> insert into hist_test_num select '22222222222222222222222222222222012' from dual connect by level SQL> commit; Commit complete. SQL> SQL> SQL> exec dbms_stats.gather_table_stats (user,'hist_test_num') PL/SQL procedure successfully completed. SQL> SQL> SQL> col low_val for a8 SQL> col high_val for a8 SQL> col data_type for a9 SQL> col column_name for a11 SQL> SQL> select 2 a.column_name, 3 display_raw(a.low_value,b.data_type) as low_val, 4 display_raw(a.high_value,b.data_type) as high_val, 5 b.data_type, 6 a.density, 7 a.histogram, 8 a.num_buckets 9 from 10 user_tab_col_statistics a, user_tab_cols b 11 where 12 a.table_name='HIST_TEST_NUM' and 13 a.table_name=b.table_name and 14 a.column_name=b.column_name 15 / COLUMN_NAME LOW_VAL HIGH_VAL DATA_TYPE DENSITY HISTOGRAM NUM_BUCKETS ----------- -------- -------- --------- ---------- --------------- ----------- C1 12345678 22222222 NUMBER .2 NONE 1 91011121 22222222 31415161 22222222 71819102 22222222 12 012 REM #### RAN FEW SELECT STATEMENTS #### SQL> exec dbms_stats.gather_table_stats(user,'hist_test_num'); PL/SQL procedure successfully completed. SQL> select 2 a.column_name, 3 display_raw(a.low_value,b.data_type) as low_val, 4 display_raw(a.high_value,b.data_type) as high_val, 5 b.data_type, 6 a.density, 7 a.histogram, 8 a.num_buckets 9 from 10 user_tab_col_statistics a, user_tab_cols b 11 where 12 a.table_name='HIST_TEST_NUM' and 13 a.table_name=b.table_name and 14 a.column_name=b.column_name 15 / COLUMN_NAME LOW_VAL HIGH_VAL DATA_TYPE DENSITY HISTOGRAM NUM_BUCKETS ----------- -------- -------- --------- ---------- --------------- ----------- C1 12345678 22222222 NUMBER 9.9674E-06 FREQUENCY 5 91011121 22222222 31415161 22222222 71819102 22222222 12 012
References –
http://structureddata.org/2011/06/08/implicit-datatype-conversion-histograms-bad-execution-plan/
http://blogs.oracle.com/optimizer/entry/how_do_i_drop_an_existing_histogram_on_a_column_and_stop_the_auto_stats_gathering_job_from_creating