Oracle

Histogram on VARCHAR2 column having first 32 character identicals

Recently i read the below link and came to know that,while creating histogram on a VARCHAR2 column, Oracle will only consider the first 32 characters in the column. In other words, if the first 32 bytes of the column values are identical all of the values end up in a single bucket of the histogram even though in reality the values are different and are suppose to be in different histogram buckets.
http://blogs.oracle.com/optimizer/entry/how_do_i_drop_an_existing_histogram_on_a_column_and_stop_the_auto_stats_gathering_job_from_creating
1. Create table with column VARCHAR2(35) and insert 35 characters with first 32 characters identical.

SQL> create table hist_test_var (c1 VARCHAR2(35));
Table created.
SQL>
SQL> insert into hist_test_var select 'abcdefghijklmnopqrstuvwzyzabcdefghi' from dual connect by level
SQL> insert into hist_test_var select 'abcdefghijklmnopqrstuvwzyzabcdefabc' from dual connect by level
SQL> insert into hist_test_var select 'abcdefghijklmnopqrstuvwzyzabcdefdef' from dual connect by level
SQL> insert into hist_test_var select 'sdhskjhdwehdwehdhwdnskdnkwsndkwdhwj' from dual connect by level
SQL> insert into hist_test_var select 'agjhgsdsjgduywdhsdlksdjlksjdwdjliwj' from dual connect by level
SQL> commit;
Commit complete.

For the first 3 insert statements the first 32 characters are the same “abcdefghijklmnopqrstuvwzyzabcdef”.Rest 2 are different.
Gather the stats

SQL> exec dbms_stats.gather_table_stats (user,'hist_test_var')
PL/SQL procedure successfully completed.
SQL>
SQL> col low_val     for a8
SQL> col high_val    for a8
SQL> col data_type   for a9
SQL> col column_name for a11
SQL>
SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_VAR' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /
COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS
----------- -------- -------- --------- ---------- --------------- -----------
C1          abcdefgh sdhskjhd VARCHAR2          .2 NONE                      1
            ijklmnop wehdwehd
            qrstuvwz hwdnskdn
            yzabcdef kwsndkwd

The above shows lack of histogram on C1. Ran few select statements on the table and gathered the stats back

SQL> exec dbms_stats.gather_table_stats(user,'hist_test_var');
PL/SQL procedure successfully completed.
SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_VAR' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /
COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS
----------- -------- -------- --------- ---------- --------------- -----------
C1          abcdefgh sdhskjhd VARCHAR2  .000010092 FREQUENCY                 3
            ijklmnop wehdwehd
            qrstuvwz hwdnskdn
            yzabcdef kwsndkwd

The above shows NUM_BUCKET as 3, which means while creating the histogram oracle considered “abcdefghijklmnopqrstuvwzyzabcdefghi” , “abcdefghijklmnopqrstuvwzyzabcdefabc” and “abcdefghijklmnopqrstuvwzyzabcdefdef” as same and end up in a single bucket of the histogram even though in reality the values are different and are suppose to be in different histogram buckets.This is because Oracle considers only the first 32 characters in the column.
Lets test with exact 32 characters, keeping 31 characters same.

SQL> create table hist_test_var_31 (c1 VARCHAR2(32));
Table created.
SQL>
SQL> insert into hist_test_var_31 select 'abcdefghijklmnopqrstuvwzyzabcdeF' from dual connect by level
SQL> insert into hist_test_var_31 select 'abcdefghijklmnopqrstuvwzyzabcdeG' from dual connect by level
SQL> insert into hist_test_var_31 select 'abcdefghijklmnopqrstuvwzyzabcdeH' from dual connect by level
SQL> insert into hist_test_var_31 select 'sdhskjhdwehdwehdhwdnskdnkwsndkwd' from dual connect by level
SQL> insert into hist_test_var_31 select 'agjhgsdsjgduywdhsdlksdjlksjdwdjl' from dual connect by level
SQL> commit;
Commit complete.
SQL>
SQL> exec dbms_stats.gather_table_stats (user,'hist_test_var_31')
PL/SQL procedure successfully completed.
SQL>
SQL>
SQL> col low_val     for a8
SQL> col high_val    for a8
SQL> col data_type   for a9
SQL> col column_name for a11
SQL>
SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_VAR_31' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /
COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS
----------- -------- -------- --------- ---------- --------------- -----------
C1          abcdefgh sdhskjhd VARCHAR2          .2 NONE                      1
            ijklmnop wehdwehd
            qrstuvwz hwdnskdn
            yzabcdeG kwsndkwd
REM #### RAN FEW SELECT QUERIES ####
SQL> exec dbms_stats.gather_table_stats(user,'hist_test_var_31');
PL/SQL procedure successfully completed.
SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_VAR_31' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /
COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS
----------- -------- -------- --------- ---------- --------------- -----------
C1          abcdefgh sdhskjhd VARCHAR2  .000010121 FREQUENCY                 5 
            ijklmnop wehdwehd
            qrstuvwz hwdnskdn
            yzabcdeG kwsndkwd
SQL>

Above shows NUM_BUCKETS 5. The 32nd character of all the distinct values were different, hence making the value different and putting it in a separate bucket.
Does the same happen for column with NUMBER data type.

SQL> create table hist_test_num (c1 NUMBER(35));
Table created.
SQL>
SQL>
SQL> insert into hist_test_num select '01234567891011121314151617181910212' from dual connect by level
SQL> insert into hist_test_num select '01234567891011121314151617181910213' from dual connect by level
SQL> insert into hist_test_num select '01234567891011121314151617181910214' from dual connect by level
SQL> insert into hist_test_num select '11111111111111111111111111111111214' from dual connect by level
SQL> insert into hist_test_num select '22222222222222222222222222222222012' from dual connect by level
SQL> commit;
Commit complete.
SQL>
SQL>
SQL> exec dbms_stats.gather_table_stats (user,'hist_test_num')
PL/SQL procedure successfully completed.
SQL>
SQL>
SQL> col low_val     for a8
SQL> col high_val    for a8
SQL> col data_type   for a9
SQL> col column_name for a11
SQL>
SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_NUM' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /
COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS
----------- -------- -------- --------- ---------- --------------- -----------
C1          12345678 22222222 NUMBER            .2 NONE                      1
            91011121 22222222
            31415161 22222222
            71819102 22222222
            12       012
REM #### RAN FEW SELECT STATEMENTS ####
SQL> exec dbms_stats.gather_table_stats(user,'hist_test_num');
PL/SQL procedure successfully completed.
SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_NUM' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /
COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS
----------- -------- -------- --------- ---------- --------------- -----------
C1          12345678 22222222 NUMBER    9.9674E-06 FREQUENCY                 5
            91011121 22222222
            31415161 22222222
            71819102 22222222
            12       012
                                                                                                                                                                                                                                      

References –
http://structureddata.org/2011/06/08/implicit-datatype-conversion-histograms-bad-execution-plan/
http://blogs.oracle.com/optimizer/entry/how_do_i_drop_an_existing_histogram_on_a_column_and_stop_the_auto_stats_gathering_job_from_creating

Leave a Reply