Are your statistics (distributions) are really up to date ? - PART II

mholzbauer's picture


So what we know in Part I  is that "Statistics updated." does not always mean that your statistics are up to date.
Sometimes it means: "Statistics are up to date and not changed - STATLEVEL threshold is not reached".

Lets look at the explain plan to verify, that your statistics are up to date or not:

$ dbaccess stores_demo -

Database selected.

> UPDATE STATISTICS LOW FOR TABLE customer DROP DISTRIBUTIONS;

Statistics updated.

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

No rows found.

! rm -f sqexplain.out

> SET EXPLAIN STATISTICS;

Explain set.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode) DISTRIBUTIONS ONLY;

Statistics updated.

> SET EXPLAIN OFF;

Explain set.

! cat sqexplain.out;sleep 5

UPDATE STATISTICS:
==================

Table:         informix.customer
Mode:        HIGH
Number of Bins:       29        Bin size        1
Sort data         0.8 MB    Sort memory granted         0.8 MB
Estimated number of table scans 1
PASS #1    zipcode,lname,customer_num
Scan 0 Sort 0 Build 0 Insert 0 Close 0 Total 0
Completed pass 1 in 0 minutes 0 seconds

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 16:34:51.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 16:34:51.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 16:34:51.00000

3 row(s) retrieved.

! rm -f sqexplain.out

> SET EXPLAIN STATISTICS;

Explain set.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode) DISTRIBUTIONS ONLY;

Statistics updated.

> SET EXPLAIN OFF;

Explain set.

! cat sqexplain.out;sleep 5

UPDATE STATISTICS:
==================

Table:         informix.customer
Mode:        HIGH
Number of Bins:       29        Bin size        1

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 16:34:51.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 16:34:51.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 16:34:51.00000

3 row(s) retrieved.

! rm -f sqexplain.out

> SET EXPLAIN STATISTICS;

Explain set.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode) DISTRIBUTIONS ONLY FORCE;

Statistics updated.

> SET EXPLAIN OFF;

Explain set.

! cat sqexplain.out;sleep 5

UPDATE STATISTICS:
==================

Table:         informix.customer
Mode:        HIGH
Number of Bins:       29        Bin size        1
Sort data         0.8 MB    Sort memory granted         0.8 MB
Estimated number of table scans 1
PASS #1    zipcode,lname,customer_num
Scan 0 Sort 0 Build 0 Insert 0 Close 0 Total 0
Completed pass 1 in 0 minutes 0 seconds

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 16:35:01.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 16:35:01.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 16:35:01.00000

3 row(s) retrieved.

The first  "UPDATE STATISTICS HIGH" (USTH) changed the distributions,
because I dropped the distributions with the USTL DROP DISTRIBUTIONS before.

The second USTH changed nothing - the value of STATCHANGE was not reached.
If we see "Sort data ..." in sqexplain.out than the statistics are updated.

The third USTH changed the distributions because it is FORCE'ed.

 

Okay, now lets look at the STATCHANGE value:

$ dbaccess stores_demo -

Database selected.

> SELECT tabname, colname, mode, constr_time
> FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid AND t.tabid = d.tabid
> AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-24 11:58:38.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-24 11:58:38.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-24 11:58:38.00000

3 row(s) retrieved.

> SELECT COUNT(*) FROM customer;

      (count(*))

              28

1 row(s) retrieved.

> SELECT FIRST 1 * FROM customer;

customer_num  101
fname         Ludwig
lname         Pauli
company       All Sports Supplies
address1      213 Erstwild Court
address2      
city          Sunnyvale
state         CA
zipcode       94086
phone         408-789-8075

1 row(s) retrieved.

> SELECT customer_num FROM customer;

customer_num

         101
         102
         103
         104
         105
         106
         107
         108
         109
         110
         111
         112
         113
         114
         115
         116
         117
         118
         119
         120
         121
         122
         123
         124
         125
         126
         127
         128

28 row(s) retrieved.

> SELECT * FROM systables WHERE tabname = 'customer';

tabname          customer
owner            informix
partnum          2097445
tabid            100
rowsize          134
ncols            10
nindexes         2
nrows            28.00000000000
created          03/24/2015
version          6750215
tabtype          T
locklevel        P
npused           2.000000000000
fextsize         16
nextsize         16
flags            0
site             
dbname           
type_xid         0
am_id            0
pagesize         2048
ustlowts         2015-03-24 11:58:38.00000
secpolicyid      0
protgranularity  
statchange       
statlevel        A

1 row(s) retrieved.

> SELECT FIRST 1 * FROM sysdistrib WHERE tabid = 100;

tabid             100
colno             1
seqno             1
constructed       03/24/2015
mode              H
resolution        0.5
confidence        0.00
encdat            ABwAAAAAAAAAJUkSPUmSZEAAAIA_AAAAZSVJEj0AAABlJUkSPQAAAGYlSRI9A
                  AAAZyVJEj0AAABoJUkSPQAAAGklSRI9AAAAaiVJEj0AAABrJUkSPQAAAGwlSR
                  I9AAAAbSVJEj0AAABuJUkSPQAAAG8lSRI9AAAAcCVJEj0AAABxJUkSPQAAAHI
                  lSRI9AAAAcyVJEj0AAAB0JUkSPQAAAHUlSRI9AAAAdiVJEj0AAAB3JUkSPQAA
                  AHglSRI9AAAA
type              A
smplsize          0.00
rowssmpld         28.00000000000
constr_time       2015-03-24 11:58:38.00000
ustnrows          28.00000000000
ustbuildduration    0:00:00.00009
nupdates          0.00
ndeletes          0.00
ninserts          28.00000000000

1 row(s) retrieved.

> SELECT c.colname[1,12], d.constr_time, d.mode, d.nupdates::INT nup, d.ninserts::INT nins
> FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid AND t.tabid = d.tabid
> AND c.colno = d.colno AND d.seqno = 1 AND t.tabname = 'customer';

colname      constr_time               mode         nup        nins

customer_num 2015-03-24 11:58:38.00000 H              0          28
lname        2015-03-24 11:58:38.00000 H              0          28
zipcode      2015-03-24 11:58:38.00000 H              0          28

3 row(s) retrieved.

! onstat -g cfg STATCHANGE
IBM Informix Dynamic Server Version 12.10.FC4W1 -- On-Line -- Up 7 days 19:20:02 -- 2563016 Kbytes

name                      current value
STATCHANGE                10

! onstat -g cfg AUTO_STAT_MODE
IBM Informix Dynamic Server Version 12.10.FC4W1 -- On-Line -- Up 7 days 19:20:02 -- 2563016 Kbytes

name                      current value
AUTO_STAT_MODE            1

> SELECT c.colname[1,12], d.constr_time, d.mode, d.nupdates::INT nup, d.ninserts::INT nins
> FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid AND t.tabid = d.tabid
> AND c.colno = d.colno AND d.seqno = 1 AND t.tabname = 'customer';

colname      constr_time               mode         nup        nins

customer_num 2015-03-24 11:58:38.00000 H              0          28
lname        2015-03-24 11:58:38.00000 H              0          28
zipcode      2015-03-24 11:58:38.00000 H              0          28

3 row(s) retrieved.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode) DISTRIBUTIONS ONLY;

Statistics updated.

> SELECT c.colname[1,12], d.constr_time, d.mode, d.nupdates::INT nup, d.ninserts::INT nins
> FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid AND t.tabid = d.tabid
> AND c.colno = d.colno AND d.seqno = 1 AND t.tabname = 'customer';

colname      constr_time               mode         nup        nins

customer_num 2015-03-24 11:58:38.00000 H              0          28
lname        2015-03-24 11:58:38.00000 H              0          28
zipcode      2015-03-24 11:58:38.00000 H              0          28

3 row(s) retrieved.

> UPDATE customer SET lname = lname WHERE customer_num <= 102;

2 row(s) updated.

! sleep 2

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode) DISTRIBUTIONS ONLY;

Statistics updated.

SELECT c.colname[1,12], d.constr_time, d.mode, d.nupdates::INT nup, d.ninserts::INT nins
FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid AND t.tabid = d.tabid
AND c.colno = d.colno AND d.seqno = 1 AND t.tabname = 'customer';

colname      constr_time               mode         nup        nins

customer_num 2015-03-24 11:58:38.00000 H              0          28
lname        2015-03-24 11:58:38.00000 H              0          28
zipcode      2015-03-24 11:58:38.00000 H              0          28

3 row(s) retrieved.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode);

Statistics updated.

> SELECT c.colname[1,12], d.constr_time, d.mode, d.nupdates::INT nup, d.ninserts::INT nins
> FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid AND t.tabid = d.tabid
> AND c.colno = d.colno AND d.seqno = 1 AND t.tabname = 'customer';

colname      constr_time               mode         nup        nins

customer_num 2015-03-24 11:58:38.00000 H              0          28
lname        2015-03-24 11:58:38.00000 H              0          28
zipcode      2015-03-24 11:58:38.00000 H              0          28

3 row(s) retrieved.

> UPDATE customer SET lname = UPPER(lname) WHERE customer_num <= 101;

1 row(s) updated.

! sleep 2

> SELECT c.colname[1,12], d.constr_time, d.mode, d.nupdates::INT nup, d.ninserts::INT nins
> FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid AND t.tabid = d.tabid
> AND c.colno = d.colno AND d.seqno = 1 AND t.tabname = 'customer';

colname      constr_time               mode         nup        nins

customer_num 2015-03-24 11:58:38.00000 H              0          28
lname        2015-03-24 11:58:38.00000 H              0          28
zipcode      2015-03-24 11:58:38.00000 H              0          28

3 row(s) retrieved.

> SELECT t.tabname[1,12], c.colname[1,12], p.nupdates::INT nup, p.ninserts::INT nins
> FROM systables t, syscolumns c, sysmaster:sysptnhdr p, sysdistrib d
> WHERE t.tabid = c.tabid AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1
> AND t.partnum = p.partnum AND t.tabname = 'customer';

tabname      colname              nup        nins

customer     customer_num           3          28
customer     lname                  3          28
customer     zipcode                3          28

3 row(s) retrieved.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode) DISTRIBUTIONS ONLY;

Statistics updated.

> SELECT c.colname[1,12], d.constr_time, d.mode, d.nupdates::INT nup, d.ninserts::INT nins
> FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid AND t.tabid = d.tabid
> AND c.colno = d.colno AND d.seqno = 1 AND t.tabname = 'customer';

colname      constr_time               mode         nup        nins

customer_num 2015-03-24 11:58:38.00000 H              0          28
lname        2015-03-24 11:58:38.00000 H              0          28
zipcode      2015-03-24 11:58:38.00000 H              0          28

3 row(s) retrieved.

> SELECT t.tabname[1,12], c.colname[1,12], p.nupdates::INT nup, p.ninserts::INT nins
> FROM systables t, syscolumns c, sysmaster:sysptnhdr p, sysdistrib d
> WHERE t.tabid = c.tabid AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1
> AND t.partnum = p.partnum AND t.tabname = 'customer';

tabname      colname              nup        nins

customer     customer_num           3          28
customer     lname                  3          28
customer     zipcode                3          28

3 row(s) retrieved.

> UPDATE customer SET lname = LOWER(lname) WHERE customer_num = 102;

1 row(s) updated.

! sleep 2

> SELECT c.colname[1,12], d.constr_time, d.mode, d.nupdates::INT nup, d.ninserts::INT nins
> FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid AND t.tabid = d.tabid
> AND c.colno = d.colno AND d.seqno = 1 AND t.tabname = 'customer';

colname      constr_time               mode         nup        nins

customer_num 2015-03-24 11:58:38.00000 H              0          28
lname        2015-03-24 11:58:38.00000 H              0          28
zipcode      2015-03-24 11:58:38.00000 H              0          28

3 row(s) retrieved.

> SELECT t.tabname[1,12], c.colname[1,12], p.nupdates::INT nup, p.ninserts::INT nins
> FROM systables t, syscolumns c, sysmaster:sysptnhdr p, sysdistrib d
> WHERE t.tabid = c.tabid AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1
> AND t.partnum = p.partnum AND t.tabname = 'customer';

tabname      colname              nup        nins

customer     customer_num           4          28
customer     lname                  4          28
customer     zipcode                4          28

3 row(s) retrieved.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode) DISTRIBUTIONS ONLY;

Statistics updated.

> SELECT c.colname[1,12], d.constr_time, d.mode, d.nupdates::INT nup, d.ninserts::INT nins
> FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid AND t.tabid = d.tabid
> AND c.colno = d.colno AND d.seqno = 1 AND t.tabname = 'customer';

colname      constr_time               mode         nup        nins

customer_num 2015-03-24 11:58:44.00000 H              4          28
lname        2015-03-24 11:58:44.00000 H              4          28
zipcode      2015-03-24 11:58:44.00000 H              4          28

3 row(s) retrieved.

> SELECT t.tabname[1,12], c.colname[1,12], p.nupdates::INT nup, p.ninserts::INT nins
> FROM systables t, syscolumns c, sysmaster:sysptnhdr p, sysdistrib d
> WHERE t.tabid = c.tabid AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1
> AND t.partnum = p.partnum AND t.tabname = 'customer';

tabname      colname              nup        nins

customer     customer_num           4          28
customer     lname                  4          28
customer     zipcode                4          28

3 row(s) retrieved.

Statistics are updated in sysdistrib after a little bit more than 10% of data has changed (via UPDATE).
In fact USTH changed sysdistrib data after 14.28% of data updated (we have 28 rows and 4 updates).
Maybe USTH changes sysdistrib only if more than % STATCHANGE rows are changed (4 rows).
10% of 28 rows == 2.8 rows this is rounded 3 rows. From the manual STATCHANGE is a threshold.
So it seems that threashold means more than STATCHANGE number of rows changed.

From the manuals:

"...the UPDATE STATISTICS statement compares the STATCHANGE setting with the
percentage of rows that have changed in each table or fragment since the current
data distributions were calculated, and selectively updates only the missing or stale
distribution statistics for each table or fragment within the scope of the UPDATE
STATISTICS statement."

"When the UPDATE STATISTICS statement runs
in MEDIUM or HIGH mode against the table, the database server compares the
stored values in these columns with the current values in the partition. Column
distribution statistics for the table are not updated if the sum of the stored values
differs from the sum of these current sysdistrib DML counter values from the
partition page by less than the threshold specified by the setting of the
STATCHANGE table attribute or of the STATCHANGE configuration parameter."

"Including the FORCE keyword emulates the previous UPDATE STATISTICS
behavior of Informix database servers before version 11.70."

SUMMARY

starting version 11.70 you can use:

  UPDATE STATISTICS HIGH/MEDIUM FOR TABLE ... FORCE;

or

  SET ENVIRONMENT AUTO_STAT_MODE 'off';
  UPDATE STATISTICS HIGH/MEDIUM FOR TABLE ...;

to be sure, that your statistics are really up to date when executing UST MEDIUM/HIGH.