Are your statistics (distributions) are really up to date ? - PART I

mholzbauer's picture

or: Some more Secrets of "UPDATE STATISTICS"...
 

Did you sometimes wonder why a "UPDATE STATISTICS" Statement
is so faster in newer Informix version on greater tables ?

Or you wonder why the query is not faster after executed some
"update statistics medium/high" statements ?

Beginning with 11.70 there are 2 new onconfig parameters:

AUTO_STAT_MODE

  and

STATCHANGE

This parameters has following default values in onconfig.std:

AUTO_STAT_MODE 1
STATCHANGE 10


Description from onconfig.std:

# AUTO_STAT_MODE - Enables (1) or disables (0) update statistics
#                  automatic mode. In automatic mode, statistics of
#                  table, fragment or index are rebuilt only if existing
#                  statistics are considered stale. A table, fragment
#                  or index can change by STATCHANGE percentage before
#                  its statistics are regarded as stale.
# STATCHANGE     - In automatic mode, rebuild statistics only for
#                  table, fragment or index changed by STATCHANGE
#                  percentage since last statistics run.

 

Description in SQL Syntax Guide:

Automatic detection of stale statistics

You can enable Informix to automatically detect which table or fragment and index statistics are stale,
and ONLY REFRESH THE STALE STATISTICS when the UPDATE STATISTICS statement is run.
By default, statistics will be refreshed when 10% of the data is stale. You can use the
STATCHANGE property when a table is created oraltered to set the minimum percentage of change that is
required for the data to be considered stale. The database server refreshes statistics only if the data
has changed beyond that threshold since the distribution statistics were last calculated.

 

So, lets try it out - an example:

$ echo "n" | dbaccessdemo -dbspace datadbs -nots

$ dbaccess stores_demo -

Database selected.

--#
--# find out, if we have statistics, distributions on table "customer"
--#

> SELECT tabname, colname, mode FROM systables t, syscolumns c, sysdistrib d
> WHERE t.tabid = c.tabid AND t.tabid = d.tabid AND c.colno = d.colno
> AND d.seqno = 1 AND t.tabname = 'customer';

No rows found.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode);

Statistics updated.

> SELECT tabname, colname, mode, constructed::DATETIME YEAR TO FRACTION(5) AS constructed
> FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid AND t.tabid = d.tabid
> AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constructed  2015-03-23 00:00:00.00000

tabname      customer
colname      lname
mode         H
constructed  2015-03-23 00:00:00.00000

tabname      customer
colname      zipcode
mode         H
constructed  2015-03-23 00:00:00.00000

3 row(s) retrieved.

--#
--# no time in constructed field. Lets look why:
--#

> SELECT colname, coltype FROM syscolumns
> WHERE tabid = (SELECT tabid FROM systables WHERE tabname = 'sysdistrib') AND colname MATCHES 'constr*';

colname  constructed
coltype  7

colname  constr_time
coltype  10

--#
--# NOTE:
--# coltype 7 == DATE
--# coltype 10 == DATETIME
--# In SQL Reference both fields are defined as DATETIME YEAR TO FRACTION(5),
--# but in fact only const_time is DATETIME YEAR TO FRACTION(5)
--# and constructed is DATE !

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 11:22:40.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 11:22:40.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 11:22:40.00000

3 row(s) retrieved.

! date
Mon Mar 23 11:27:20 CET 2015

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode);

Statistics updated.

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 11:22:40.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 11:22:40.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 11:22:40.00000

3 row(s) retrieved.

--#
--# Oops - no change here in constr_time field - but informix says "Statistics updated."
--#

! onstat -c|grep STATCHANGE
#                  or index can change by STATCHANGE percentage before
# STATCHANGE     - In automatic mode, rebuild statistics only for
#                  table, fragment or index changed by STATCHANGE
STATCHANGE 10

--#
--# 12.10 ONLY:
--#
! onstat -g cfg full STATCHANGE

IBM Informix Dynamic Server Version 12.10.FC4W1 -- On-Line -- Up 6 days 18:51:27 -- 2563016 Kbytes

Configuration Parameter Info

id   name                      type   maxlen   units   rsvd  tunable
237  STATCHANGE                INT4   12       %             *

     min/max : 0,100  
     default : 10
     onconfig: 0
     current : 0

     Description:
     Use the STATCHANGE environment option to specify a positive integer
     for a global percentage of a change threshold for the UPDATE
     STATISTICS statement to use when the automatic mode for restricting
     UPDATE STATISTICS operations to stale or missing distributions
     is enabled.

     The value of the STATCHANGE environment option is used when the
     AUTO_STAT_MODE configuration parameter or the AUTO_STAT_MODE
     environment option has enabled the automatic mode for the UPDATE
     STATISTICS statement, so that it selectively refreshes only stale
     data distributions.

     The value that you set for STATCHANGE specifies a change threshold
     to determine whether distribution statistics qualify for an update
     when the UPDATE STATISTICS statement is operating in automatic mode.

! onstat -c|grep AUTO_STAT_MODE
# AUTO_STAT_MODE - Enables (1) or disables (0) update statistics
AUTO_STAT_MODE 1

> SELECT tabname, statlevel, statchange FROM systables WHERE tabname = 'customer';

tabname     customer
statlevel   A
statchange  

1 row(s) retrieved.

--#
--# statlevel 'A' means auto, 'T' means table and 'F' means fragment
--#

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode) FORCE;

Statistics updated.

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 11:31:28.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 11:31:28.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 11:31:28.00000

3 row(s) retrieved.

--#
--# there is a new option to the Update statistics command: FORCE
--# This has worked - we have a changed constr_time field ..
--#

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode);

Statistics updated.

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 11:31:28.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 11:31:28.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 11:31:28.00000

3 row(s) retrieved.

--#
--# nothing is changed when running without FORCE (and no changes on table)
--#

> UNLOAD TO 'statchange.sql' DELIMITER ';'
> SELECT 'ALTER TABLE ' || TRIM(tabname) || ' STATCHANGE 0'
> FROM systables WHERE tabname = 'customer';

1 row(s) unloaded.

Database closed.

! cat statchange.sql
ALTER TABLE customer STATCHANGE 0;

! dbaccess -e stores_demo statchange.sql

Database selected.

ALTER TABLE customer STATCHANGE 0;
Table altered.

Database closed.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode);

Statistics updated.

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 11:31:28.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 11:31:28.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 11:31:28.00000

3 row(s) retrieved.

--#
--# no changes in constr_time after altering tables statchange from 10 to 0
--#

> UNLOAD TO 'statchange.sql' DELIMITER ';'
> SELECT 'ALTER TABLE ' || TRIM(tabname) || ' STATLEVEL TABLE'
> FROM systables WHERE tabname = 'customer';

1 row(s) unloaded.

! dbaccess -e stores_demo statchange.sql

Database selected.

ALTER TABLE customer STATLEVEL TABLE;
Table altered.

Database closed.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode);

Statistics updated.

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 11:31:28.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 11:31:28.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 11:31:28.00000

3 row(s) retrieved.

--#
--# no changes in constr_time after altering tables statlevel from auto to table
--#

> SELECT tabname, statlevel, statchange FROM systables WHERE tabname = 'customer';

tabname     customer
statlevel   T
statchange  0

1 row(s) retrieved.

! onmode -wf STATCHANGE=0
Current value for STATCHANGE (0) was saved in config file.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode);

Statistics updated.

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 11:31:28.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 11:31:28.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 11:31:28.00000

3 row(s) retrieved.

--#
--# as we see here, setting global STATCHANGE TO 0 does not help here..
--#

! onstat -c|grep AUTO_STAT_MODE
# AUTO_STAT_MODE - Enables (1) or disables (0) update statistics
AUTO_STAT_MODE 1

! onmode -wf AUTO_STAT_MODE=0
Value of AUTO_STAT_MODE has been changed to 0.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode);

Statistics updated.

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 11:41:50.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 11:41:50.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 11:41:50.00000

3 row(s) retrieved.

--#
--# but setting global AUTO_STAT_MODE from 1 to 0 helps - statistics are updated.
--#

! onmode -wf AUTO_STAT_MODE=1
Value of AUTO_STAT_MODE has been changed to 1.

> UPDATE STATISTICS HIGH FOR TABLE customer(customer_num,lname,zipcode);

Statistics updated.

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 11:41:50.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 11:41:50.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 11:41:50.00000

3 row(s) retrieved.

--#
--# setting parameter back
--#

> SET ENVIRONMENT AUTO_STAT_MODE 'off';

Environment set.

> UPDATE STATISTICS HIGH;

Statistics updated.

> SELECT tabname, colname, mode, constr_time FROM systables t, syscolumns c, sysdistrib d WHERE t.tabid = c.tabid
> AND t.tabid = d.tabid AND c.colno = d.colno AND d.seqno = 1 AND t.tabtype = 'T' AND t.tabname = 'customer';

tabname      customer
colname      customer_num
mode         H
constr_time  2015-03-23 11:44:54.00000

tabname      customer
colname      fname
mode         H
constr_time  2015-03-23 11:44:54.00000

tabname      customer
colname      lname
mode         H
constr_time  2015-03-23 11:44:54.00000

tabname      customer
colname      company
mode         H
constr_time  2015-03-23 11:44:54.00000

tabname      customer
colname      address1
mode         H
constr_time  2015-03-23 11:44:54.00000

tabname      customer
colname      address2
mode         H
constr_time  2015-03-23 11:44:54.00000

tabname      customer
colname      city
mode         H
constr_time  2015-03-23 11:44:54.00000

tabname      customer
colname      state
mode         H
constr_time  2015-03-23 11:44:54.00000

tabname      customer
colname      zipcode
mode         H
constr_time  2015-03-23 11:44:54.00000

tabname      customer
colname      phone
mode         H
constr_time  2015-03-23 11:44:54.00000

10 row(s) retrieved.

--#
--# set environment auto_stat_mode to 'off' works also..
--#

! onmode -wf STATCHANGE=10
Value of STATCHANGE has been changed to 10 percent.

 

 

More next days....