DDRBLOCK
It sometimes happens that quite useful fixes and enhancements make it into a release but remain little known. A few such enhancements made it into the 11.10xC2 server; together, they make the CDR_QDATA_SBSPACE configuration parameter and DDRBLOCK mode much easier to manage than in the past.

The IDS server writes to logical log files in a circular fashion: once more than LOGFILES files (as specified in the $INFORMIXDIR/etc/$ONCONFIG configuration file) have been written to, the server reuses the oldest log file. DDRBLOCK occurs when new transactions writing to the log come dangerously close to wrapping the log space around and overwriting old logs that Enterprise Replication has yet to process. In older servers, once the system entered DDRBLOCK mode, it could be very difficult to get it out of DDRBLOCK mode without restarting oninit.
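
For reference, these are the relevant ONCONFIG parameters; the values here are illustrative, not recommendations:

LOGFILES  10       # number of logical-log files
LOGSIZE   10000    # size of each logical-log file, in KB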

More recent releases of Enterprise Replication -- certainly, version 10 and later -- should rarely enter DDRBLOCK mode, unless the system is severely misconfigured. An example of a dangerously misconfigured system would be one with too few log files, especially if some of the log files are quite large while others are quite small. With such a configuration, even a small hiccup when Enterprise Replication processes log entries can cause DDRBLOCK mode, or even worse, log wrap. If log wrap occurs, that is, if new transactions overwrite entries that Enterprise Replication has yet to process, Enterprise Replication shuts down and data becomes unsynchronized among servers in the replication system.
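
If you inherit such a configuration, you can even it out with onparams; in this sketch, the dbspace name logdbs and the sizes are illustrative:

onparams -a -d logdbs -s 10000    # add a 10000 KB logical-log file in dbspace logdbs
onparams -d -l 7 -y               # drop logical-log file number 7 once it is free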

One condition in which Enterprise Replication can still enter DDRBLOCK mode, even in an otherwise well-configured system, is when a destination site remains inaccessible for an extended period of time. If this happens, the Reliable Queue Manager (RQM) send queue saves transactions that include that site in their destination lists to stable storage. If the spool space fills, the oninit server will likely enter DDRBLOCK mode, because Enterprise Replication can no longer stably store transactions in its send queue and therefore can no longer advance the replay position, the oldest point in the logs that Enterprise Replication needs to access.

As an example, I have configured a small two-server replication system. I configured the IDS instance at which I will be generating transactions with too few logs and too little send queue stable storage, and I used the 'cdr suspend server' command to suspend replication to the other server. Since transactions cannot flow to the destination server, they quickly start to accumulate in the send queue:

[pinch-cdrtempmurre] (pinch)  110 %  onstat -g rqm sendq | egrep '^ Txns'
 Txns in queue:             18
 Txns in memory:            7
 Txns in spool only:        11
 Txns spooled:              11
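
To watch the queue grow, a simple polling loop will do (a sketch; adjust the interval to taste):

while true
do
    onstat -g rqm sendq | egrep '^ Txns'
    sleep 10
done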

Because I configured very little send queue spool space, the spool space immediately fills up, as the message log shows:

10:44:47  CDR QUEUER: Send Queue space is FULL - waiting for space in CDR_QDATA_SBSPACE

In this case, Enterprise Replication will also raise an alarm of severity 4 and class 31.

Since Enterprise Replication cannot advance the replay position, the IDS instance also enters DDRBLOCK state, as shown by the "Blocked:DDR" line in the following output:

[pinch-cdrtempmurre] (pinch)  129 % onstat -g ddr | head -10

IBM Informix Dynamic Server Version 11.10.F       -- On-Line -- Up 00:26:03 -- 78772 Kbytes
Blocked:DDR 

DDR -- Running --  

# Event  Snoopy   Snoopy   Replay   Replay   Current  Current 
Buffers   ID      Position  ID      Position   ID     Position
2064      4       1ee4454   3       74f018   12       2ad000 

We can see that the replay log id is 3, whereas the current log id to which IDS is writing transactions is 12. The fact that log 12 is the current log is also displayed by the onstat -l command:

[pinch-cdrtempmurre] (pinch)  132 % onstat -l | grep C | grep -v CDR
451f2c30         2        U---C-L  12       1:31763              9000      685     7.61

I configured my example instance with only 10 logical log files, so if logical log 3 cannot be reused and the current log is log 12, Enterprise Replication is holding 12 - 3 + 1 = 10 logs, that is, every logical log in the instance. Small wonder the server is in DDRBLOCK mode!
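
You can compute the number of logs Enterprise Replication is holding straight from the onstat -g ddr output; this one-liner is a sketch that assumes the column layout shown above:

onstat -g ddr | awk '$1 ~ /^[0-9]+$/ && NF == 7 { print "logs held:", $6 - $4 + 1 }'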

The send queue stable storage area is configured via the CDR_QDATA_SBSPACE configuration parameter. 11.10xC2 and later include an addition to onstat that makes it easy to monitor the sbspaces listed in CDR_QDATA_SBSPACE. The command is onstat -g rqm sbspaces:

onstat -g rqm sbspaces

IBM Informix Dynamic Server Version 11.10.F       -- On-Line -- Up 00:29:41 -- 78772 Kbytes
Blocked:DDR 


RQM Space Statistics for CDR_QDATA_SBSPACE:
-------------------------------------------
name/addr      number    used        free        total       %full   pathname
0x46581c58     5         311         1           312         100     /tmp/amsterdam_sbsp_base
amsterdam_sbsp_base5     311         1           312         100     

0x46e54528     6         295         17          312         95      /tmp/amsterdam_sbsp_2
amsterdam_sbsp_26        295         17          312         95      

0x46e54cf8     7         310         2           312         99      /tmp/amsterdam_sbsp_3
amsterdam_sbsp_37        310         2           312         99      

0x47bceca8     8         312         0           312         100     /tmp/amsterdam_sbsp_4
amsterdam_sbsp_48        312         0           312         100     

In the past, the information returned by the onstat -g rqm sbspaces command was available, but you had to gather it yourself: look up the CDR_QDATA_SBSPACE values and then manually extract the information relevant to those spaces from the onstat -d output. Imagine doing this in a "real" system with dozens of dbspaces!
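
For comparison, the old way amounted to something like this, repeated by hand for every sbspace named in the parameter:

grep CDR_QDATA_SBSPACE $INFORMIXDIR/etc/$ONCONFIG
onstat -d | grep amsterdam_sbsp_base        # once per configured sbspace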

If CDR_QDATA_SBSPACE space starts to run low, you can either add more chunks to an sbspace already in the CDR_QDATA_SBSPACE list, or, starting with the 11.10xC2 release, you can add a new sbspace to the CDR_QDATA_SBSPACE list.
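
Either path is a single onspaces command; the chunk paths, offsets, and sizes here are illustrative:

onspaces -a amsterdam_sbsp_4 -p /tmp/amsterdam_sbsp_4b -o 0 -s 2000    # add a chunk to an existing sbspace
onspaces -c -S mynewcdrsbsp -p /tmp/mynewcdrsbsp -o 0 -s 2000          # create a new sbspace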

For example, say I have created (via onspaces) a new sbspace mynewcdrsbsp:

[pinch-cdrtempmurre] (configparam)  157 % onstat -d | grep mynewcdrsbsp
47bce508         12       0x68001    12       1        2048     N SB     informix mynewcdrsbsp
47bce6a0         12     12     0          1000       702        702        POSB  /tmp/mynewcdrsbsp

I can then add that space to the list of CDR_QDATA_SBSPACE spaces via the cdr add config command:

[pinch-cdrtempmurre] (configparam)  158 % userid informix cdr add config "CDR_QDATA_SBSPACE mynewcdrsbsp"
 WARNING: The value specified updated in-memory only.
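
As the warning notes, cdr add config changes the in-memory value only; to keep the new space across a restart, update the parameter in the ONCONFIG file as well, for example:

CDR_QDATA_SBSPACE amsterdam_sbsp_base,amsterdam_sbsp_2,amsterdam_sbsp_3,amsterdam_sbsp_4,mynewcdrsbsp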

I can easily verify which sbspaces are configured via onstat. As you can see, mynewcdrsbsp is there:

[pinch-cdrtempmurre] (configparam)  159 % onstat -g cdr config CDR_QDATA_SBSPACE 

IBM Informix Dynamic Server Version 11.10.F       -- On-Line -- Up 00:39:38 -- 86964 Kbytes
Blocked:DDR 
CDR_QDATA_SBSPACE configuration setting:
              amsterdam_sbsp_base
                 amsterdam_sbsp_2
                 amsterdam_sbsp_3
                 amsterdam_sbsp_4
                     mynewcdrsbsp

Enterprise Replication is already spooling transactions to the new sbspace. In fact, it is already 99% full:

[pinch-cdrtempmurre] (configparam)  162 % onstat -g rqm sbspaces

IBM Informix Dynamic Server Version 11.10.F       -- On-Line -- Up 00:51:59 -- 86964 Kbytes
Blocked:DDR 


RQM Space Statistics for CDR_QDATA_SBSPACE:
-------------------------------------------
name/addr      number    used        free        total       %full   pathname
0x46581c58     5         311         1           312         100     /tmp/amsterdam_sbsp_base
amsterdam_sbsp_base5     311         1           312         100     

0x46e54528     6         312         0           312         100     /tmp/amsterdam_sbsp_2
amsterdam_sbsp_26        312         0           312         100     

0x46e54cf8     7         310         2           312         99      /tmp/amsterdam_sbsp_3
amsterdam_sbsp_37        310         2           312         99      

0x47bceca8     8         312         0           312         100     /tmp/amsterdam_sbsp_4
amsterdam_sbsp_48        312         0           312         100     

0x47bce6a0     12        696         6           702         99      /tmp/mynewcdrsbsp   
mynewcdrsbsp   12        696         6           702         99      

So what about DDRBLOCK mode? In practice, by far the likeliest cause for entering DDRBLOCK mode is that a destination server remains unavailable for an extended period of time. (In this example, I have simulated that condition by suspending the destination server.) If you expect the destination server to become available in a reasonable amount of time and you have enough disk space, you can add more space to the CDR_QDATA_SBSPACE parameter as in this example. Because Enterprise Replication raises an alarm of severity 4 and class 31 when it runs out of send queue spool space, you could even write an alarm handler to automate this task.
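
Here is a minimal sketch of such a handler, set via the ALARMPROGRAM configuration parameter. It assumes a spare sbspace named sparecdrsbsp has been created in advance; the name is hypothetical:

#!/bin/sh
# Sketch of an ALARMPROGRAM handler. IDS passes the event severity
# and class as the first two arguments; "sparecdrsbsp" is a
# hypothetical sbspace created ahead of time for this purpose.
SEVERITY=$1
CLASS=$2

if [ "$SEVERITY" -eq 4 ] && [ "$CLASS" -eq 31 ]; then
    # ER send queue spool space is full: add the spare sbspace.
    cdr add config "CDR_QDATA_SBSPACE sparecdrsbsp"
fi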

What if you expect a destination server to remain unavailable for an extended period of time, longer than spooling the send queue to disk can reasonably cover? You will have little choice other than to remove the unavailable server from the replication system and to resynchronize its data once it becomes available again; but that is the topic of a future blog entry.