Monitoring the Queue
onstat -g rqm sendq
Well - made it safely back home from the IOD conference in Las Vegas
without losing too much money in the casino. The really cool
thing was that while waiting at the airport, I played some slot
machines and won $45.00. Guess I should have taken a later flight
so I could have played longer... ;-)
I mentioned last week that I
would be posting some pictures from the conference.
However, my pictures were not very good, so I would suggest checking out
instead. The conference was great. There
were around 8000 folks attending and IDS had a strong presence.
Well - back to business. In a previous entry I gave some
thoughts about sizing the queue.
In this entry I'm going to describe how to monitor the
queue. There are two main ways to monitor the queues: through onstat and
through the sysmaster database. In this blog entry we will focus
on the onstat -g rqm command.
The Reliable Queue Manager (rqm) is the subcomponent of ER which is
responsible for the physical management of the queue. It is
responsible for things such as determining when an item can be removed
from the queue, what thread is referencing a queue item, cursors on the
queue (rqm handles), and when an item must be spooled to disk.
There are several options which can be used with the onstat -g rqm
command. The following table describes these options:
Options to onstat -g rqm:
- With no option (i.e. onstat -g rqm), display information
about all queues.
- Display information about the send
queue. The send queue is used to transmit transactions to
servers. These transactions might originate on the local
node, or might have originated on a remote server in the case of routing.
- Display information about the
receive queue. The receive queue is used to hold the replicated
transaction after it has been received on the target but before it has been
applied to the target table by the datasync threads.
- Display information about the
control queue. This queue is used to manage control messages such
as replicate definitions, server definitions, start replicates, etc.
Items placed in the control queues are always copied into
- Display information about the ACK
queue. This queue is used to hold acknowledgments before they
are sent to the source node.
- Display information about the sync
queue. This queue is only used as part of the define server, and
then only to transmit the syscdr database to the newly defined node.
- Display information
about the sbspaces used to contain the stable storage of the queues.
- Display the
transaction headers for each of the transactions within the queue which
are currently in memory.
- In addition to the
transaction headers, display information about each of the rows
for transactions in the queues.
- Display a
summary of what is contained in the queue.
In this posting, we are going to limit ourselves to onstat -g rqm SENDQ.
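If you would rather capture this output from a script than read it at the terminal, a small Python wrapper along the following lines might do. This is only a sketch: the helper names build_rqm_command and run_rqm are my own, and it assumes onstat is on the PATH of an environment that is already set up for the instance. The exit status is deliberately not checked, because I am not assuming that onstat reports success as zero.

```python
import subprocess


def build_rqm_command(option=None):
    """Return the argv list for onstat -g rqm, optionally with one option."""
    argv = ["onstat", "-g", "rqm"]
    if option:
        argv.append(option)
    return argv


def run_rqm(option=None):
    """Run the command and return whatever it wrote to stdout.

    Needs a running instance; the exit status is not interpreted here.
    """
    result = subprocess.run(
        build_rqm_command(option),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling run_rqm("sendq") would then return the text of onstat -g rqm sendq, which can be saved or compared between runs.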
onstat -g rqm sendq output
There are several sections in the onstat -g rqm sendq output.
The following table describes these sections.
- The Summary
This section contains a summary of the queue. It is further
broken into two sections.
- The current summary
This section contains the current statistics about the queue.
- The historical summary
This section contains the historical totals of the queue. It contains
information such as the total number of transactions which have been
queued as well as the maximum size that the queue has grown to.
- The Progress Table
This section contains the 'progress' of the queue. By that we
mean that it tracks what has been sent to each remote
server, and what has been ACKed by that remote server. This
section is further broken into two sections.
- The progress table summary
This describes which table on disk is used to contain the
progress table. It also describes how often the progress
table is flushed to disk.
- The target/replicate progress information
This contains information on which transactions have been sent to the
target nodes and what the target nodes have acknowledged.
It also contains the number of bytes per target/replicate
combination which are currently in the queue.
- The Transactions
This section contains information about the first and last transactions
which are in memory in the queue.
- The Handles
This section contains a list of the handles which have been
allocated to each of the users of the queue. The handle can be
thought of as a cursor into the queue. It is used to track
position within the queue.
Current Statistics Section
The current statistics section is the first
section in the onstat -g rqm sendq output.
It contains information about the current contents of the queue,
such as how many bytes are contained in the queue, how many
transactions are in the queue, how many transactions are currently in
memory, how many have been spooled to disk, how many exist
only on disk, etc.
When a new transaction is placed into the queue, the transaction is
given a stamp. This stamp is used to maintain the order of
transactions within the queue. This is a bit different from
commit order because the original commit order is only useful within
the context of the server on which the transaction is originally
committed. In the case of a system using
routing, it is possible that the send queue will have transactions
which originated on other servers. That would be the case for a
replicated transaction which must be forwarded to another node.
In order to maintain the insert order, when a transaction is
inserted into the send queue, it receives a stamp. The stamp is a
64-bit integer which is maintained as part of the queue. In this
example, the next transaction to be inserted will be 638.
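To make the idea of the insert stamp concrete, here is a toy Python model. It is not how RQM is implemented; the class name StampedQueue and its fields are made up for illustration. The only point it shows is that the stamp follows insert order into this queue, no matter which server the transaction originally committed on.

```python
class StampedQueue:
    """Toy queue that hands out a monotonically increasing insert stamp."""

    STAMP_MASK = (1 << 64) - 1  # keep the counter within 64 bits

    def __init__(self, next_stamp=0):
        self.next_stamp = next_stamp
        self.items = []  # (stamp, origin_server, txn) in insert order

    def insert(self, origin_server, txn):
        """Queue a transaction and return the stamp it was given."""
        stamp = self.next_stamp
        self.next_stamp = (self.next_stamp + 1) & self.STAMP_MASK
        self.items.append((stamp, origin_server, txn))
        return stamp


# Hypothetical usage: one local transaction, one forwarded from server 3.
queue = StampedQueue(next_stamp=638)
local_stamp = queue.insert(origin_server=1, txn="local txn")
routed_stamp = queue.insert(origin_server=3, txn="forwarded txn")
```

Here the local transaction receives stamp 638 and the forwarded one 639, even though the forwarded transaction may have committed earlier on its own server.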
In this example, the send queue currently contains 611
transactions, of which 268 are in memory, 343 are not in memory at all,
and 42 (611 minus the 569 which have been spooled) are only in memory. The reason that some
of the spooled transactions are also in memory is that we spawn a group
of spooling threads when we sense that we are getting close to running
out of memory. The spooled transaction is not immediately removed
from memory, however. Instead the spooled transaction will be
removed from memory only when the memory limits are reached. The
reason for this pre-spooling is to avoid having to do a lot of work
when we reach the memory limits. Once a transaction has
been spooled and the in-memory copy of the transaction has been
removed, then the transaction is never completely reloaded back into
memory. Instead we transmit the transaction directly from the
spooled disk copy of the transaction.
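The arithmetic behind those counts can be written down in a few lines of Python. The function name queue_breakdown is my own; the three inputs are the total, in-memory and spooled counts quoted above.

```python
def queue_breakdown(total, in_memory, spooled):
    """Split a queue's transactions by where their copies live."""
    only_in_memory = total - spooled   # never written to the spool
    only_on_disk = total - in_memory   # memory copy already removed
    in_both = in_memory - only_in_memory  # pre-spooled, still in memory
    return {
        "only_in_memory": only_in_memory,
        "only_on_disk": only_on_disk,
        "in_both": in_both,
    }


breakdown = queue_breakdown(total=611, in_memory=268, spooled=569)
```

With the numbers from this example, that gives 42 transactions only in memory, 343 only on disk, and 226 which have been pre-spooled but still have a copy in memory.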
The Size of Data in queue is
the size of the queue when combining the in-memory transactions with
the spool-only transactions. The Pending Txn Buffers contains
information about transactions which are in the process of being queued
into the queue.
Historical Statistics Section
Starting with Max Real memory data used,
we enter the historical section. This section contains a
summary of what has been placed in the queue in the past.
The Max Real memory data used contains the largest in-memory size of
the queue. In this case, it reached 1,544,060 bytes.
The memory limit of the queue is currently configured to
1,536,000 bytes, so when the transaction went into the queue which
caused the limit to be reached, it triggered activity to flush the
in-memory transactions which had already been spooled. If no
in-memory transactions had been spooled, then the thread placing
transactions into the queue would have had to also spool the
transactions. That's why we spawn separate spooling threads to
perform the actual spooling. We try to get the spooling done
before we actually have to remove the transaction from memory.
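The interplay between pre-spooling and the memory limit can be modelled with a small Python toy. Everything here is an assumption made for illustration: the class SpoolingQueue, the prespool_fraction threshold, and the fact that spooling happens inline rather than in separate threads. It only shows the order of events: spool early, evict the in-memory copies later, and let the high-water mark briefly exceed the limit.

```python
class SpoolingQueue:
    """Toy model: spool copies early, evict from memory only at the limit."""

    def __init__(self, memory_limit, prespool_fraction=0.8):
        self.memory_limit = memory_limit
        # Assumed threshold at which the (simulated) spoolers start working.
        self.prespool_at = int(memory_limit * prespool_fraction)
        self.in_memory = {}   # stamp -> size in bytes
        self.spooled = set()  # stamps that have a copy on disk
        self.memory_used = 0
        self.max_memory_used = 0
        self.next_stamp = 1

    def insert(self, size):
        stamp = self.next_stamp
        self.next_stamp += 1
        self.in_memory[stamp] = size
        self.memory_used += size
        self.max_memory_used = max(self.max_memory_used, self.memory_used)
        if self.memory_used >= self.prespool_at:
            self._prespool()
        if self.memory_used > self.memory_limit:
            self._evict_spooled()
        return stamp

    def _prespool(self):
        # Stand-in for the spooling threads: write disk copies,
        # but leave the in-memory copies where they are.
        self.spooled.update(self.in_memory)

    def _evict_spooled(self):
        # Drop the oldest already-spooled copies until we are under the limit.
        for stamp in sorted(self.in_memory):
            if self.memory_used <= self.memory_limit:
                break
            if stamp in self.spooled:
                self.memory_used -= self.in_memory.pop(stamp)


queue = SpoolingQueue(memory_limit=1000)
for size in (400, 300, 200, 250):
    queue.insert(size)
```

In this run the fourth insert pushes memory use to 1150, above the limit of 1000, so the recorded maximum is 1150 even though the queue drops back to 750 once the oldest spooled copy is evicted. That is the same effect as the 1,544,060 byte maximum against a 1,536,000 byte limit in the example output.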
There have been 638 transactions which have been queued to this queue.
That should match up with the insert stamp of the queue. Of
those 638 transactions, 569 have also been spooled. At this point,
none of the spooled transactions have been restored. The reason
for that is that the only reason that the transactions were spooled is
that I brought down one of the targets. Since the target is down,
we will not be attempting to restore those transactions.
When that server is brought back up, we will attempt to
restore those transactions and send them to the target.
Recovered transactions are the transactions which existed only in the
spool when the instance was started. They are not recovered
by re-reading from the logical log, but are simply recovered from the
disk storage when the engine is started. They would have been
snooped from the logical log at some time in the past, but now are
found in the stable queue.
Total Txns deleted
is the number of transactions that have been removed from the queue.
They may have been only in memory, only on disk in the stable
queue, or in both. The Total
Txns duplicated contains the number of times that we
attempted to queue a transaction which had already been processed.
This can occur when ER is first starting up as part of the
instance startup, or as part of a cdr
start command. The Total Txn Lookups is
simply a counter of the number of times that an ER thread attempted to
read a transaction.
Progress Tables Section
The progress table section contains information
on what is currently queued, which server it is queued for, and what
has been ACKed from each of the participants of the replicate.
The first part of the progress table section is a summary.
The information in the receive queue progress table is
written to disk as part of each transaction that the datasync thread
applies. This is not, however, the case with the send queue
progress table. Instead the send queue progress table is
copied to disk every so often. In this example we see that
the progress table is flushed to the table spttrg_send every 30
seconds. Another thing which might trigger the flushing of
the progress table is if over 1000 entries are dirtied.
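The flush rule just described is simple enough to state as a predicate. This is a Python sketch of the rule as I have described it, not the actual code; the constant and function names are hypothetical, and the values are the ones from this example.

```python
FLUSH_INTERVAL_SECONDS = 30    # interval shown in the example output
DIRTY_ENTRY_THRESHOLD = 1000   # flush when more than this many are dirty


def should_flush(seconds_since_flush, dirty_entries):
    """Return True when the send queue progress table is due for a flush."""
    return (
        seconds_since_flush >= FLUSH_INTERVAL_SECONDS
        or dirty_entries > DIRTY_ENTRY_THRESHOLD
    )
```

So a quiet system flushes on the timer, while a busy one can flush earlier because too many entries have been dirtied.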
Below the summary section is a list of the server and group entries
which show what is currently queued for
each server, what has been sent to the remote server, and what has been
ACKed by the remote server. The term Group is a carry-over
from the 7.31 days when the replicate could be part of a replicate
group. It should really be "Replicate" in post-7.31
instances. The ACKed and Sent columns
contain the key of the last transaction which was
acknowledged by the remote server or sent to that server.
The KEY is a multi-part number consisting of
number>. From this we can see that the last
transaction which we sent to server 3 was transaction 0x2f/0x1934c8 and
the last transaction which has been acknowledged is 0x28/0x684c8.
By examining the progress table we can discover which server is tending
to lag behind. In this example, server 2 is completely
current, but server 3 is lagging somewhat behind.
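If you want to spot a lagging server from a script, one approach is to compare the Sent and ACKed keys. The Python sketch below makes an assumption that I have not verified against the implementation: that the parts of a key, as printed with "/" separators, can be compared left to right as numbers. The function names are hypothetical.

```python
def parse_key(key):
    """Turn a printed key such as '0x2f/0x1934c8' into a tuple of ints.

    int(part, 0) accepts both '0x..' hex and plain decimal parts.
    """
    return tuple(int(part, 0) for part in key.split("/"))


def is_lagging(sent_key, acked_key):
    """True when the last ACKed key is behind the last sent key.

    Assumes keys order left to right, part by part.
    """
    return parse_key(acked_key) < parse_key(sent_key)


# Keys for server 3 from the example above.
server3_lagging = is_lagging("0x2f/0x1934c8", "0x28/0x684c8")
```

For server 3 this reports a lag, since 0x28/0x684c8 sorts before 0x2f/0x1934c8. A server whose ACKed key equals its Sent key would be reported as current.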
At the very bottom of this example, we see the start of the transaction
section. This contains the first and last transaction in the
queue which is currently in memory.
RQM Handle Section
The last section contains the handles.
The RQM handle can be thought of as being much like a cursor.
It contains the position within the queue that a thread is currently at.
Each thread that attempts to read a transaction from the queue, or to
place a transaction into the queue, must first allocate a handle.
This handle is used to maintain the positioning within the
queue. By examining the RQM handle section, you can get an
idea of what each of the threads is doing. For instance in this
example, we see that CDRNsA2 (Send Thread to server 2) is at the end of
the queue. We also see that CDRNsT3 (Send Thread to server 3)
is in the process of sending transaction 1/42/0xbc4c8.
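The cursor analogy can be shown with a few lines of Python. This is a deliberately simplified model: the class QueueHandle is invented for illustration, and the queue is just a shared list. The thread names are the ones from the example.

```python
class QueueHandle:
    """Toy cursor: each owner keeps its own position in a shared queue."""

    def __init__(self, owner, queue):
        self.owner = owner
        self.queue = queue
        self.position = 0  # index of the next item this owner will read

    def at_end(self):
        return self.position >= len(self.queue)

    def read_next(self):
        """Return the next item for this owner, or None at the end."""
        if self.at_end():
            return None
        item = self.queue[self.position]
        self.position += 1
        return item


send_queue = ["txn-a", "txn-b", "txn-c"]
to_server2 = QueueHandle("CDRNsA2", send_queue)
to_server3 = QueueHandle("CDRNsT3", send_queue)

while to_server2.read_next() is not None:
    pass                    # server 2 has been sent everything
to_server3.read_next()      # server 3 is still near the front
```

Both handles read the same queue, but each tracks its own position, so one thread can be at the end of the queue while another is still working through it.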
It might be a bit surprising to see which threads have handles on the
send queue. The network send threads make sense.
These would be the CDRNsxxx threads.
The receive threads (CDRNrxxx), however, also have handles
on the send queue. The reason
for this is routing. When a transaction is
received which must be forwarded to another server, the receive
thread needs to place that transaction into the send queue.
Therefore, it is not unusual to see that the receive threads
have a handle on the send queue.
The other handles make sense. The grouper evaluator
(CDRGeval##) has to have a handle on the send queue because it is
placing transactions originating on this node into the send queue for
transmission to a remote server. The ACK threads (CDRACK##)
have handles on the send queue because they must update the
progress table and potentially delete a transaction when an ACK is
received from a remote server.