Monitoring the Queue

mpruet's picture







Overview
onstat -g rqm sendq output
Current Statistics
Historical Statistics
Progress Table
RQM Handle





Overview




Well - made it safely back home from the IOD conference in Las Vegas
without losing too much money in the casino.  The really cool
thing was that while waiting at the airport, I played some slot
machines and won $45.00.  Guess I should have taken a later
flight
so I could have played longer... ;-)

I mentioned last week that I would be posting some pictures from the
conference.  Unfortunately, my pictures were not very good, so I would
suggest checking out these pictures instead.  The conference was great.
There were around 8000 folks attending and IDS had a strong presence.



Well - back to business.  In a previous entry I gave some
thoughts about sizing the queue.  In this entry I'm going to describe
how to monitor the queue.  There are two main ways to monitor the
queues: through onstat and through the sysmaster database.  In this
blog entry we will focus on the onstat -g rqm command.



The Reliable Queue Manager (rqm) is the subcomponent of ER which is
responsible for the physical management of the queue.  It handles
things such as determining when an item can be removed from the queue,
tracking which thread is referencing a queue item, managing cursors on
the queue (rqm handles), and deciding when an item must be spooled to
disk (smartblob).



There are several options which can be used with the onstat -g rqm
command.  The following table describes these options:



(Options to onstat -g rqm)

| Option | Description |
|---|---|
| (none, i.e. onstat -g rqm) | Display information about all queues. |
| SENDQ | Display information about the send queue.  The send queue is used to transmit transactions to target servers.  These transactions might originate on the local node, or might have originated on a remote server in the case of hierarchical routing. |
| RECVQ | Display information about the receive queue.  The receive queue is used to hold a replicated transaction that has been received on the target but has not yet been applied to the target table by the datasync threads. |
| CNTRLQ | Display information about the control queue.  This queue is used to manage control messages such as replicate definitions, server definitions, start replicates, etc.  Items placed in the control queue are always copied into stable storage. |
| ACKQ | Display information about the ACK queue.  This queue is used to hold acknowledgments before they are sent to the source node. |
| SYNCQ | Display information about the sync queue.  This queue is only used as part of the define server, and then only to transmit the syscdr database to the newly defined node. |
| SBSPACES | Display information about the sbspaces used to contain the stable storage of the queues. |
| FULL | Display the transaction headers for each of the transactions within the queue which are currently in memory. |
| VERBOSE | In addition to the transaction headers, display information about each of the rows for transactions in the queues. |
| BRIEF | Display a short summary of what is contained in the queue. |



In this posting, we are going to limit ourselves to onstat -g rqm SENDQ.
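If you want to capture this output from a script, a minimal Python sketch might look like the following.  The helper names (build_rqm_command, capture_rqm) are my own and purely illustrative.  The option names are taken from the table above, and passing them in lowercase is an assumption based on the "onstat -g rqm sendq" spelling used elsewhere in this entry.

```python
import subprocess

# Option names from the table above (assumed to be accepted in lowercase,
# matching the "onstat -g rqm sendq" spelling used in this entry).
RQM_OPTIONS = {"sendq", "recvq", "cntrlq", "ackq", "syncq",
               "sbspaces", "full", "verbose", "brief"}


def build_rqm_command(option=None):
    """Return the argv list for `onstat -g rqm [option]`."""
    argv = ["onstat", "-g", "rqm"]
    if option is not None:
        name = option.lower()
        if name not in RQM_OPTIONS:
            raise ValueError(f"unknown rqm option: {option!r}")
        argv.append(name)
    return argv


def capture_rqm(option=None, runner=subprocess.run):
    """Run the command and return its standard output as text.

    `runner` defaults to subprocess.run; it is a parameter only so that
    the function can be tried out without a running instance.
    """
    result = runner(build_rqm_command(option), capture_output=True,
                    text=True, check=True)
    return result.stdout
```

Calling capture_rqm("sendq") on a machine where onstat is on the PATH would return the text that the rest of this entry walks through.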



onstat -g rqm sendq output



There are several sections in the output of the onstat -g rqm sendq
command.  The following list describes these sections.


  1. The Summary Section

    This section contains a summary of the queue.  It is further
    broken into two sections.

    1. The current summary

      This section contains the current statistics about the queue.

    2. The historical summary

      This section contains the historical totals of the queue.  It contains
      information such as the total number of transactions which have been
      queued as well as the maximum size that the queue has grown to.

  2. The Progress Table Section

    This section contains the 'progress' of the queue: it tracks what has
    been sent to each remote server and what has been ACKed by each remote
    server.  This section is further broken into two sections.

    1. The progress table summary

      This describes which table on disk is used to contain the
      progress table.  It also describes how often the progress
      table is flushed to disk.

    2. The target/replicate progress information

      This contains information on which transactions have been sent to the
      target nodes and what the target nodes have acknowledged.  It also
      contains the number of bytes per target/replicate combination which
      are currently in the queue.

  3. The Transaction Section

    This section contains information about the first and last transactions
    which are in memory in the queue.

  4. The Handle Section

    This section contains a list of the handles which have been
    allocated to each of the users of the queue.  A handle can be
    thought of as a cursor into the queue.  It is used to track the
    position within the queue.

The Current Statistics Section


The current statistics section is the first section in the
onstat -g rqm sendq output.  It contains information about the current
contents of the queue such as how many bytes are contained in the
queue, how many transactions are in the queue, how many transactions
are currently in memory, how many have been spooled to disk, how many
exist only on disk, etc.

[Figure: current statistics of onstat -g rqm sendq]



When a new transaction is placed into the send queue, it is given a
stamp.  This stamp is used to maintain the order of the transactions
within the queue.  This is a bit different from the commit order
because the original commit order is only meaningful within the context
of the server on which the transaction was originally committed.  In a
system using hierarchical routing, it is possible that the send queue
will have transactions which originated on other servers, namely
replicated transactions which must be forwarded to another node.
Stamping each transaction as it is inserted preserves the insert order
regardless of where the transaction originated.  The stamp is a 64 bit
integer which is maintained as part of the queue.  In this example,
the next transaction to be inserted will be 638.
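To make the idea concrete, here is a toy Python model of stamping.  This is only a sketch of the behavior described above, not the actual RQM code: the class name, the tuple layout, and the starting stamp of 636 are all assumptions chosen so that the next stamp comes out as 638, as in the example.

```python
class StampedQueueModel:
    """Toy model: hands out insert stamps in arrival order."""

    STAMP_MASK = (1 << 64) - 1  # the stamp is described as a 64 bit integer

    def __init__(self, next_stamp=1):
        self.next_stamp = next_stamp
        # Each item is (stamp, origin_server, commit_sequence_on_origin).
        self.items = []

    def insert(self, origin_server, commit_seq):
        """Queue a transaction and return the stamp it was given."""
        stamp = self.next_stamp
        self.next_stamp = (self.next_stamp + 1) & self.STAMP_MASK
        self.items.append((stamp, origin_server, commit_seq))
        return stamp


queue = StampedQueueModel(next_stamp=636)
queue.insert(origin_server=1, commit_seq=500)  # committed locally
queue.insert(origin_server=4, commit_seq=12)   # forwarded from another server
print(queue.next_stamp)
```

Note that the forwarded transaction gets the later stamp even though its commit sequence number on its own server is smaller.  Commit order from two different servers cannot be compared, but stamps can.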



In this example, the send queue currently contains 611 transactions,
of which 268 are in memory, 343 are not in memory at all, and 42
(611-569, where 569 is the number which have been spooled) are only in
memory.  The reason that some of the spooled transactions are also in
memory is that we spawn a group of spooling threads when we sense that
we are getting close to running out of memory.  The spooled
transaction is not immediately removed from memory, however.  Instead
the spooled transaction will be removed from memory only when the
memory limits are reached.  The reason for this pre-spooling is to
avoid having to do a lot of work when we reach the memory limits.
Once a transaction has been spooled and the in-memory copy of the
transaction has been removed, the transaction is never completely
reloaded back into memory.  Instead we transmit the transaction
directly from the spooled disk copy of the transaction.
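The arithmetic behind those numbers can be spelled out in a few lines of Python.  The variable names are mine, not labels from the onstat output; the three input values are the ones quoted in this example.

```python
total_in_queue = 611  # transactions currently in the send queue
in_memory = 268       # transactions that have an in-memory copy
spooled = 569         # transactions that have a copy on disk

only_in_memory = total_in_queue - spooled   # never spooled
only_on_disk = total_in_queue - in_memory   # in-memory copy already removed
in_both = in_memory - only_in_memory        # pre-spooled, still in memory

# Every transaction falls into exactly one of the three groups.
assert only_in_memory + in_both + only_on_disk == total_in_queue

print(only_in_memory, only_on_disk, in_both)
```

The third number is the interesting one: those are the transactions which the spooling threads have already written to disk, so their in-memory copies can be dropped cheaply once the memory limit is reached.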



The Size of Data in queue is the size of the queue when combining the
in-memory transactions with the spool-only transactions.  The Pending
Txn Buffers contains information about transactions which are in the
process of being queued into the queue.



The Historical Statistics Section


Starting with Max Real memory data used, we enter the historical
section.  This section contains a summary of what has been placed in
the queue in the past.

[Figure: onstat -g rqm sendq historical section]


The Max Real memory data used contains the largest in-memory size of
the queue.  In this case, it reached 1,544,060 bytes.  The queue is
currently configured with a limit of 1,536,000 bytes, so when the
transaction which caused the limit to be reached went into the queue,
it triggered activity to flush the in-memory transactions which had
already been spooled.  If no in-memory transactions had been spooled,
then the thread placing transactions into the queue would have had to
also spool the transactions.  That's why we spawn separate spooling
threads to perform the actual spooling.  We try to get the spooling
done before we actually have to remove the transaction from memory.



There have been 638 transactions which have been queued to this queue.
That should match up with the insert stamp of the queue.  Of those 638
transactions, 569 have also been spooled.  At this point none of the
spooled transactions have been restored.  That is because the only
reason the transactions were spooled is that I brought down one of the
targets.  Since the target is down, we will not be attempting to
restore those transactions.  When that server is brought back up, we
will attempt to restore those transactions and send them to the target.



Recovered transactions are the transactions which existed only in the
spool when the instance was started.  They are not recovered
by re-reading from the logical log, but are simply recovered from the
disk storage when the engine is started.  They would have been
snooped from the logical log at some time in the past, but now are
found in the stable queue.



Total Txns deleted is the number of transactions that have been removed
from the queue.  They may have been only in memory, only on disk in
the stable queue, or in both.  The Total Txns duplicated contains the
number of times that we attempted to queue a transaction which had
already been processed.  This can occur when ER is first starting up
as part of the instance startup, or as part of a cdr start command.
The Total Txn Lookups is simply a counter of the number of times that
an ER thread attempted to read a transaction.



The Progress Tables Section


The progress table section contains information on what is currently
queued, which server it is queued for, and what has been ACKed by each
of the participants of the replicate.

[Figure: onstat -g rqm progress table section]

The first part of the progress table section is a summary.  The
information in the receive queue progress table is written to disk as
part of each transaction that the datasync thread applies.  This is
not, however, the case with the send queue progress table.  Instead
the send queue progress table is copied to disk periodically.  In this
example we see that the progress table is flushed to the table
spttrg_send every 30 seconds.  The other thing which can trigger a
flush of the progress table is over 1000 entries being dirtied.
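The two flush triggers can be modeled with a small Python class.  This is a hedged sketch of the rule as described above, not how ER implements it; the class and method names are invented for illustration, and the defaults of 30 seconds and 1000 entries are the values from this example.

```python
import time


class ProgressFlushPolicy:
    """Toy model: flush every N seconds, or sooner if many entries are dirty."""

    def __init__(self, interval_seconds=30, dirty_limit=1000,
                 clock=time.monotonic):
        self.interval_seconds = interval_seconds
        self.dirty_limit = dirty_limit
        self.clock = clock
        self.last_flush = clock()
        self.dirty_entries = 0

    def mark_dirty(self, count=1):
        """Record that `count` progress entries changed in memory."""
        self.dirty_entries += count

    def should_flush(self):
        """True when either trigger has fired."""
        interval_elapsed = (self.clock() - self.last_flush
                            >= self.interval_seconds)
        too_many_dirty = self.dirty_entries > self.dirty_limit  # "over 1000"
        return interval_elapsed or too_many_dirty

    def flushed(self):
        """Reset both triggers after the table has been written to disk."""
        self.last_flush = self.clock()
        self.dirty_entries = 0
```

The practical consequence is that the on-disk copy of the send queue progress table can trail the in-memory copy by up to one flush interval, unlike the receive queue progress table.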



Below the summary section is a list of the server and group entries,
which show what is currently queued for each server, what has been
sent to the remote server, and what has been ACKed by the remote
server.  The term Group is a carry-over from the 7.31 days when a
replicate could be part of a replicate group.  It should really be
"Replicate" in post-7.31 instances.  The ACKed and Sent columns
contain the key of the last transaction which was acknowledged by the
remote server or sent to that server.  The KEY is a multi-part number
consisting of
<source_node>/<unique_log_id>/<logpos>/<incremental number>.
From this we can see that the last transaction which we sent to
server 3 was transaction 0x2f/0x1934c8 and the last transaction which
it has acknowledged is 0x28/0x684c8.



By examining the progress table we can discover which server is tending
to lag behind.  In this example, server 2 is completely
current, but server 3 is lagging somewhat behind.
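One rough way to put a number on that lag is to compare the Sent and ACKed keys.  The Python sketch below assumes that the two values quoted for server 3 are the unique log id and log position parts of the key; that reading is my assumption, and the function name is made up for illustration.

```python
def parse_key_fields(text):
    """Split a slash-separated key into integers.

    Each field may be 0x-prefixed hex or plain decimal.
    """
    return tuple(int(part, 0) for part in text.split("/"))


# Values quoted for server 3 in this example, read as (log id, log position).
sent = parse_key_fields("0x2f/0x1934c8")
acked = parse_key_fields("0x28/0x684c8")

# Tuples compare field by field, so log id is compared before log position.
has_unacked_work = acked < sent
logs_behind = sent[0] - acked[0]

print(has_unacked_work, logs_behind)
```

A server whose ACKed key keeps falling further behind its Sent key over successive snapshots is the one to look at first.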



At the very bottom of this example, we see the start of the transaction
section.  This contains the first and last transactions in the
queue which are currently in memory.



The RQM Handle Section


The last section contains the handles.  The RQM handle can be thought
of as being much like a cursor.  It contains the position within the
queue that any thread is currently processing.

[Figure: onstat -g rqm sendq - RQM handle section]

Each thread that attempts to read a transaction from the queue, or to
place a transaction into the queue, must first allocate a handle.
This handle is used to maintain the positioning within the queue.  By
examining the RQM handle section, you can get an idea of what each of
the threads is doing.  For instance in this example, we see that
CDRNsA2 (Send Thread to server 2) is at the end of the queue.  We also
see that CDRNsT3 (Send Thread to server 3) is in the process of
sending transaction 1/42/0xbc4c8.
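If the cursor analogy helps, here is a toy Python model of handles.  It is only an illustration of the idea of a per-thread position in a shared queue, not the RQM data structures; the class and function names are hypothetical, while the thread names and the transaction key are the ones from this example.

```python
class QueueModel:
    """Toy queue: just an ordered list of transaction keys."""

    def __init__(self, keys):
        self.keys = list(keys)


class Handle:
    """A cursor owned by one thread, pointing into a shared queue."""

    def __init__(self, owner, queue, position=0):
        self.owner = owner
        self.queue = queue
        self.position = position

    def current(self):
        """Return the key at the cursor, or None at the end of the queue."""
        if self.position >= len(self.queue.keys):
            return None
        return self.queue.keys[self.position]

    def advance(self):
        """Move past the current transaction, stopping at the end."""
        if self.position < len(self.queue.keys):
            self.position += 1


def describe(handle):
    key = handle.current()
    where = "at end of queue" if key is None else f"at {key}"
    return f"{handle.owner}: {where}"


queue = QueueModel(["1/42/0xbc4c8"])
to_server_2 = Handle("CDRNsA2", queue, position=1)  # nothing left to send
to_server_3 = Handle("CDRNsT3", queue, position=0)  # still sending

print(describe(to_server_2))
print(describe(to_server_3))
```

Because each thread has its own handle, the send thread for a slow target can sit far back in the queue without holding up the send thread for a target that is current.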



It might be a bit surprising to see which threads have handles on the
send queue.  The network send threads make sense.  These would be the
CDRNsxxx threads.  However, it is a bit surprising to see that the
receive threads (CDRNrxxx) have handles on the send queue.  The reason
for this is routing.  When a transaction is received which must be
forwarded to another server, the receive thread will need to place
that transaction into the send queue.  Therefore, it is not unusual to
see that the receive threads have a handle on the send queue.



The other handles make sense.  The grouper evaluator
(CDRGeval##) has to have a handle on the send queue because it is
placing transactions originating on this node into the send queue for
transmission to a remote server.  The ACK threads (CDRACK##)
have handles on the send queue because they must update the
progress table and potentially delete a transaction when an ACK is
received from a remote server.