Sizing the Queue



Overview


In general there are not a lot of configuration items for Enterprise
Replication.  One of the things that can be configured is the
maximum in-memory size of the queues, which is set by the
onconfig parameter CDR_QUEUEMEM.  The default value is 4096 KB (4 MB), which is probably too small.
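
For reference, the limit is set in the onconfig file and is expressed in
kilobytes.  A minimal sketch (the comment text is mine, not stock onconfig
wording):

    # Max in-memory size, in KB, that an ER queue may grow to before
    # spooled transactions are freed from memory.
    CDR_QUEUEMEM 4096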


There are two main queues used by ER - the send queue and the receive
queue.  Transactions which have been retrieved from the logical
log file and have been evaluated for replication are placed in the send
queue for transmission to the target nodes.  A given transaction
is placed in the send queue only once, even if it is to be sent to
multiple target nodes.  When the transaction is received on the
target node, it is placed in the receive queue where it waits its turn
to be applied.


If the ER domain is defined using some form of hierarchy, it is
possible that a received replicated transaction will also be placed
in the send queue so that it can be forwarded to other nodes.  In
fact, it is possible that the replicated transaction is placed only in
the send queue.  That would be the case where the transaction
needs to be forwarded, but the intermediate node is not itself a
participant in the replication.  However, for the purpose of
this discussion, we will consider only a single source with a
single target node.


First of all, the value of CDR_QUEUEMEM is not a preallocated block of
memory used to store transactions.  It is a limit on the
maximum memory size that an ER queue can expand to.  If this limit
is reached, then the replicated transaction may exist in the disk
overflow space within a smartblob.  Also, the value of
CDR_QUEUEMEM is not the maximum size of all of the queues combined.  Rather, it
is the maximum size of any single queue.  That means that if
CDR_QUEUEMEM is set to the default of 4096 KB, then both the send queue and the
receive queue can grow up to 4 MB each.
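
You can watch the in-memory and spooled sizes of the ER queues with
onstat; a quick sketch (the exact output columns vary by version):

    onstat -g rqm          # Reliable Queue Manager summary for the ER queues
    onstat -g rqm SENDQ    # detail for the send queue only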



Impact on the Send Queue



When the send queue approaches the CDR_QUEUEMEM size, spooling
threads are spawned to flush transactions to the configured
smartblob space.  These spooled transactions are not immediately
freed from memory, however; they remain in memory until the
CDR_QUEUEMEM limit is actually reached.  If that limit is reached, then the
spooled transactions are freed from memory and thus exist only in the
smartblob storage of the queue.
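
The smartblob space that receives the spooled queue data is the one named
by the CDR_QDATA_SBSPACE onconfig parameter; a sketch, with a made-up
sbspace name:

    # Sbspace used for spooled ER queue data ("ersbsp" is a hypothetical name)
    CDR_QDATA_SBSPACE ersbsp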



When it comes time to send a transaction to the target, if the
transaction exists only in the smartblob portion of the queue, then the
transaction is transmitted directly from the spooled copy to the
target.  We do not reload the transaction entirely into memory once
it has been spooled and removed from main memory.



Impact on the Receive Queue



As the transaction is received on the target, it is placed in the
receive queue where it remains until it is applied by the datasync
threads.

We only spool a transaction in the receive queue if it exceeds half of
the total queue memory size.  If the receive queue reaches
the CDR_QUEUEMEM limit, then the target activates flow
control by causing a NIF block.  The purpose of this is to prevent
the source from sending any additional transactions until the receive
queue drains a bit.
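
Flow control shows up in the network interface layer, so both sides of the
story can be checked with onstat (a hedged sketch; interpret the states in
the output against your version's documentation):

    onstat -g nif          # ER network interface status; shows blocked sites
    onstat -g rqm RECVQ    # receive queue depth on the target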



Sizing the Transaction



In order to correctly size the queues, it is important to know how much
memory is required to store a transaction while it is in transit to the
target server.  Each row within the replicated transaction
contains a fixed header which contains information about the row.
 There can also be a series of options which contain specific
information about the row.  Probably the most common option is a
hash value which is used to support apply parallelism.  Finally,
each replicated transaction has a transaction header.  The
current (IDS 11) size of the fixed row buffer header is 52 bytes on a 32-bit machine,
and the size of the transaction header is 258 bytes.  For 64-bit
machines, the size of the row buffer header is 60 bytes and the transaction header is 292 bytes.
 The options form a variable-length
list and can be of varying sizes.   However, for this discussion
we will consider only the hash used for apply parallelism, which is 4
bytes.
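
Putting those pieces together, the in-queue size of a transaction of n rows
works out roughly to the following (my arithmetic, using the sizes just
quoted):

    32-bit: txn_size = 258 + n * (52 + 4 + rowsize)
    64-bit: txn_size = 292 + n * (60 + 4 + rowsize)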



The last part of the formula is the rowsize as taken from the
systables table.  If we examine the customer table of the stores
database, we see that the rowsize is 134 bytes.  That means that
the queue memory needed to contain a single-row insert transaction
is 448 bytes.  We could therefore queue 9581 single-row
insert transactions, or a single transaction of 22591 inserts, before
reaching the CDR_QUEUEMEM limit of 4 MB.
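
The rowsize itself can be pulled straight from the system catalog; a
minimal query (table names from the stores demo database):

    SELECT tabname, rowsize
      FROM systables
     WHERE tabname IN ('customer', 'warehouses');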



Care has to be taken, however, if the replicated table contains variable-length
columns such as varchars or lvarchars, because it is the expanded
size of the row that is used by Enterprise Replication.  If we
examine the warehouses table in the stores demo database, we see that
the table has three columns (warehouse_name, warehouse_id, and
warehouse_spec).  Two of the columns are lvarchar column types of
2K size.
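
A sketch of a schema consistent with that description (the exact types and
sizes are assumptions, not the demo database's actual DDL):

    CREATE TABLE warehouses (
        warehouse_id   SERIAL,           -- assumed type
        warehouse_name lvarchar(2048),   -- "2K size" per the text
        warehouse_spec lvarchar(2048)
    );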



However, if we examine the size of the warehouses table in systables, we
discover that the rowsize is 4106 bytes.  This means
that we could only queue 971 single-row insert transactions of the
warehouses table, or a single transaction of 1031 inserts, before
reaching the CDR_QUEUEMEM limit.
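
Plugging the 4106-byte rowsize into the 32-bit formula above shows where
the cost comes from (my arithmetic):

    single-row insert:   258 + (52 + 4 + 4106)     = 4420 bytes
    n-row transaction:   258 + n * (52 + 4 + 4106) = 258 + n * 4162 bytes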


The fact that ER uses the expanded size of the row can be a surprise,
especially if the lvarchar columns in the original row contain only
short character strings.  This has even more of an impact if there
is a lot of activity on the tables with
the lvarchars.  In such a situation, spooling might occur if the
value of CDR_QUEUEMEM is set too low.





Sizing Strategy



The most common strategy for configuring Enterprise Replication is
to try to obtain the lowest possible latency.  In order to do
that, it is important to avoid spooling transactions to disk.  This
means that the CDR_QUEUEMEM limit needs to be fairly large.
 Remember, ER is going to have to process all of the replicated
tables which all of the client transactions are updating.  To do
that, ER needs to be able to hold as many replicated transactions in
memory as possible.  It may make sense to size the queue
memory based on the total allowed memory size.  As an example, for
an update-anywhere system, we might want to consider 1/8 to 1/6 of the
total memory available, since with update anywhere we will have
activity on both the send and receive queues.  That would mean
that the total active queue memory would be between 1/4 and 1/3 of all
available memory.



For instance, if the total virtual memory size is configured to be 1
GB, then we probably want to consider sizing CDR_QUEUEMEM
somewhere around 150 MB for an update-anywhere configuration, or
200-250 MB for a source/target configuration.  Don't forget that
ER has to keep up with all of the activity that the client
transactions generate, and doing that requires memory.
 Otherwise, ER will have to spool transactions, and spooling will
affect the latency of the apply.
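
Expressed as onconfig settings, that guidance might look like the
following sketch (CDR_QUEUEMEM is in KB; SHMVIRTSIZE is the virtual
memory segment size, also in KB):

    SHMVIRTSIZE  1048576   # 1 GB of virtual memory
    CDR_QUEUEMEM  153600   # ~150 MB per queue for update anywhere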