Resolving In Place Alters - Part 2


In Part 1 I talked about how I knew that the prescribed method of resolving the pending in-place alters in my large table, issuing a single large dummy update statement that touches every row, wouldn't work for large tables. There are just too many problems with long transactions, locks and excessive I/O that can trash the engine.

I decided the first thing I would try would be to update every row in the table, but do it in multiple transactions to avoid long transactions and holding on to a lot of locks. Nothing too interesting, pretty standard stuff really. The logic looks like this:

set isolation dirty read;
select {+full (large_table)} primary_key from large_table;

updated = 0
for each row:
    update large_table set primary_key = primary_key
        where primary_key = row.primary_key
    updated = updated + 1

    if updated mod 1000 == 0:
        commit work;

if updated mod 1000 != 0:
    commit work;
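The pseudocode above can be sketched as a small Python function against the generic DB-API. This is not the author's actual tooling; the driver, table name and key column are placeholders, and the Informix-specific isolation statement is passed in as a parameter so the shape works with any DB-API connection:

```python
# Sketch of the batched dummy-update loop; BATCH_SIZE and all names
# are assumptions, not taken from the article's production code.
BATCH_SIZE = 1000

def dummy_update_all_rows(conn, table, pk, isolation_sql=None):
    """Rewrite every row of `table` in many small transactions so
    no single transaction holds locks or log space for long."""
    read_cur = conn.cursor()
    write_cur = conn.cursor()
    if isolation_sql:                 # e.g. "set isolation dirty read"
        read_cur.execute(isolation_sql)
    read_cur.execute("select %s from %s" % (pk, table))
    updated = 0
    for (key,) in read_cur.fetchall():
        # The no-op assignment still forces the row's page to be
        # rewritten, which is what resolves a pending in-place alter.
        write_cur.execute(
            "update %s set %s = %s where %s = ?" % (table, pk, pk, pk),
            (key,))
        updated += 1
        if updated % BATCH_SIZE == 0:
            conn.commit()             # end the batch, release locks
    if updated % BATCH_SIZE != 0:
        conn.commit()                 # commit the final partial batch
    return updated
```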

Next I was able to find an SQL query that tells me how many pending in-place alters a table has remaining. I used this information to stop updating once all of the pending in-place alters had been resolved. Not the most exciting thing in the world, but it can be helpful if you have a bunch of IPAs only at the beginning of the table. I could see this happening with some kind of transaction table that gets added to, but never updated. Anyway, that SQL looks like this:

select nvl(sum(pta_totpgs), 0)::integer pages
from   sysmaster:sysactptnhdr h,
       sysmaster:systabnames t
where  h.partnum = t.partnum and
       t.dbsname = "dbname" and
       t.tabname = "tablename"
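The stop-early idea then becomes a small control loop: keep running update batches until the query above reports zero pages. This is a hedged sketch of that logic only; `run_batch` and `pages_remaining` are placeholders for the batched update and the sysmaster query:

```python
def update_until_resolved(run_batch, pages_remaining, max_batches=None):
    """Run dummy-update batches until pages_remaining() (the sysmaster
    query above) reports no pages with pending in-place alters."""
    batches = 0
    while pages_remaining() > 0:
        if max_batches is not None and batches >= max_batches:
            break                    # safety valve, my addition
        run_batch()
        batches += 1
    return batches
```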

The final and most interesting thing to come out of this failed experiment was throttling the updates if the server was getting busy. Before allowing the update loop to start a new transaction I checked to make sure the following conditions were met:

  • No Cleaner threads running
  • BTScanner is not scanning an index
  • No thread waiting on a log buffer
  • No backup (logical log or level 0, 1, 2) is running
  • Update Statistics is not running
  • The load average is below a certain threshold

Mostly these checks are trying to avoid overloading the I/O, and I would pause all work until every one of these conditions was met. Sure, I want to resolve these IPAs as quickly as possible, but not at the expense of overwhelming the engine and impacting production.
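The gate itself is simple: poll a list of condition checks and sleep until all of them pass. In this sketch only the load-average check is implemented (via `os.getloadavg()`, and the threshold is an assumed value); the Informix-side checks for cleaner threads, btscanner, log buffer waiters, backups and update statistics would each be a predicate that inspects onstat output or sysmaster tables, and are left as injectable placeholders:

```python
import os
import time

LOAD_THRESHOLD = 4.0   # assumed threshold, tune for your host

def load_is_low():
    """True when the 1-minute load average is under the threshold."""
    return os.getloadavg()[0] < LOAD_THRESHOLD

def wait_until_quiet(checks, poll_seconds=5, sleep=time.sleep):
    """Block until every check passes, polling between attempts.
    Each check is a zero-argument callable that returns True when
    it is safe to start another update transaction."""
    while not all(check() for check in checks):
        sleep(poll_seconds)

# Before each new transaction the loop would call something like:
# wait_until_quiet([no_cleaners_running, btscanner_idle,
#                   no_log_buffer_waiters, no_backup_running,
#                   no_update_stats_running, load_is_low])
```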

As you might have guessed, the performance was terrible. Apparently it takes a long time to update a billion rows one by one in a way that plays nice with the other SQLs. Even when I decided not to play nice and disabled the throttling, I averaged 547.3 rows updated per second, or just over 21 days of running non-stop before it would complete. Brute force was just not going to cut it, and a lot of the work doesn't even need to be done: I don't need to update pages that have no pending IPAs, and I don't need to update every row on a page to fix a page that does.

I wanted to find a better way: specifically, identifying only the data pages that have pending in-place alters, and updating only the minimal number of rows needed to force each of those pages to be rewritten.

In Part 3 I'll talk about reading the data pages directly from disk and how I plan to use this to my advantage. In the meantime, here is an advertisement to help keep the lights on.
