Fixing a lot of data

We have some PL/SQL code that computes a lot of values on a daily basis. Right after the SQL*Loader run, the PL/SQL goes to work and performs the computations. Then the client applications use these computed values on the display screens. Nothing special here.

I got a trouble ticket stating that a new computed value was not being set. After checking the production data, I found that this was true. It turned out the wrong version of the package had been compiled into the database. So I spoke with the team that controls the PL/SQL package install. Sure enough, the instructions in their release notes were correct, but the wrong version had been compiled in anyway. The immediate problem could be fixed by compiling the correct version of the code. But the full fix required going back and recomputing the values for all the data loaded so far. We had gotten the computation wrong for about 15 million records.
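One lesson from this incident: it pays to verify what actually got compiled rather than trusting the release notes. Here is the kind of data-dictionary sanity check I have in mind; the package name DAILY_COMPUTE_PKG is a made-up stand-in for whatever your install uses:

    -- Confirm when the package was last compiled and whether it is valid.
    SELECT object_name, object_type, status, last_ddl_time
      FROM all_objects
     WHERE object_name = 'DAILY_COMPUTE_PKG'   -- hypothetical package name
       AND object_type IN ('PACKAGE', 'PACKAGE BODY');

    -- If the source embeds a version string, pull it straight from the database.
    SELECT line, text
      FROM all_source
     WHERE name = 'DAILY_COMPUTE_PKG'          -- hypothetical package name
       AND type = 'PACKAGE BODY'
       AND UPPER(text) LIKE '%VERSION%';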

My first cut at a script to fix the existing data was to run a cursor through all 15 million records, execute a bunch of logic to figure out which records needed updating, and then perform the update for each record. After completing version 1, I worried that this script might take a long time to run. So I spent some time looking at the production data, and I was able to determine that I could put some conditions in my cursor that would bring the total record count down to about 40 thousand. Now I was on the right path.
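To give a feel for the shape of that script, here is a minimal sketch. The table load_data, its columns, the compute_value function, and the narrowing conditions are all hypothetical stand-ins for the real production objects:

    DECLARE
      CURSOR c_fix IS
        SELECT rowid AS rid, raw_value
          FROM load_data                          -- hypothetical table
         WHERE computed_value IS NULL             -- conditions discovered by
           AND load_date >= TRUNC(SYSDATE) - 30;  -- studying production data
    BEGIN
      FOR rec IN c_fix LOOP
        -- recompute and update only the rows the cursor qualified
        UPDATE load_data
           SET computed_value = compute_value(rec.raw_value)  -- hypothetical function
         WHERE rowid = rec.rid;
      END LOOP;
      COMMIT;
    END;
    /

The WHERE clause is doing the heavy lifting here: every record the cursor filters out is work the loop never has to do, which is what took the candidate set from 15 million rows down to 40 thousand.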

Unfortunately, my first set of unit tests on random data showed I had some bugs. The code that retrieved the data driving the computations made some wrong assumptions, so I was trying to SELECT data that had no records (which in PL/SQL means a SELECT INTO throws NO_DATA_FOUND). In my early days as a programmer, I would have first run a SELECT COUNT, then SELECTed the data if the COUNT was greater than zero. But that does double work. So instead I coded a CURSOR that I opened and fetched from. If the fetch returned any data, I did my computations. No need for a separate COUNT: I checked for the existence of the data and retrieved it in one step. This can speed things up when you are dealing with a lot of data.
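Here is roughly what that pattern looks like. The table detail_data, its columns, and the key value are hypothetical; the point is that the first FETCH answers the existence question and hands you the first row at the same time:

    DECLARE
      CURSOR c_detail IS
        SELECT raw_value
          FROM detail_data          -- hypothetical table
         WHERE parent_id = 123;     -- hypothetical key
      v_raw detail_data.raw_value%TYPE;
    BEGIN
      OPEN c_detail;
      FETCH c_detail INTO v_raw;    -- existence check and retrieval in one step
      IF c_detail%FOUND THEN
        -- rows exist and we already have the first one in hand
        DBMS_OUTPUT.PUT_LINE('computing with ' || v_raw);
      END IF;
      CLOSE c_detail;
    END;
    /

The COUNT-then-SELECT version hits the data twice; this version pays for a single pass, and an empty result simply leaves %FOUND false so the computation is skipped.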

40 thousand records is not huge, but it is big enough that you want to consider performance-enhancing features (more on that in the sketch at the end of this post). The only drawback to my work was that it took up a good deal of development time, and now is the busy season for my customer. So I had to wrap up this task and move on to the next problem to solve. I am glad that I forced myself to spend a little extra time to get the solution right. In fact, while looking at the production code to get some ideas for acceleration, I noticed some other related problems that I solved in the same script. I'd say I earned my pay today.
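As a postscript, here is the kind of performance-enhancing feature I mean at the 40 thousand record scale: fetch the qualifying rows in batches with BULK COLLECT and apply the updates in bulk with FORALL, instead of one context switch per row. Same caveat as before: the table, column, and function names are hypothetical.

    DECLARE
      CURSOR c_fix IS
        SELECT rowid AS rid, raw_value
          FROM load_data                 -- hypothetical table
         WHERE computed_value IS NULL;   -- narrowing conditions go here
      TYPE t_rid_tab IS TABLE OF ROWID;
      TYPE t_val_tab IS TABLE OF load_data.raw_value%TYPE;
      v_rids t_rid_tab;
      v_vals t_val_tab;
    BEGIN
      OPEN c_fix;
      LOOP
        -- fetch a batch of rows instead of one at a time
        FETCH c_fix BULK COLLECT INTO v_rids, v_vals LIMIT 1000;
        EXIT WHEN v_rids.COUNT = 0;
        -- apply the whole batch of updates in one bulk operation
        FORALL i IN 1 .. v_rids.COUNT
          UPDATE load_data
             SET computed_value = compute_value(v_vals(i))  -- hypothetical function
           WHERE rowid = v_rids(i);
      END LOOP;
      CLOSE c_fix;
      COMMIT;
    END;
    /

The LIMIT keeps memory bounded; 1,000 rows per batch is a common starting point, not a magic number.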