Saved by Documentation

This year my team was tasked with making some complex changes in our system. These changes needed to be made deep in our PL/SQL packages that run as a part of our nightly process. Since I am the one who has been around the longest, and I am also one of the few who actually know how our nightly process works, I got assigned this task.

Our requirements team documented the needed changes as best they could. And I questioned them mercilessly until they got all the details from the customer. Then I went off and coded up a number of changes to meet the specification as I understood it. Unfortunately I made some incorrect assumptions about the values set by some other parts in the system. As soon as this code hit production, the trouble tickets started coming in.

As this was my code, I jumped in and determined what was wrong and coded up some PL/SQL code changes. Our process requires us to conduct unit testing. So I doctored up some data to cover all the test cases. There was only one problem. The code I implemented is only supposed to be run on Wednesdays. And today is a Thursday. No problem. I just modified the code to run every day while I did my unit tests.

Another part of our process is to document all changes so that they can be peer reviewed. Often times this seems like unnecessary overhead. But today it saved the day. As I was generating the documentation, highlighting the portions that I changed, I noticed that the code was written to run every day. Oops. Luckily I caught it before peer review, and more importantly, before we shipped this code to Production. If that had gone through there would have been all kinds of fireworks. We do have other safeguards such as independent testings. But I am not sure they would have caught this given the rush on the fix.

I can safely say that I was saved today by our standard development process. Maybe I also need to look into other ways to test "Wednesday only" functionality without hacking up the code. What do you think?

Development Versus Run Time

Our customer had some high priority problems. The original developer working on the problem was not making enough progress. So the problem got reassigned to me. I had a late breakthrough on the problem. Apparently one of our application's metadata tables got out of sync. So I needed to write a script to fix the data. The challenge was that it was very late when I made the discovery. And the customer wanted the fix now.

Here are some more details on the problem. We have a main table with read-only data. It has a lot of records (maybe 30 or 40 millions records and growing). The application makes use of a second database table containing meta data with around 10,000 records. Unfortunately there were multiple problems with the meta data. Some of the records did not belong in the table. And other records were missing.

Given such small amount of time to write a fix, I just hacked out a brute force method. I removed all the meta data records. Then I went through the 40 million records in the source data and reconstructed the meta data table. I did some quick performance tests and they looked ok. So I am planning to ship the fix as is. The main driver of this design was the limited development time. Had I more time to work on the script, I could have built in some smarts which only added and removed a small amount of records. However this would have taken much longer to code and test.

This is usually the trade off in software maintenance. I hope the script runs fast enough when it gets deployed to Production.

Fixing a lot of data

We have some PL/SQL code which computes a lot of values on a daily bases. Right after the SQL*load, the PL/SQL goes to work and performs the computations. Then the client applications use these computed values on the display screens. Nothing special here.

I got a trouble ticket stating a new computed value was not being set. After checking the production data, I found that this was true. Found out that the wrong version of the package was compiled into the database. So I spoke with the team who controls the PL/SQL package install. Sure enough the instructions were correct in thier release notes. But the wrong version was compiled in. The immediate problem could be fixed by compiling the correct version of the code. But the total fix required going back and recomputing the values for all the data loaded so far. We got the computation wrong for about 15 million records.

My first cut of a script to fix the existing data was to run a cursor through the 15 million records. Then I would execute a bunch of logic to figure out which records needed updating. Then I would perform the update for each record. After completing version 1, I worried that this script might take a long time to run. So I spent some time looking at the Production data. And I was able to determine that I could put some conditions in my cursor that would bring the total record count down to about 40 thousand. Now I was on the right path.

Unfortunately my first set of unit tests on random data showed I had some bugs. My code that retrieved the data to determine the computations made some wrong assumptions. So I was trying to SELECT some data that had no records. In my early days as a programmer, I would just first SELECT the COUNT, then SELECT the data if the COUNT was greater than 0. But this is doing double work. So instead I coded a CURSOR which opened and fetched. If I got any data from the fetch I did my computations. No need for a separate COUNT. I checked for the existence of the data, and retrieved all in one step. This can speed things up when you are dealing with a lot of data.

40 thousand records is not huge. But it is big enough that you want to consider performance enhancing features. The only drawback to my work was that it took up a good deal of development time. Now is the busy season for my customer. So I had to wrap up this task and move on to the next problem to solve. I am glad that I forced myself to spend a little extra time to get the solution right. In fact while looking at the Production code to get some ideas for acceleration, I noticed some other related problems that I solved in the same script. I'd say I earned my pay today.