Work Smarter not Harder

We have large data sets in my current project. Every year tons more data is loaded into the system. So we only keep the majority of data for 4 years. After that, we get rid of it. There are some specific items in the data that the customer wants to keep around forever. So we identify those important records on a yearly bases, and "archive" them.


Archive in this sense means to copy the important records to another table. That other table mimics the structure of the original table. We do run into extra work when structural changes are made to the original table. We need to ensure that those changes are made to the archival table as well. This undertaking is done once a year when the archive process is run.


Right now the archive time is coming due soon. So I am working on this task. I need to identify all the changes made to the original table. Actually there are a total of 37 original tables. Who knows what kind of changes were made in the last year. How can I be sure I account for all of them? Well previously this was a manual process, reviewing the changes made during the year. Now I am turning to the data dictionary.


It is easy to find newly added columns. I just search for columns in the original table that are missing from the archive table. Little more involved when data type changed in the original table. I must find where the type change to an entirely different type, or where something like the scale or precision changed. The data dictionary knows all. Just takes a SQL query to unearth the answers.

Know What You Are Doing

An analyst scheduled a meeting to talk about a problem discovered in production this year. They changed the format of some of the root level flags stored in the database. However, the fields that depend on those flags was still getting set using the old logic. Those fields in turn drive the setting of other fields that the customer uses in their reports.

The game plan was to update the function that sets the root level flags. We also will need to correct al the data previously processed incorrectly. Easy enough. A new guy was assigned to work on the root level flags issue. And I was tasked with updating the customer facing data.

The only tricky part of my fix was that it depended on the other fix being executed first. I asked the new guy to let me know when his fix was done so I could add a dependency to it. The new guy did contact me. He asked if I knew how to do a data update. I gave him an example script. It is a template I use when we update data in tables using some other tables as source.

The new guy too ka shot at his task. He asked me to review his work. Initially I saw some SQL errors in the code. He obviously did not run this code yet. I let him know about those problems. Then I noticed something weird. He was updating data in a table based on other fields in the same table. That did not seem like how we normally do things.

Upon closer inspection, I found that the dude did not understand which tables all the columns were in. I asked him to take a step back and look at the problem from a big picture perspective. I pointed out that normally data flows from table to table, not from a table to itself. You should understand what it is you are doing before trying to construct a technical solution to a problem.