The Usual Suspects

We have a big database job that runs slowly at night. Sometimes it needs to be killed to keep the show running. My boss wanted me to work on this problem. He had a lot of ideas. So did I. My plan was to isolate what is slow, then figure out how to make it fast. I got overruled by the boss.

My first task was to document what this big database job does. Done. The result is a document that explains in plain English what the purpose of each part of the job is. Then I was told to find out how the parts are related.

I did a bunch of analysis and determined which pieces depend on which other ones. After some deeper digging, I figured out the inputs and outputs for each part of the job. This was hard because the package that runs the job is 7k lines of code. That package makes use of a lot of helper packages that add up to a lot more than 7k lines.

Now my new tasking is to figure out if we can roll the many cursors up into a small number of cursors. I also have to determine whether some database indexes could make the job run faster. I have the option to isolate some SQL and pass it off to some DBAs for analysis and tuning.
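
To make the cursor rollup idea concrete, here is the kind of consolidation I am hunting for. This is a made-up sketch (the table and columns are invented), showing two cursors that each scan the same table being merged into one cursor that makes a single pass:

    -- Before: two cursors, two passes over the same table
    CURSOR c_open IS
      SELECT order_id FROM orders WHERE status = 'OPEN';
    CURSOR c_late IS
      SELECT order_id FROM orders WHERE due_date < SYSDATE;

    -- After: one cursor, one pass, with flags to tell the cases apart
    CURSOR c_work IS
      SELECT order_id,
             CASE WHEN status = 'OPEN' THEN 'Y' ELSE 'N' END AS is_open,
             CASE WHEN due_date < SYSDATE THEN 'Y' ELSE 'N' END AS is_late
        FROM orders
       WHERE status = 'OPEN'
          OR due_date < SYSDATE;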

Who knows where all this is going to end up. We might get some performance gains. Then again, I might just be running around in circles. I would love to be given the mission to just make this thing fast. But somebody else higher up on the food chain wants to drive the task with their ideas. Frustrating. Let's hope this does not turn out to be a fail. My time is money - literally.

Record Types

Today is the last day for a developer on our project. He is moving his family for a new job far away. The company asked him to fix one troubling bug before he left. He was going crazy and asked me for some help. He had set up some debugging in a PL/SQL package where he dumped some variable contents to a log file. But the values were just not making sense.

I took a look at his log. One of the variables was zero. He insisted that the source column for that variable was non-zero in the database. Then he started randomly asking whether this or that was the cause of the problem. I told him we should not guess, and should instead inspect the place where the variable gets initialized.

I found a nice routine which loaded the data from a table into a variable declared as a custom record type. We select a bunch of fields into this object. The order of the columns in the SELECT clause is supposed to match the order of the fields in the record. So I checked the one field in question. Uh oh. It was out of alignment.

Somebody added a new field at the end of the record. However, they put the column for this new field in the middle of the SELECT statement. Bam. That's the problem. This was probably not a great design. We should make the select explicitly assign values into named fields of the record type. Maybe an idea for a redesign.
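
Here is a boiled-down sketch of the bug, with invented names (this is not the actual code). SELECT INTO a record maps columns to fields by position, so slipping a column into the middle of the list silently shifts everything after it:

    DECLARE
      TYPE emp_rec_t IS RECORD (
        emp_id  NUMBER,
        salary  NUMBER,
        dept_id NUMBER  -- the new field, added at the end of the record
      );
      v_emp emp_rec_t;
    BEGIN
      -- Bug: the new column was added mid-list, so salary lands in
      -- dept_id and dept_id lands in salary.
      SELECT emp_id, dept_id, salary
        INTO v_emp
        FROM employees
       WHERE emp_id = 100;

      -- Safer: assign into named fields so the order cannot bite you.
      SELECT emp_id, salary, dept_id
        INTO v_emp.emp_id, v_emp.salary, v_emp.dept_id
        FROM employees
       WHERE emp_id = 100;
    END;
    /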

Package Analysis

We got this big PL/SQL package that contains code which runs every night. The thing does a lot of stuff such as problem detection/correction, business rule implementation, and metadata regeneration. The code has been growing over the years. This past year the job started running too long. We tried some hacks but did not get very far. We are still attempting some hacks to get the darn thing to complete on time.

Luckily I got assigned the task to analyze this monster. I started by documenting what each part of the massive job does. It was a reverse requirements analysis task. Then I started to catalog which tables each of the procedures used, and which tables got modified by the procedures. I am starting to come up with an understanding of what procedures depend on what other ones. Isn't there some software that can do this automatically for me? If not, it might be time to write some.
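
Partly, there is: Oracle's data dictionary already tracks object dependencies. A query like this against the USER_DEPENDENCIES view (the package name here is a placeholder) lists everything a package references:

    -- What does MY_BIG_PACKAGE reference?
    SELECT referenced_owner, referenced_name, referenced_type
      FROM user_dependencies
     WHERE name = 'MY_BIG_PACKAGE'
     ORDER BY referenced_type, referenced_name;

It will not tell me which tables get modified versus just read, so there is still manual work, but it beats grepping thousands of lines by hand.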

Slow Install

My customer gave me a new laptop today. Even though I do most of my work in a virtual machine, I sometimes need to access the database right off the laptop. So I decided to install the Oracle 10g client. We are currently at 10.2.0.1.

I found the install media on a network drive. When I launched the executable, it took a while for the Universal Installer to come up. Then each step of the installer was painfully slow. Here is the really sad part. I have been in the middle of this install for over two hours now. The progress bar reads 92% complete.

Not sure if this is due to the media being on a network drive. I plan to download the media to my desktop if I ever need to do this again. I also chose a custom install, and checked a bunch of things that looked cool in the client. I might try that again, but will do the bare bones install first. Then I can come back later and install the goodies.

Script Failure

A developer was writing a script to set up some configuration data. He complained that the script was completing successfully, but the data was not getting inserted. He asked for help. His thought was that success should mean the records got inserted.

Immediately I pointed out that running the script and seeing success only means that the script did not throw any exceptions that went uncaught. We looked at his script and saw that it was eating any exceptions. The developer floundered around some more. I later advised him to forget about the script, move over to SQL*Plus, and manually execute the SQL statements one at a time.

Bingo. In SQL*Plus, he found that some constraints were being violated and causing exceptions to be thrown. This taught me two things. You should not jump to the script until you are sure your raw SQL is good. You should also not rely on some tool to execute your script if you are not in full command of what it is doing.
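
The exception-eating pattern looked something like this. This is a minimal sketch with an invented table, not his actual script, but the shape is the same. The first block swallows every error; the second logs and re-raises so the run fails loudly:

    -- Bad: WHEN OTHERS turns every failure into silence
    BEGIN
      INSERT INTO config_data (config_key, config_value)
      VALUES ('TIMEOUT', '30');
    EXCEPTION
      WHEN OTHERS THEN
        NULL;  -- constraint violation vanishes; script "succeeds"
    END;
    /

    -- Better: log the error, then let it propagate
    BEGIN
      INSERT INTO config_data (config_key, config_value)
      VALUES ('TIMEOUT', '30');
    EXCEPTION
      WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE(SQLERRM);
        RAISE;
    END;
    /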

SQL Loader to the Rescue

I gave myself an action item to get some hands-on SQL*Loader practice. It turns out it is not that hard for a default load from a file. I specified the filename, used absolute positioning, and loaded stuff into my database table.
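
For the record, my control file looked roughly like this. The file name, table, and column positions are made up, but this is the shape of an absolute-positioning load:

    LOAD DATA
    INFILE 'people.dat'
    INTO TABLE people
    (
      person_id  POSITION(1:6)   INTEGER EXTERNAL,
      last_name  POSITION(7:26)  CHAR,
      first_name POSITION(27:46) CHAR
    )

You then point the sqlldr command line at the control file with its control= parameter and check the log file it writes.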

The reference I used to learn how to do this was the SQL*Loader FAQ. You should check it out. The FAQ goes over a lot of common scenarios, and how you use SQL*Loader to accomplish your work. Good stuff I tell you.

So far I have only tried loading files with 10 to 15 records. The next step is to try to load 10k to 15k records and beyond. I got a good feeling about it. They say that SQL*Loader is optimized to handle the big jobs. I should have already known this. The team on my project that loads huge files from the mainframe primarily uses SQL*Loader to do the job.

Huge Script

I had a task to deal with a large amount of data for our customer. Being the programmer that I am, I got a couple of files with the data. Then I ran them through a custom Java program to extract the data of interest. Finally I stuck the data in a massive SQL script to do some work.

I did all my tests with some dummy data. Then when I got the actual data in the script, I ran one last smoke test to ensure the thing worked. It did not. The script kept hanging without inserting a single record. I called my DBA. He told me I should have used SQL*Loader. Well I prefer custom PL/SQL. Plus I needed to implement some logic for each of the inserts.

There was a tight deadline for this work to be completed. I decided to split the script up into a number of smaller scripts that actually did complete. The original script had about 300,000 lines in it. I found that 20,000 to 30,000 line scripts worked fine. So my plan is to break the big mamma up into 10 separate scripts. I will test them out and let you know how I fare.

New action item for self - learn how to do stuff with SQL*Loader.

Oracle 11g Express Edition

Recently I had to upgrade my company laptop to Windows 7. This was a painful ordeal that lasted a whole weekend. The upgrade consisted of a reimaging of my machine. One byproduct was that my Oracle 10g database was wiped off the machine. Oops. I forgot to back that thing up.

I decided to reinstall Oracle after the upgrade. It was interesting to see that Oracle had an 11g version of their Express Edition free database. The latest version is 11.2, which is currently in Beta.

I was pleased with how fast I could install the Express Edition database. This thing actually installed at Express speeds. I was also reminded that Express Edition does not come with much. You get a command line SQL tool. You get tools to start, stop, back up, and restore the database. That is it. Looks like I need to polish up my command-line DBA skills.

The Trigger Project

There has been a lot of drama on our project over the performance problems we are having in production. Some people are attributing the performance issues to the number of triggers we have in the database. That sounds like nonsense to me. I deal in hard facts. There was a total lack of evidence to back up this theory. So I have decided to run some tests in our database to measure the overhead triggers add to DML.

I wrote a harness script that updates 10,000 records. It commits after every update. The thing runs at a good speed. It takes around a minute to complete. The table currently has no triggers on it. Then I added one trigger with nothing in its body. I was shocked to find the harness took over 20% longer to run with the trigger.
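
The harness is nothing fancy. It is along these lines (the table is invented), plus the do-nothing trigger for the second run:

    -- Timing harness: 10,000 single-row updates, committing each one
    DECLARE
      v_start NUMBER := DBMS_UTILITY.GET_TIME;  -- centiseconds
    BEGIN
      FOR i IN 1 .. 10000 LOOP
        UPDATE test_table SET counter = counter + 1 WHERE id = i;
        COMMIT;
      END LOOP;
      DBMS_OUTPUT.PUT_LINE('Elapsed: ' ||
        (DBMS_UTILITY.GET_TIME - v_start) / 100 || ' seconds');
    END;
    /

    -- The empty trigger added for the second run
    CREATE OR REPLACE TRIGGER test_table_bu
    BEFORE UPDATE ON test_table
    FOR EACH ROW
    BEGIN
      NULL;
    END;
    /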

I had another developer present when I ran the test. He could not believe the results either. Then he had some suggestions to vet the results. I ran the tests a couple more times. Looks like the first run was a fluke. He also suggested I run the tests locally on a database on my machine to eliminate irrelevant variables.

So next I plan to move the tests over to a local database on my machine. This way the network is not a factor. Other users on the database are also eliminated from the equation. Then I am going to time having a lot of triggers on a table. I will implement some cascading triggers. When I am done, I will have a lot of empirical evidence to back up my opinion that triggers in and of themselves have no impact on performance.

Triggers Did It

A job that runs at night has been taking too long every night this week. This causes pain as the system administrators call up our DBAs or help desk in the middle of the night. Obviously we have a problem. I would call it a performance problem. What makes this one unique is the response to it. Developers told me a manager had decided there would be no new database triggers implemented due to the performance problem.

WTF? This made little sense. Who correlated our performance problem to the presence of triggers? I attended a meeting hosted by this very manager. So I asked about this new "no triggers" edict. He said that he had inspected the PL/SQL source code. He found many database triggers. He also saw trigger firings that caused other triggers to fire as well. Therefore they were suspects in the performance problem.

I said that by this logic, we would also find many stored procedures in our source code. The stored procedures get executed. The code runs slowly. Therefore we should not use stored procedures. Obviously this argument does not hold up.

Rather than being an obnoxious developer who shoots down ideas, I decided to later pitch a plan to get to the actual root cause of the performance problem. We should find out what pieces of the code are taking a long time to execute. Then we should drill down into the slow pieces and find out why. If we do that and find some database triggers to blame, I agree we should optimize them.

It turns out there is already a proposal to do the very thing I recommended. Our manager clarified that we cannot add database triggers without analyzing the performance impact. I am all for that. I run timing studies all the time on slow code. We need to use logic and hard evidence to deal with tough and frustrating problems such as performance. If not, we might as well chirp "add more indexes" like any other lackey and get nowhere.

From the Trenches

I have been following a seminar on how to be a DBA. These talks give you hands-on advice. They are geared towards individuals who find themselves appointed as a DBA by accident. While I am not formally a DBA, I often do some light DBA tasks. So I thought I would study up.

When you start a DBA job, you obviously need to find out what servers and databases you are responsible for. But another important discovery is to find which databases you are not supposed to manage. You should gain info on your systems by talking to actual users. If they are not available, rely on knowledge from developers and testers of the system.

I found out about a nice option in TOAD where you can execute a query against many databases at once. My tool of choice is an old copy of PL/SQL Developer. So far I only log into one database at a time to run queries.

Make sure you find out the cycles of use by the business. This will help you scope out the best time for backups of the database. Speaking of backups, make sure you can recover using the backups. Test out a restore to a different system. It is no good to do backups if in the end they do not work.

Global Temporary Tables

Our system uses temporary tables at times. They make data access simpler across different processes such as apps, procedures, and reports. Temporary in this sense means that a table is created, populated, used, and then dropped. However there is another type of temporary table in Oracle.

Global temporary tables (GTTs) are implemented by Oracle. They are fast in part because changes to their data generate no redo for the table blocks (undo is still generated, though). They operate in one of two ways. They either lose their data at the end of a transaction, or at the end of a session. Which way is determined when you create the table.
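
The choice is made with the ON COMMIT clause when the table is created. A sketch of both flavors:

    -- Data vanishes at the end of each transaction (the default)
    CREATE GLOBAL TEMPORARY TABLE staging_txn (
      id    NUMBER,
      label VARCHAR2(50)
    ) ON COMMIT DELETE ROWS;

    -- Data survives commits but vanishes when the session ends
    CREATE GLOBAL TEMPORARY TABLE staging_sess (
      id    NUMBER,
      label VARCHAR2(50)
    ) ON COMMIT PRESERVE ROWS;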

GTTs themselves are static. The table structure is created once and remains in place always. It is the data that is transient. The data is private per session. No statistics are gathered on these tables. The data resides in the TEMP tablespace. They are cached in the buffer cache.

GTTs have been available since Oracle 8i. I would recommend you be familiar with them. You never know when you might find a table in your database that is a GTT.

Password and Login Woes

A couple of things have changed in our system. The big change is a move to Oracle 11g on the back end. The security folks also changed our password verification function. The end result is that new accounts, and old ones that get their password changed, get locked. Doh.

Our apps rely on the Oracle 10g client on the workstation. We use the Oracle provider to manage accounts whose passwords expire. The password change dialog comes up automatically when the password has expired. New accounts get set with a temporary password which starts out expired.

Now when our users try to change their password in these scenarios, the password change hangs. Dang. Some DBAs and testers tried to do some debugging of this problem. At first they thought it was the new password verification function. However the problem persisted when those changes were backed out. Now a developer needs to get in there and figure out what is wrong.

Directory Assistance

I got a new project that I am developing, and I needed to crunch some data for it. What better way to do that than using an Oracle database? So I installed Oracle 10g Express Edition on my laptop. I did not need any great power.

One of the first tasks I needed to complete was to load in some data from a file. The logic to parse the data was a bit involved. Therefore I decided to write it in PL/SQL. To use the UTL_FILE package to access the file, I needed to create a directory object.

The web user management tool provided by Oracle 10g Express Edition did not list CREATE ANY DIRECTORY as one of the grants it could assign. Therefore I dropped down to the command line to get that job done. Next my stored procedure could not see UTL_FILE. I had to log in as SYS and grant access to this package. Wow, that is a lot of work to read a little file. I guess this is not your standard use of an Oracle database.
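
For my own future reference, the command-line steps boiled down to something like this (the user name, directory name, and path are mine):

    -- As SYS (or another suitably privileged account)
    GRANT CREATE ANY DIRECTORY TO myuser;
    GRANT EXECUTE ON SYS.UTL_FILE TO myuser;

    -- As myuser: map a directory object to the folder holding the file
    CREATE DIRECTORY data_dir AS 'C:\data';

    -- Then PL/SQL can open the file through the directory object
    DECLARE
      f UTL_FILE.FILE_TYPE;
    BEGIN
      f := UTL_FILE.FOPEN('DATA_DIR', 'input.txt', 'r');
      UTL_FILE.FCLOSE(f);
    END;
    /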

Oracle Not Available

I had to return to a project I worked on last year. The first order of business was to ensure I could still run the program. Immediately the thing died with an ORA-01034: ORACLE not available.

The program was just running sqlplus, having it execute a script that had a single EXIT statement in it. I was able to successfully execute the sqlplus statement from the UNIX command prompt. Why was the program not able to do the same without error?

I googled around and found that this error happens when your ORACLE_HOME and ORACLE_SID do not match. These environment variables seemed good. I was pulling my hair out trying to figure this out. Eventually I found that the program was sourcing a different profile. This profile pointed to the wrong Oracle home. Dang.

Adaptive Cursor Sharing

We are going to Oracle 11g in our customer's system. One of our DBAs took a look at the new features for 11g. The developers were supposed to see whether we could take advantage of these features. So far I only took a look at the first feature mentioned. That is Adaptive Cursor Sharing (ACS).

There are specific requirements for ACS to kick in. You need to be using bind variables (the write-up I read talked about dynamic SQL via EXECUTE IMMEDIATE in PL/SQL, though my understanding is that any SQL with binds can qualify). Finally I hear that if you use too many bind variables (15 or more), then ACS will be skipped.

The optimizer may use different execution paths based on the values in the bind variables. Skewed column data may cause the execution to be suboptimal. Normally the optimizer makes use of histogram data generated from gathering statistics.

With ACS, the database tries to determine which values for bind variables work best with certain execution paths. Then it uses this information in the future to choose the best execution path. We get this benefit automatically in 11g. It can be disabled. I am thinking we should leave it on and try to get the benefit.
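
To picture the scenario, suppose a status column is heavily skewed (this is a made-up example). With a bind variable, the pre-11g optimizer would peek at the first value and reuse that plan for every later value; ACS can notice the skew, mark the cursor bind aware, and keep a suitable plan for each case:

    -- Hypothetical skewed table: 99% of tickets are 'CLOSED'
    VARIABLE v_status VARCHAR2(10)

    EXEC :v_status := 'OPEN'
    SELECT COUNT(*) FROM tickets WHERE status = :v_status;  -- selective

    EXEC :v_status := 'CLOSED'
    SELECT COUNT(*) FROM tickets WHERE status = :v_status;  -- not selective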

Database Costs

I just looked at the pricing structure of Microsoft's SQL Server platform. Some versions mimic the Oracle cost. There is a free express version of SQL Server. The developer edition also has no cost. The minimal web version of the database costs $3k. Workgroup goes for $7k. Enterprise comes in at $27k. And the Datacenter edition is a whopping $54k.

There are other fees that might bring the total cost a lot higher. If you install a separate copy on a virtual machine, you might need another license. Here is one good way to save costs. Use multi-instancing. This is similar to Oracle database instances. They all have their own CPU, memory, and disk allocated. However the SQL Server instances all share the same operating system and physical machine. There is no extra cost for an additional instance in terms of licensing.

Here is a final tip that I learned at a seminar I watched today. Do not keep the default install options for Microsoft's SQL Server. This is especially important if you are installing the production database. I wonder if the same holds true for Oracle.

Roles of the DBA

Today I continue sharing some things I learned watching a seminar about becoming a DBA. There are two main types of DBAs: (1) the development DBA and (2) the operational DBA. The seminar I watched focused on the operational DBA.

This type of DBA needs strong customer service skills. You will be talking to a lot of people. In fact, you will be the liaison between many groups. Consider yourself the point of contact for all things support-related.

Take responsibility. A DBA is the guardian of the data. You must own your platform like you own a house. Uptime is your responsibility.

One major part of a DBA's responsibility is to guard against SQL injection attacks. Apps must sanitize user input. Many web apps do not. The result is that hackers can send in SQL statements, and maybe even compromise your database box.
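
In PL/SQL terms, the classic mistake is gluing user input straight into dynamic SQL. A minimal sketch with invented names (not from any real app):

    -- Vulnerable: input like  x' OR '1'='1  changes the statement
    CREATE OR REPLACE PROCEDURE remove_user_bad(p_name IN VARCHAR2) IS
    BEGIN
      EXECUTE IMMEDIATE
        'DELETE FROM app_users WHERE username = ''' || p_name || '''';
    END;
    /

    -- Safer: a bind variable keeps the input as pure data
    CREATE OR REPLACE PROCEDURE remove_user_good(p_name IN VARCHAR2) IS
    BEGIN
      EXECUTE IMMEDIATE
        'DELETE FROM app_users WHERE username = :name' USING p_name;
    END;
    /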

Surprise. You are the DBA.

I just finished watching a seminar on becoming a DBA. It seems a lot of times people get drafted into the DBA role. This is not always a bad thing. The DBA is a good career path. But I also saw a lot of potential for DBA pain.

You need to keep your boss happy in any job. But the DBA must keep the user happy. Otherwise you will suffer. There are ways to set up your environment to increase the chance of success. Have a sandbox to try out new technologies. Put your development database on a separate physical machine. Be ready to restore a production backup to the test environment. And have a staging database that mimics everything you see in production if your project has the cash.

Performance Redux

We continue to have performance problems with our nightly processes. I extracted some SQL that could run during the day. However a manager shot that idea down. It would look too suspicious to the customer. My next idea is to replicate the performance problem.

The production operations are taking a whopping 5 to 6 seconds per action. Each action has a number of updates and inserts. I got a small (500 record) sample in development. There it was only taking a tenth of a second per action.

I ramped up the volume to 5000 records. Things slowed down a little bit, to three tenths of a second per action. Now I want to try just a bit more data to see if performance in development degrades further.

Once I can recreate the problem, I can start the performance debugging. I think I might log the timing of the different parts of the action to see where we are spending most of the time. Then it will be time for that magic SQL performance work.
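
The logging plan is just crude instrumentation, something along these lines. The log table and the two procedures are hypothetical stand-ins for the real parts of the action:

    DECLARE
      t0 NUMBER;
    BEGIN
      t0 := DBMS_UTILITY.GET_TIME;
      do_updates;  -- stand-in for the first part of the action
      INSERT INTO perf_log (step, centiseconds)
      VALUES ('updates', DBMS_UTILITY.GET_TIME - t0);

      t0 := DBMS_UTILITY.GET_TIME;
      do_inserts;  -- stand-in for the second part
      INSERT INTO perf_log (step, centiseconds)
      VALUES ('inserts', DBMS_UTILITY.GET_TIME - t0);

      COMMIT;
    END;
    /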

Performance Problems

We have a database job that runs every night. It takes care of all kinds of tasks. For example, it detects and resolves data problems. It also implements certain time-based business requirements. Lately this job has been taking a lot of time to complete.

I traced one particular part of the job that was taking all night. It was looping through a ton of records, then doing a bunch of updates on them. The algorithm to do the updates was painfully inefficient. I replaced all of this with single SQL updates of all the records at once. The job that took all night completed in 10 minutes.
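
The shape of the change, heavily simplified with invented names:

    -- Before: row-by-row, one UPDATE per record
    BEGIN
      FOR rec IN (SELECT id FROM work_items WHERE needs_fix = 'Y') LOOP
        UPDATE work_items SET status = 'FIXED' WHERE id = rec.id;
      END LOOP;
    END;
    /

    -- After: one set-based UPDATE handles every record in a single pass
    UPDATE work_items SET status = 'FIXED' WHERE needs_fix = 'Y';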

The job was doing well for a while. Then it got slow again. Another developer started to look at it. All he came up with was that there were a lot of triggers on the tables involved.

I hate it when somebody tries to solve a performance problem by asking if we can add some indexes to make it faster. Ouch. The latest analysis felt like that type of solution. Sure, you don't want to do a lot of work in triggers for updates that happen frequently. But triggers, even lots of them or complicated ones, do not automatically mean slow performance.

Performance tuning is quite a deep subject. Maybe we need to send some of our developers to training on performance analysis and tuning. That might get us past the old "add indexes" solutions.

Pains from a Hack

I got assigned to work a curious bug in the system. The customer said a bunch of data was in the wrong state. I called our customer and got some more info. The best help they gave me was some examples in production.

I studied the audits and found the bad state transition. Then I pored over our database packages to find out why this was happening. Eventually I found the guilty party.

Somebody was making a call to some existing procedures in our packages. They found out they needed to set some data to an intermediate value first for the code to work. However when the code encountered an error, the package code left the data in the bad intermediate state.

Fail. I took out the hack, then modified the packages to work with the new data pattern. Couldn't the original programmer have done this? Do the hard work up front. It will save everybody some pain. Next I started looking into why there was an error in the first place. That's a story for another time.

Making the Connection

I got a call for help from a fellow developer. She had installed the Oracle 11g client. And she could not connect via SQL*Plus or SQL Developer. So I went over to see what was wrong.

The first problem was that she was using the SQL*Plus that comes with the client. This is different from the old Windows sqlplusw. You only get asked for username and password. If you only enter those two, SQL*Plus assumes you are trying to connect to a local database on your workstation. It turns out she was getting a TNS protocol adapter error.

I had her type in username@Dbname. That did the trick. She was able to get into SQL*Plus. Then we looked at her connection in SQL Developer. She was using the default that also tried to connect to localhost on port 1521. She typed in the actual host name and port. Then she was good to go for SQL Developer. She could have alternatively changed the connection type to TNS. We set up a TNS_ADMIN variable that points to a "tnsnames.ora" file. That would have made life easier.
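
For reference, a tnsnames.ora entry looks roughly like this (the host, port, and service name here are placeholders). With TNS_ADMIN pointing at the file, connecting is just username@MYDB:

    MYDB =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
        (CONNECT_DATA = (SERVICE_NAME = mydb.example.com))
      )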

Just now I got another email for help from a different developer. He was wondering whether we have PL/SQL Developer on our new machines. I told him to try out Oracle's SQL Developer. Let's see how he fares.

No Place Like Home

After having installed the new Oracle 11g client, I wanted to put PL/SQL Developer on my machine. Oops. I forgot that my PL/SQL Developer required a bunch of passwords for installation. I dug up my printout with all those passwords. Once I installed PL/SQL Developer, I launched it hoping to execute some queries. No luck. It kept complaining that SQL*Net was not installed.

This seemed vaguely familiar. I googled this problem. Some people recommended that I hack up the Oracle entries in the Windows registry. That did not seem too encouraging. Instead I tried applying some settings in the PL/SQL Preferences tab. I hard coded the Oracle Home directory in there. Still no help.

Finally I did some more googling on the problem. Someone smart advised me to put in the full path for the OCI Library. Bam. That did the trick. I am recording my experience here so I can come back the next time I install PL/SQL Developer and maybe a new Oracle client.

First Look at 11g

We finally got around to installing the Oracle 11g client for our new development. This is an upgrade from Oracle 10g. Our first test was to try to log into the 10g database using the 11g client. But we could not find the Windows SQL*Plus client (sqlplusw.exe).

Doh. Oracle is no longer shipping this product. They want you to use SQL Developer. Weak. I am contemplating bringing over the one from the 10g client. Come on, Oracle. We like sqlplusw. Yeah, they still ship the normal SQL*Plus. The DBAs use it. But we developers want our Windows version.

Script Mystery

I got to work and had a lot of email to deal with. Another developer dropped by and said he was having a problem with a script he wrote. It was supposed to insert 180+ records. But only 15 records were making it in. I told him I was busy, but that he should run the thing from SQL*Plus to ensure some tool was not confusing him.

Later the developer returned and said he needed help badly. People were calling him asking for the completed script. I took a look and indeed only 15 records were in the table after he ran the script. I went through the whole script and found an exception handler at the end. It logged an error and exited.

The developer thought the script was running successfully. But the exception handler was masking the true error. A primary key constraint was being violated. As soon as he got rid of the duplicate inserts in his code, the thing worked fine. You can start with another sample script to speed development. But make sure you know what you are starting with.

Could Not Find Program Unit Being Called

I was told to write a standalone script to clear out a backlog of work that needed to be done. The backlog was caused by a slow nightly process that had to be aborted. I coded a new procedure in one of our packages. My script was very simple. It logged the start and end times of the script. The only other thing it did was call the stored procedure with the correct parameters.

Now I wanted to make sure the performance of the standalone script was acceptable. So I looked in the log to see the start and end times. I was shocked to find tons of ORA-06508 errors being logged by one of the triggers in the database. Now this error means some stored procedure could not be found, or was invalid. This darn trigger was calling all kinds of procedures in many different packages. I guess I could have recompiled them all.

I did find a gem of an idea on Akdora's blog. Instead of just logging the error message in the exception handler, you can store the output of DBMS_UTILITY.FORMAT_ERROR_STACK. This tells you exactly what object is missing or invalid. Thanks Akdora. You helped me get the problem solved and still get home in time.
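
The handler ends up looking something like this (the log table is hypothetical, and I tacked on FORMAT_ERROR_BACKTRACE, which adds the line number):

    BEGIN
      some_package.some_proc;  -- stand-in for the failing call
    EXCEPTION
      WHEN OTHERS THEN
        INSERT INTO error_log (logged_at, details)
        VALUES (SYSDATE,
                DBMS_UTILITY.FORMAT_ERROR_STACK ||
                DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
        COMMIT;
        RAISE;
    END;
    /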

PL/SQL Review

I just went through a recap of the PL/SQL programming language. It was good. PL/SQL is the programming language for the Oracle database. It can improve application performance by compiling stored procedure code into the database.

PL/SQL is arranged in blocks. Blocks with names are procedures or functions. You can run PL/SQL from the Oracle SQL*Plus tool. You can also run it through an IDE like Oracle SQL Developer.
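
A minimal anonymous block, for anyone who has never seen one:

    -- Declare, execute, handle errors
    DECLARE
      v_greeting VARCHAR2(20) := 'Hello, PL/SQL';
    BEGIN
      DBMS_OUTPUT.PUT_LINE(v_greeting);
    EXCEPTION
      WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Something went wrong: ' || SQLERRM);
    END;
    /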

I hear that new developers continue to come into the PL/SQL programming community.

Oracle Database Firewall

Oracle has released a database firewall product. It analyzes SQL to determine whether to block or log the activity. This functionality comes from F5 Corporation, with which Oracle has struck a partnership.

Normally this is the territory of database activity monitoring products. Companies with these products are downplaying the security of the Oracle database firewall.

There are ways to get around the firewall. For example, our UNIX guys log into their UNIX accounts on the machine that hosts the database. Then they use a local connection to the database. Once they do that, all their SQL bypasses the firewall. Oops.

Users are Blocked

I was working on some low priority customer problems for a manager. Then all holy heck broke loose. I was getting emails and calls like crazy. The system supporting all our customers was slowing down to a crawl. That should not be possible as we have massive hardware.

The DBA team was on hand to monitor what was going on. User row-level locks were in contention. Our applications were hanging. The DBA team used some tools to identify one of the first locks that happened in the morning. I knew the business logic behind the SQL.
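
I do not know exactly which tools the DBAs used, but you can spot blockers yourself with a query against V$SESSION (the BLOCKING_SESSION column is populated in 10g and up):

    -- Who is waiting, and which session is blocking them
    SELECT sid, serial#, username, blocking_session, seconds_in_wait
      FROM v$session
     WHERE blocking_session IS NOT NULL;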

We got in touch with the user. She killed her application that had been hung all morning. This automatically released the locks and the system returned to a normal state. The lead DBA asked us developers to look at the locking strategy used by the application. He recommended a potential redesign of the app. I took a note, then ran off to the next set of emergencies.

VirtualBox

A friend of mine asked me if I knew about Oracle's VirtualBox software. I was ashamed that I did not. I consider myself an Oracle developer.

VirtualBox is a free, open source product released under a GNU license. It does operating system virtualization. You can run multiple virtual operating systems at the same time.

To start with, you need a host operating system such as Windows XP or Windows 7 for the Windows platform. Then you can run all kinds of virtual guest operating systems with VirtualBox.

Why would you want to do this? Well you can test many platforms with just one physical host platform. VirtualBox also can simulate SMP up to 32 cores (which are all virtual). You can also take snapshots of the o/s state and restore to them.

I like the idea of having a snapshot I can revert to if my o/s gets corrupted by some nasty virus.

Keep Business Rules out of Triggers

My old team needed some assistance with a problem that was happening in production. Rows from a database table were mysteriously disappearing when the users ran a certain application. The developers swore that their code did not delete these records.

Since I trust nobody, I scanned the source code of all the database triggers in the system. Sure enough, there was a trigger on another table that deleted the records when a certain field got set to a certain value. Oh snap. This is just poor design. That's not what triggers are for.
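
Conceptually the offending trigger looked like this. This is a reconstruction with invented names, not the actual code, but it captures the anti-pattern of a business rule hiding in a trigger:

    -- Updating a status column quietly wipes rows from another table
    CREATE OR REPLACE TRIGGER orders_status_au
    AFTER UPDATE OF status ON orders
    FOR EACH ROW
    WHEN (NEW.status = 'CANCELLED')
    BEGIN
      DELETE FROM order_items
       WHERE order_id = :NEW.order_id;
    END;
    /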

There are two lessons to learn from this. One is to practice solid design and keep business rules out of the database triggers. The second is that the client server developers need to keep an eye on the code in the database triggers.

Promotion Woes

I had to fix a problem they were having with a back end process in production. Seems a processing command was resulting in an Oracle error. I investigated and found that the key stored procedure was commented out of the package. So I re-enabled the procedure and thought we were good to go.

I got a call from our DBA. He said my code changes would not work in some of the other database schemas. When I got in to work, I found that the other schemas were missing a key column in the staging database table. That should never happen. We have a database configuration management issue.

So I thought we could just add the column everywhere. Some more analysis showed me this would not correct the whole problem. In the end I decided to recode the stored procedure to use another technique to derive the necessary data using some assumptions. This whole episode made me a bit leery about the state of our databases.