Reproducing a Race Condition
-
We have a job at work that runs every Wednesday night. All of a sudden, it
aborted the last 2 weeks. This caused some critical data to be late. The
main ...
Unique Joins
I was wrapping up the analysis on a problem the customer had reported. I needed to do one last check for a certain set of records. So I joined two tables. I tried to exclude records that had duplicates in one of the tables. That should be easy, right? I did a GROUP BY, and ensured HAVING(COUNT(*) = 1).
Turns out I kept getting false positives in my query results. I could not explain it. I tried rewriting the query using an inner query. This still resulted in false positives. This was really annoying. It was the last query I needed to perform to wrap up my high priority analysis.
I told one of the leads that I was having trouble with this final SQL.He told me he could assist. I said come on and help me figure this out. We joined a conference call. The lead informed us that he was a SQL guru. I explained what I was trying to do. He rewrote the query too. But there was one problem. He could not get his query to compile.
Around an hour later, I told everyone I needed to go. My lead was still trying to get permutations of the SQL to compile. We were getting nowhere. I would have appreciated a little help here. I was not going to get it. In the end, the lead assigned the task to the database lead, who figured it out in no time.
What is the moral of the story? Beware of self professed SQL gurus.
Testing Concurrency
We released some big new functionality this past year in our system. There was a bunch of testing that was done. Some scenarios were difficult to accurately test. Developmewnt and testers must have skipped over some of those tricky areas. The customer found that certain combinations result in the application aborting. Not good.
I spied on the data in our production database. Saw the malformed records. Traced it back to our new development for the year. Even the comments in the code were wrong. Must have added to the confusion. I used the debugger to step through the code to get the records in a bad state. Then I made a code change, stepped through the code once more, and was convinced the fix was good.
Testers cannot run the code in debug mode. They use the release version. I told them it would be difficult to emulate the concurrency needed to see the problem. They gave it a try. A manager asked if there was anything we could do to help. His idea was to write a trigger on the table in question. We could then make a call to the Sleep procedure in the DBMS_LOCKS package. With things slowed down, he figured the testers could have a good chance to debug the problem. Somehow this did not feel right. But hey. It was worth a try.
I spied on the data in our production database. Saw the malformed records. Traced it back to our new development for the year. Even the comments in the code were wrong. Must have added to the confusion. I used the debugger to step through the code to get the records in a bad state. Then I made a code change, stepped through the code once more, and was convinced the fix was good.
Testers cannot run the code in debug mode. They use the release version. I told them it would be difficult to emulate the concurrency needed to see the problem. They gave it a try. A manager asked if there was anything we could do to help. His idea was to write a trigger on the table in question. We could then make a call to the Sleep procedure in the DBMS_LOCKS package. With things slowed down, he figured the testers could have a good chance to debug the problem. Somehow this did not feel right. But hey. It was worth a try.
The SQL Job Interview
Jitbit Software recently shared the SQL questions that ask potential employees at interviews. They give you a small schema and ask you to write the SQL queries to achieve a few goals. I looked through the questions. Some were easy. Others had me sweating. I bet I could hack through most of them. But if I was under pressure in an interview, I don't think I could solve them all. Oh oh.
It seems this test is to weed out developers who do database programming, but just click buttons on an ORM tool. You need more than that kind of knowledge if you want to tune SQL statements. Personally I don't want to tune SQL. There are specialists for that. I like writing apps that serve business needs. Yeah I try to make sure my SQL works fast enough to not be a problem. I even submit my queries to DBA peer review just in case I overlook anything.
Maybe it is time to brush up on the old plain SQL skills. The first step should be to install a free Oracle database on the computer I use the most. Used to have one on there. Then I got a new laptop, and haven't got around to doing this chore. The time is now. My SQL skills may depend on it. By the way, I don't ever use an ORM myself. I roll all my own SQL. Turns out that my SQL statements are usually quite basic indeed.
It seems this test is to weed out developers who do database programming, but just click buttons on an ORM tool. You need more than that kind of knowledge if you want to tune SQL statements. Personally I don't want to tune SQL. There are specialists for that. I like writing apps that serve business needs. Yeah I try to make sure my SQL works fast enough to not be a problem. I even submit my queries to DBA peer review just in case I overlook anything.
Maybe it is time to brush up on the old plain SQL skills. The first step should be to install a free Oracle database on the computer I use the most. Used to have one on there. Then I got a new laptop, and haven't got around to doing this chore. The time is now. My SQL skills may depend on it. By the way, I don't ever use an ORM myself. I roll all my own SQL. Turns out that my SQL statements are usually quite basic indeed.
You Are Doing It Wrong
A friend got me into playing this online game. It is turn based. Slowly I have risen through the ranks. My character level is up. Often I want to compare my progress against the other players. Fine. I go to the ratings page.
Most of the time, I get an error message stating that I have accessed the ratings page too often. Then there is some whining about the page taking a long time to generate. I guess I am supposed to come back after a few days. What the heck is this? Some noob database programmers I guess.
From what I gather, the game was hacked together with some PHP. I think I may be able to get my hands on the code. There must be a MySQL database on the back end. Now there are around 1000 player total. I imagine they are running a query that orders players by level and maybe experience.
How hard could this query be? I find there are usually only 10 players on at a given time. I bet most of them are not clicking on the ratings page. Can we somehow cache this data? Heck they could even generate a static HTML page that gets refreshed every so often. Performance problem solved.
From a database point of view, why not create a materialized view? Then refresh that view at your leisure, as resources allow. Nobody should need up to the minute real-time player rakings. And if they do, charge them a premium. Right now I scraped the ratings pages. Going to play with the data a bit. Then I need that source code.
Most of the time, I get an error message stating that I have accessed the ratings page too often. Then there is some whining about the page taking a long time to generate. I guess I am supposed to come back after a few days. What the heck is this? Some noob database programmers I guess.
From what I gather, the game was hacked together with some PHP. I think I may be able to get my hands on the code. There must be a MySQL database on the back end. Now there are around 1000 player total. I imagine they are running a query that orders players by level and maybe experience.
How hard could this query be? I find there are usually only 10 players on at a given time. I bet most of them are not clicking on the ratings page. Can we somehow cache this data? Heck they could even generate a static HTML page that gets refreshed every so often. Performance problem solved.
From a database point of view, why not create a materialized view? Then refresh that view at your leisure, as resources allow. Nobody should need up to the minute real-time player rakings. And if they do, charge them a premium. Right now I scraped the ratings pages. Going to play with the data a bit. Then I need that source code.
Database Design
One of the developers on our team was assigned to design and code a new feature in the system. This particular feature was not too complex. However in the past, this developer only did some coding. He called me up and asked me how he should implement the changes. I told him he should understand the requirements, and put together a design. He was very nervous about this. He shouldn't be. The guy has a PhD in computer science. He is was also hired as a senior developer. This should be a cake walk.
The guy came up with a design to add a column to one of our tables to track the timing of when to do some follow on tasks. That sounded good until he wanted to set the value in a trigger. Unfortuantely the trigger was on the table he was trying to modify. He found this did not work, as it was illegal in the specific trigger he was modifying. Then he decided he needed a small additional table to track the timing.
Unfortunately, the database changes went into review by an external team. They found the design lacking. This external team came up with all kinds of other designs to solve the problem. Unfortunately this was late in the development life cycle. That's where I got dragged into the scene. I took a look at all the proposed designs. Each of them would work. In the end, the main driver for sticking with the original design was that the work was already complete.
The outside team got some recommendations in though. There were be a history of the timing. There would be extra columns tracking who did what. Now that little table is getting bigger and more complex. It still will achieve the business objective. And the work is just about done. The moral of the story is that you have to practive doing designs before you can produce solid ones.
The guy came up with a design to add a column to one of our tables to track the timing of when to do some follow on tasks. That sounded good until he wanted to set the value in a trigger. Unfortuantely the trigger was on the table he was trying to modify. He found this did not work, as it was illegal in the specific trigger he was modifying. Then he decided he needed a small additional table to track the timing.
Unfortunately, the database changes went into review by an external team. They found the design lacking. This external team came up with all kinds of other designs to solve the problem. Unfortunately this was late in the development life cycle. That's where I got dragged into the scene. I took a look at all the proposed designs. Each of them would work. In the end, the main driver for sticking with the original design was that the work was already complete.
The outside team got some recommendations in though. There were be a history of the timing. There would be extra columns tracking who did what. Now that little table is getting bigger and more complex. It still will achieve the business objective. And the work is just about done. The moral of the story is that you have to practive doing designs before you can produce solid ones.
Hitting Max Users
My boss called me up this morning. The customer had a high priority issue. Our software was not allowing them to create new users in the system. The only lead I had was that our software reported a -1985 error. Now I know that number is not one of our internal error numbers. So it had to be an Oracle error. ORA-01985 means we have reached LICENSE_MAX_USERS. Oh boy.
I could not interrogate the production database. No rights. A DBA eventually got on and found that the parameter was set to 2001. Whoa. We have more than 2k users? Well the solution is to bump up the parameter. Initially the boss man said we could rev this param value up to 5000. Then after hearing the current value was 2001, he thought 3001 was a good new number.
I played around in a development database. Figured someone could issue an ALTER SYSTEM SET LICENSE_MAX_USERS = 1. Luckily a DBA came on and added a SCOPE BOTH to the end of that command to ensure it sticks after the database is rebooted. I am not a DBA. So I don't know much about the server parameter file (spfile). I just know that when we makes changes like this, it needs to stick immediately and permanently. That must be what SCOPE BOTH controls.
I could not interrogate the production database. No rights. A DBA eventually got on and found that the parameter was set to 2001. Whoa. We have more than 2k users? Well the solution is to bump up the parameter. Initially the boss man said we could rev this param value up to 5000. Then after hearing the current value was 2001, he thought 3001 was a good new number.
I played around in a development database. Figured someone could issue an ALTER SYSTEM SET LICENSE_MAX_USERS = 1. Luckily a DBA came on and added a SCOPE BOTH to the end of that command to ensure it sticks after the database is rebooted. I am not a DBA. So I don't know much about the server parameter file (spfile). I just know that when we makes changes like this, it needs to stick immediately and permanently. That must be what SCOPE BOTH controls.
TNS Names Alias Cache
Our DBA team keeps a TNSnames.ora file out on a network drive. They maintain it with all the databases anyone would need to connect to. All the developers use this copy of the file to map TNS alias names. Today one developer found out that when he first boots up, our application cannot seem to properly access that file.
The developer said he must first access the network share with Windows Explorer before any of our apps can use the share to get to the TNSnames.ora file. He thinks that maybe Windows need to cache the network share information or something. That did not sound like a sound theory. But the evidence was plain. He said the first time after a reboot, he always get a database not found error. The second time he tries, our apps work fine.
Well there may be some truth to this theory. However he went on to generalize that maybe the users of our system might encounter some problem when cached data is not present, and a database call times out. That just does not seem to have a leg to stand on though.
For starters, we put the TNSnames.ora file on the local C: disk drive of our customer' computers. So there should be no delay in accessing that file. Sure the database server is far away, accessed over the network. However I cannot believe that some delay will cause a timeout resulting in random application errors.
The developer said he must first access the network share with Windows Explorer before any of our apps can use the share to get to the TNSnames.ora file. He thinks that maybe Windows need to cache the network share information or something. That did not sound like a sound theory. But the evidence was plain. He said the first time after a reboot, he always get a database not found error. The second time he tries, our apps work fine.
Well there may be some truth to this theory. However he went on to generalize that maybe the users of our system might encounter some problem when cached data is not present, and a database call times out. That just does not seem to have a leg to stand on though.
For starters, we put the TNSnames.ora file on the local C: disk drive of our customer' computers. So there should be no delay in accessing that file. Sure the database server is far away, accessed over the network. However I cannot believe that some delay will cause a timeout resulting in random application errors.
Garbled Data
Our test team was trying to recreate some problems detected by our customers. They were having a hard time. So they took a look around at the data in the different tables in this part of the subsystem. They found the main table that drives the processing. Unfortunately, most of the data seemed to be all garbled up.
They were using SQL Developer. One of the testers wondered if this could be a problem with the query tool. That was doubtful. But hey. It could not hurt to try viewing the data with another tool. So another tester pulled out TOAD and tried to bring up the data. Most of the data seemed empty.
The testers figured that SQL Developer was just displaying all the data, regardless of how strange it was. TOAD on the other hand, was just supressing the display of weird looking data. This did not help us understand why the data was bad. It just was an interesting observation.
The data in the main table was SQL*Loaded from a file the testers were creating. Further inspection of this file showed that the positioning was incorrect. Perhaps that was the source of the weird looking data. Further research is required.
They were using SQL Developer. One of the testers wondered if this could be a problem with the query tool. That was doubtful. But hey. It could not hurt to try viewing the data with another tool. So another tester pulled out TOAD and tried to bring up the data. Most of the data seemed empty.
The testers figured that SQL Developer was just displaying all the data, regardless of how strange it was. TOAD on the other hand, was just supressing the display of weird looking data. This did not help us understand why the data was bad. It just was an interesting observation.
The data in the main table was SQL*Loaded from a file the testers were creating. Further inspection of this file showed that the positioning was incorrect. Perhaps that was the source of the weird looking data. Further research is required.
Subscribe to:
Posts (Atom)