Mysterious Double Instance Hampering Performance

I study the existing code base. Confer with a colleague. Then I determine the optimal plan to change the functionality to load only a slice of all the data. Feeling good about this. Write some helper functions. Make the minimal amount of changes to existing functions. Then I am ready to test.


I run the main stored procedure. Then my VPN drops. Should not be a problem. Just run my stored procedure again. It takes forever. I start to doubt my changes. Maybe I am now keying off a column that does not have an index.


Eventually it is time to leave work. Query is still running. I leave it run to see if it will ever complete. Just as I am sitting down to eat dinner, I get a call. My queries are slowing down the whole database. WTF? Now I must point out that this is Greenplum/PostgreSQL and not Oracle.


Yeah. I can kill one query. But every body says I have two queries running in parallel bringing everything to a halt. Then it hits me. That run when my VPN dropped. My client lost the connection. However the query must have kept running. I left it run now since it will not be in contention with the second query.


Let's hope that solves the performance problems. If not, I am sure a DBA can kill my jobs for me. At least I have a solid lead on my own alleged performance problems.

Formatting Trouble

I got a call from a guy I used to work with. He was producing a new report to email the customers regularly. He knew what he wanted the report to look like. Unfortunately, the actual report was not coming out the way he wanted. He asked me for some pointers.
 
The specific problem was that the text in the report was wrapping to the next line at unexpected locations. I asked him to determine whether the tool he used to view the report was also doing some wrapping of its own. We looked at the report in binary format. We saw some carriage return characters inserted. Also saw a bunch of spaces inserted. Both contributed to his problems.
 
I have done reports using SQL before. It takes some tweaking to get your reports to line up the way you want them to. Sometimes you need to set the width for some columns in SQL*Plus (the report was spooling the output of SQL*Plus executing a SQL script). Other times you need to set some parameters related to wrapping.
 
I don’t work on that project any more. Therefore, I did not go and solve all the problems. I did review and confirm that there were no blatant problems with the SQL For formatting problems such as these, you got to dig in and understand how output is formatted by SQL*Plus. You can’t expect it to look just the way you put_line() the data.

Saved by the Notes

I attended a meeting called by the new manager. He compiled a list of things he thought we going out in the next release. He wanted to make sure all the developers were on the same page as him. There were a bunch of developers in attendance.

I had actually prepared for the meeting. I went over every item I was doing for the next release. I new my statuses. When it came to my turn, the manager asked me about my items. Then he asked about an extra item. It was not on my radar. So I said I needed some time to research it.

There were some surprises like they were taking the database down on the last week of our development schedule. So the pressure is on to deliver early. I looked up the extra item in our ticketing system. It rung a bell. A tester wrote a defect against something I pushed out in the last release.

So I went back to the original ticket that backed the work I did last time. There in the notes were all the details. I documented exactly what I had done, where I left things when I went on vacation, and the details of the progress made while I was on vacation.

Somehow the info did not get forwarded into the new ticket. This issue was a non-issue. Thankfully I write everything down into the notes section of my tickets so I don't have to remember these pesky details.

Materialized View Review

I decided to brush up on materialized views this week. It is easy to gloss over all the different options available during creation. I found that the real learning happens when you attempt to actually create some of these views. Let me share some insights.


You can specify FOR UPDATE when creating the materialized view. That means you can issue an UPDATE statement on the materialized view. Not sure why you would want to do that. Those changes only get made to the materialized view. They do not propagate back to the source database tables. The next refresh will wipe out your updates.


You also can specify a REFRESH FAST option during creation. This means that only the source records that changed will cause an update in your materialized view. This might provide a great performance improvement. However you must first create a materialized view log for your source table before creating this type of performance view.


Finally there are the START WITH and NEXT options. These tell Oracle when to do the first and subsequent automatic refreshes of your materialized view. I offer this hint: The changes to the source tables need to be committed before they will propagate down into your materialized view.


Hopefully these hints can clear up some misconceptions and save you some time when dealing with your own materialized views.

Mastering Oracle PL/SQL

I received some newsletter from my company. It highlighted the book Mastering Oracle PL/SQL. I think this is targeted for me since I said my specialty is PL/SQL. My company has a subscription service with Books 24x7. So I can read the book for free.

Chapter 1 says to set up the employee sample database from Oracle. I was to go to ORACLE_HOME/sqlplus/demo and run the demobld.sql. Well I got Oracle 12c on my Windows box. But when I navigated to that directory, there was no such script.

Luckily, the Google search engine knows where I can get a copy. Here is one courtesy of the University of Vermont.

Perfomance for Dropping a Column

I am working on some experimental routines to analyze hash functions. The big idea is to generate a lot of hashes. Of course I am storing them in a database table. For a while, I have been going back and forth trying to determine how to identify each hash.

Initially I think I just has a key that was generated from a sequence. Then I moved to multiple keys that were related to the input data. However I am changing how I generate that input data. So I thought I would drop those columns. Oops. It is taking a very long time. I only have 20M rows in the table. What is going on here?

Now I realize that I am going to regenerate all my hashes. So those existing 20M rows are just going to be truncated anyway. I guess I should have truncated first, then dropped the columns. You live, you learn. Still want to figure out why it is taking so long to drop a column here. Yeah there are a bunch of rows, and some of the columns may be big. But come on.

Package Access Mystery Solved

I have been playing around with the DBMS_CRYPTO package, trying to generate a lot of hashes. I want to study the distribution of hashes for different inputs using assorted hash functions. The HASH() function seemed easy enough. Just pass two parameters: the input your want to hash, plus the hash function you want it to use. Unfortunately, I ran into trouble.

The DBMS_CRTPYO package defines six hash types in its packages specification. For example, HASH_MD4. But when I tried to specify MD4 using this variable, I kept getting PL/SQL: ORA-00904: "SYS"."DBMS_CRYPTO"."HASH_MD4": invalid identifier. What was going on here? I was explicitly granted EXECUTE on the DMB_CRYPTO. I should be able to see that variable.

I searched the web and was about to give up. Just hard coded the hash function values. Then I found the answer on TalkAPEX. My problem was that I was trying to use the package variable from SQL*Plus. Not allowed. You can only get access to the package variable from PL/SQL. Aha!

So now I just wrote a procedure to run my hashing. The procedure, via definer's rights, has EXECUTE on the package. And it can access the package spec variables. Now I am good to go. Only thing left to do is feed these hashes a whole lot more inputs.

A Little Bit of Crypto

I have been trying to figure out to "collision resistant" some of these standard hash functions are. It is a tough concept to get my head around. I figure what better way to understand than to do some hashing. And let's put the result in a database table so I can analyze it from different angles.

The first thing I needed to do was grant access to the dbms_crypto package which is owned by SYS. Not a problem. Then I called the hash function. Seemed easy enough. I wanted it to do a SHA256 on my plain text. For some reason, I could not get the function to recognize HASH_SH256. It is supposed to be defined in the dbms_crypto package. But Oracle kept complaining.

I needed to get this project moving. So I just hacked the hash type and passed in a 4. I have it on good authority that 4 means perform a SHA256. Eventually I should figure out what the problem here is. Now I got a bunch of hashes. But they are in raw format. Not too good to look at.

After running the raw hashes through rawtohex() plus a to_char(), I am good to go. I was surprised that all the hashes have the same length. I know the upper bound is supposed to be 256 bits wide (i.e. 64 characters). However I thought some of the hashes might be shorter. Nope. all were the same size.

What input data did I pass to the hash function? Heh. I grabbed a bunch of sentences from the book Pride and Prejudice. That is a story for another blog post.

Jobs in Information Technology

I saw a post on a board asking for some help. The dude wanted to know what kind of skills or certs would be good to get into the Information Technology industry. He also wanted to know what kind of jobs he should shoot for. He thought maybe a help desk position. While that is an option, the pay is at the low band of the industry.

Here is a relevant observation for today: Getting involved with security is a good thing. Network security is even better. These are hot right now. It seems to help if you have a government security clearance. Provides some protection against offshoring. Another general truth is that you always need to be learning in this field. Stuff changes a lot. New stuff pops up.

I heard a rumor that a business analyst position is the sweet spot. Money is really good. Plus the work is "easy". Well that is what I heard. Not sure I agree. Yeah the money is good; it is a lot higher than help desk work. But I doubt the job is an easy one. In fact, it might be one of the harder ones. Let's just say it is hard to be a good business analyst.

Regarding pay, a help desk position may get you $35k and top out around $50 to maybe $60k per year. Decent. But not great, especially in IT. I was surprised that some guys who were gainfully employed turned down contracting rates as high as $90 per hour. That sounds like a lot of money to me. But maybe you could earn more if you are really good.

Moving Forward with Many Unknowns

I used to be part of a big team. Then I got put on a big assignment that lasted over a year. I was on a much smaller sub-team. The new development is over. That project is in maintenance mode. I have been tagged to do general maintenance of that project. My old team leader got wind of that. He assigned me a high priority maintenance task. Unfortunately it dealt with a code base I don't work with on a regular basis.

So I dug in. Traced how the data was getting loaded. I asked the ETL team why there was data missing from a table they populate. The only answer I got was that was what they received. Not too enlightening. The guidance I got was to try to find the data somewhere else. Luckily the analysts had a lead on a source where I could find the data.

With the new source in my hand, the fix should be easy, right? Nope. I got to modify this big system. I decided to just come up with a hypothesis of how I could achieve this. Then I went to research how that could be done. Halfway through, I found out that I was barking up the wrong tree for half of the solution. Okay. Time to regroup.

Once I figured out where I needed to make the changes, I found that I had the wrong version of code to start with. I was working with something that was two years old. I backed up and downloaded all the code from our source code repository. Was able to match up the version of code that was most recently deployed to production. Now I am cooking with gas.

The are a couple good lessons here. Sometimes you got to try something out when you don't know all the answers. Also, by doing some hard work, you can go figure stuff out for yourself without having to bother other busy people. Finally I learned that there is nothing like making changes, running the code, and seeing the results work. Very satisfying.

Salary Comparison Failure

Read a post that stated top bug bounty hunters make 3X the salary of average developers. Umm what? Who cares what those top people make? You got to compare apples to apples. In this case, compare your average bug hunter to your average developer.

Or we can do it the other way around. The top developer made 10,000X the amount the average bug bounty worker did. Meaningless, I know. I do realize anyone can publish anything they want. But let us try to avoid the amateur hour.

Curse of the Business Analyst

I got a call from a dude on our team who usually is our customer facing tech guy. He had a business analyst from our team on the line. The business analyst was fielding comments and complaints from one of our customers. The customer had provided us with some input data. However when the customer used the reporting system, about 1000 records were missing.

I ended up working with the business analyst to help figure out what was going wrong. The input data got sent to our ETL team who loaded the raw data in. Then we got some jobs that process that data and format it for reporting purposes. Finally, we have a reporting structure on top of the formatted data that the customers use.

So I explained all this to the analyst. Showed how a couple queries could be run at the staging and formatted levels to determine where the deficit was coming from. In the end, we determined the point at which the records were disappearing. At that time I excused myself. I had to get back to my day job. However I recommended that they have the customer officially submit this in a trouble ticket.

Later my team lead called me up. He wanted to review this discrepancy with me. Luckily I was very familiar with the scene. He had a guess as how he could solve the problem. And his justification was that the business analyst provided him the requirements. That seemed circular. The business analyst came to me and really knew very little. Now they were being used as the authority on this?

I told my team lead that we had to understand what the business customer actually wanted here, and to make sure their needs were met. Yes we needed to resolve loss of records. But you just can't hack away until some counts match. You got to know what you are doing. So I said let's identify the source of the problem, and also ensure we know how to test that business needs are being met after we make some changes.