SQL*Loader Refresher

I have started searching for an Oracle development job. Optimally it will be one where I can do a lot of PL/SQL coding. Some jobs out there require experience with SQL*Loader. I have used the tool in the past. But not much. Time to refresh my memory.

SQL*Loader will take a data file as input, and put the data in your database tables. The main driver of reading in that file is the control file. The control file can specify a fixed datafile format. That has the best performance. Or it can specify a variable length. The length of each line can vary, with the size being set in the first couple character of the line.

You run SQL*Loader with the sqlldr command. You pass it a username and password. Plus you give it the name of the control file. It will generate a log file with the same name as the control file, but with a log extension, by default.

SQL*Loader can also generate a bad file. This will contain records rejected by SQL*Loader. It can also generate a discard file. That file has input records from the datafile that were not selected by the constraints specified in the control file. These are optional files to be generated.

There are a few methods that SQL*Loader can use to load data. The first is conventional path loading, where data is copied to a bind array, then inserted into the database table. The second is direct path loading, where blocks are built and written directly to the database. It is fast. The third is external table loading.

If you want to load an Excel file, which I sometimes do, you need to export it to CSV format first. There is a way to put the actual data being loaded right in the control file. This eliminates the need for a separate datafile. You just give them command "INFILE *" in the control file. Then you prepend your data section with the keyword BEGINDATA.

Selective loading of records can be accomplished with the WHEN clause. Just note that you can only use AND in the parts of that clause. OR is not allowed. You can specify the batch size using the ROWS parameter. You tell what table the data is going into using the INTO TABLE clause. That is followed by fields. and optionally their types.

Although I have never done it before, SQL*Loader can read data into collections such as nested arrays or VARRAYs.

LISTAGG

I read this month's edition of Oracle Magazine, skimming over most articles. But I always stop and read Steven Feuerstein's column on PL/SQL intently. His listing 1 this time around made use of something call LISTAGG. It had a weird syntax. I honestly had never heard of it.

Turns out LISTAGG is new in Oracle 11g release 2. It does something called "string aggregation." That is a fancy way of saying concatenating values from multiple rows into one big string. Hmm. Sounds like it could be useful in certain scenarios.

So you give the LISTAGG a MEASURE_COLUMN. That would be the column whose values are getting concatenated into a string.  You also provide an ORDER BY. This will be the order in which the values are put into the concatenated string. You can specify the delimiter which separates values in the concatenated string.

I got to try better to keep up with new stuff from later Oracle releases.

The Wrong Way

I got a call for help from one of my teammates today. He was supposed to deliver a data correction script. Unfortunately, the thing was not compiling. He wrote up this huge script. Then he tried to compile it. Of course it did not work.

I found a slew of problems. There were syntax errors. Wrong keywords were used. Semicolons missing. The poor guy is not strong in PL/SQL. So he could not decipher the error messages. I helped him for a while, getting past tons of bugs quickly.

This all could have been avoided. Don't just blindly code and hope to eventually compile the thing. Compile it after you write your first routine. That way the code delta is small and you can hone in on the problems.

I write PL/SQL pretty well. However I still start checking for errors as soon as I start coding. I even stick in a NULL in place of the code and make sure the function and procedure declarations are correct before I implement the first line of business logic. Take it from a pro buddy. Test first.

The New Normal

I saw a job that required familiarity with the first and second normal forms. Now I never properly studied database theory. But I got a lot of on the job training. A whole lot. Still I figured I should at least know some terminology. So I googled around.

The goals of normalization are to reduce duplication and to put data where it belongs logically. Every rule in normalization is a normal form. Edgar Codd (E.F. Codd) came up with the first three normal forms by himself. These are the only forms most apps will ever need.

The first normal form aims to get rid of duplicates. Do not arrange tables for any specific performance. Make sure attributes have only atomic values. Mae sure attribute values are unique.

The second normal form requires the first. In addition, attributes must be dependent on the whole key to the table. Separate tables should be created for values that go with multiple records. Create foreign keys to reference the primary keys in the tables.

The third normal form requires the second. In addition, attributes must contain a detail about the entire key (especially in the scenario of composite keys). Put another way, there should be no fields that do not depend on the key. This is the form that people usually mean when they say normalized.

There is obviously more to database normalization than these high level descriptions. But this is a start. I think the real value comes from having to design databases over time.

Lot of Notes

I have been working on designing solutions to some new customer requirements recently. Some requirements are written based on the existence of some notes on an artifact. I guess that is the only way they could specify the types of artifacts that need certain processing. Okay.

My solution was to run a query to find out if those notes existed. I did not have the exact text of the notes. But I had a bunch of examples. I figured I could do a query where the note text was LIKE some pattern I could match. My guess was that I could throw the query over to our DBAs, and they would tell me whether the performance would be acceptable.

A colleague was working on some other new requirements. He had the same notes to search for. His solution was to just read all the notes into memory or some collection. They he would implement the search for the patterns he needs. Now this might be a way to get the job done fast. I prefer to let Oracle handle the searches. We shall see which method has the superior performance.

Baby Steps

A friend of mine had some trouble transitioning from her community college to a four year one. Right now she is in limbo. Looks like she is reading books and playing video games. That is fun for a while. But I think she should be working on personal development projects.

I decided to work on some side projects with my friend to motivate her to start writing code again. She had an idea for a fiction database. She tried to do it once. But her queries were not working and she gave up. That is so not like her.

My first idea was for us to produce a minimum viable product (MVP) for her fiction database. I helped trim the specs down to a bare bones app. I cooked up a simple single database table to hold the data. Then I reviewed the SQL needed to get records of interest from that table. That's when I saw my friend had some trouble constructing the WHERE clause.

With the SQL confusion resolved, my friend banged out the app. I was happy. We decided to write the app in PHP and use MySQL on the back end. Next step is to modify the app to work with her more complicated database schema. We have the basics down. Moving forward should be easy.

ODP.NET

Just read a article that describes how to use the latest Oracle Data Provider for .NET, also known as "ODP.NET". You must know that there are type types of this provider. One is the managed driver. The other is the unmanaged one. I was interested in the unmanaged driver.

You need to have, at a minimum, Visual Studio 2010. At work I run with Visual Studio 2008. But at home I use Visual Studio 2010 for school. So I am good to go to try things out. Apparently you also need to download Oracle Developer Tools for Visual Studio, also know as ODT.

The example I read had me setting up my Visual Studio project to add a reference to Oracle DataAccess. Then in my code I would add using statements to Oracle.DataAccess.Client and Types. After that I should be good to go to use the driver.

There is some bolierplate for the data access code I write. Create an OracleConnection. Set the ConnectionString. Call the CreateCommand. Set CommandText to my SQL. Then ExecuteReader and Read. Simple right?