Digging Deeper into DEPTREE_TEMPTAB

I had previously written about running a script to set up the DEPTREE_TEMPTAB table in my schema. I knew it was used to track dependencies. But I knew little else. Today I started calling the DEPTREE_FILL procedure to populate the table with objects I specified as parameters. However, I only found the results for the latest procedure call in the table. Turns out each call clears out the results from prior calls.

The normal views like USER_DEPENDENCIES will show dependencies for multiple objects. However, they only go from the objects of interest to the direct objects they depend on. If those referenced objects themselves depend on other objects, that information is not captured in USER_DEPENDENCIES.

DEPTREE_TEMPTAB will dig deep and find direct and indirect dependencies for the object that you pass to DEPTREE_FILL. It produces records with the OBJECT_ID values for objects in the dependency chain, plus how deep they are in their level of indirection.

There is also an IDEPTREE view created when you run the utldtree SQL script to set all this up. This view shows the data from DEPTREE_TEMPTAB in an indented format that is more intuitive to the eye. Just like DEPTREE_TEMPTAB, the IDEPTREE view is not part of a default Oracle installation; you need to run the script yourself to create it in your schema.
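
To give a flavor of the workflow, here is a sketch based on the scott.emp example from the script's own comments. The actual output will depend on what really references EMP in your database:

execute deptree_fill('table', 'scott', 'emp');

-- flat listing, one row per object in the chain, with its nesting level
select * from deptree order by seq#;

-- the same chain, indented by level of indirection
select * from ideptree;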

DEPTREE_TEMPTAB


I was taking a practice exam for an Oracle certification. There was a question about where I would look to track dependencies. The USER_DEPENDENCIES view was a given. But there seemed to be some other location. The answer given was the table DEPTREE_TEMPTAB. I had never heard of it.

So I logged into Oracle and tried to describe DEPTREE_TEMPTAB. No such table. The first hit from a Google search was some Oracle 7 documentation. Maybe this table was long since deprecated? Nope. It just does not come by default.

I located the "utldtree.sql" script in my $ORACLE_HOME/rdbms/admin folder. It seemed like I could run it from my own schema. There were some errors when I ran the script. But they were just some DROP commands that assumed the script had been run before. Now I was off to the races:

SQL> desc deptree_temptab
 Name                                      Null?    Type
 ----------------------------------------- -------- ------

 OBJECT_ID                                          NUMBER
 REFERENCED_OBJECT_ID                               NUMBER
 NEST_LEVEL                                         NUMBER
 SEQ#                                               NUMBER

The table starts out empty. The comments in the script tell you to run a procedure to populate the table:

Rem        execute deptree_fill('procedure', 'scott', 'billing');
Rem        select * from deptree order by seq#;
Rem
Rem        execute deptree_fill('table', 'scott', 'emp');
Rem        select * from deptree order by seq#;
Rem
Rem        execute deptree_fill('package body', 'scott', 'accts_payable');
Rem        select * from deptree order by seq#;

Now we are cooking with gas:

SQL> select * from deptree_temptab;

 OBJECT_ID REFERENCED_OBJECT_ID NEST_LEVEL       SEQ#
---------- -------------------- ---------- ----------
     89390                    0          0          0 
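
Only one row, because nothing depends on the object I passed in. Here is a hypothetical sketch (all of the names are made up) of how a deeper chain would show up:

-- build a small dependency chain: a view on a table, and a view on that view
create table dep_demo (id number);
create view dep_demo_v1 as select id from dep_demo;
create view dep_demo_v2 as select id from dep_demo_v1;

-- find everything that depends on the table, directly or indirectly
execute deptree_fill('table', user, 'dep_demo');

-- NEST_LEVEL should come back 0 for the table, 1 for DEP_DEMO_V1, 2 for DEP_DEMO_V2
select * from deptree_temptab order by seq#;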

I will need to delve into loading the table up with a lot of objects that depend on others. Maybe that is a topic for a future post.

Newbie Gets Confused

A relatively young developer got tasked with doing some performance tests on a lot of new code. The first task was to get a lot of data ready for the new code to operate on. Two weeks went by. He was still working on the script. During that time, I had a similar task. Luckily I have some experience in this department. So I generated the data, ran some small tests, then executed the big test.

On the third week, the developer came to me for some help. He said his code was slow. So he decided to dynamically disable all the triggers on the table he was working with. He wanted to issue an ALTER TABLE command from his script, which consisted of one large PL/SQL block. Of course it was not working. I told him you can't just put a DDL statement in the middle of that block. You can only do that for DML. However, you can run the DDL inside an EXECUTE IMMEDIATE. He seemed happy.
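
For reference, a minimal sketch of what I meant, with a made-up table name. Keep in mind that DDL run through EXECUTE IMMEDIATE also issues an implicit commit:

begin
   -- DDL cannot appear directly in a PL/SQL block, but dynamic SQL works
   -- (my_big_table is a placeholder name)
   execute immediate 'alter table my_big_table disable all triggers';

   -- ... the slow data load goes here ...

   execute immediate 'alter table my_big_table enable all triggers';
end;
/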

Later the same guy came back and said he was still having problems. The error indicated that the table he was trying to access did not exist. Well, I told him that the table was most likely there. But he was running the script as a different user. I thought there was a public synonym that should have let his script resolve the table name. The guy was in a hurry. So I told him he could qualify the table name with the schema name prepended. That got him a bit further.

The problems this guy was running into stemmed from some lack of knowledge. And experience is the best teacher here. He had a lot more pain. For some reason he chose the wrong type for one parameter he was passing. Then when he used the variable of the wrong type, Oracle had to implicitly convert the values, which prevented it from using the index that was needed to make everything fast. Ah, this should not be this hard.
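
This is the classic implicit conversion trap. A hypothetical example (the table, column, and index are made up): if ORDER_NO is an indexed VARCHAR2 column and you compare it to a NUMBER variable, Oracle effectively wraps the column in TO_NUMBER(), and the index on ORDER_NO goes unused:

declare
   v_bad   number       := 12345;     -- wrong type for the column
   v_good  varchar2(20) := '12345';   -- matches the column type
   v_cnt   number;
begin
   -- slow: the indexed varchar2 column gets implicitly converted to a number
   select count(*) into v_cnt from orders where order_no = v_bad;

   -- fast: the types match, so the index on order_no can be used
   select count(*) into v_cnt from orders where order_no = v_good;
end;
/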

New World for the Data Engineer

I read an article on how to become a data engineer. It is interesting to see all the tools you would need to know. Obviously you need to know SQL. But apparently you should also know Python. Since everything is in the cloud these days, you should know a vendor or two (AWS or Azure). Of course there is the whole NoSQL movement, along with the obligatory Big Data.

It seems there are a lot of Apache products ripe for the Big Data universe. There is Apache Kafka for messaging. Hadoop lets you run jobs where the data is stored on disk (HDFS). Spark lets you run the jobs and keep results in memory. There is also Druid, which I had not heard about previously, for real-time workloads.

In addition to the article I read, there is a Hacker News follow-up discussion. There the cutting-edge crowd says Hadoop and Spark are already legacy. Kafka, Redshift, and a host of other technologies replace them. Man, even in the database world, things move fast.

Work Smarter not Harder

We have large data sets in my current project. Every year tons more data is loaded into the system. So we keep the bulk of the data for only 4 years. After that, we get rid of it. There are some specific items in the data that the customer wants to keep around forever. So we identify those important records on a yearly basis, and "archive" them.


Archive in this sense means to copy the important records to another table. That other table mimics the structure of the original table. We do run into extra work when structural changes are made to the original table. We need to ensure that those changes are made to the archival table as well. This undertaking is done once a year when the archive process is run.


The archive time is coming due soon. So I am working on this task. I need to identify all the changes made to the original table. Actually there are a total of 37 original tables. Who knows what kind of changes were made in the last year. How can I be sure I account for all of them? Previously this was a manual process of reviewing the changes made during the year. Now I am turning to the data dictionary.


It is easy to find newly added columns. I just search for columns in the original table that are missing from the archive table. It is a little more involved when a data type changed in the original table. I must find where the type changed to an entirely different type, or where something like the scale or precision changed. The data dictionary knows all. It just takes a SQL query to unearth the answers.
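
A sketch of the kind of queries I mean, assuming one original/archive pair with placeholder names ORIG_TABLE and ARCH_TABLE:

-- columns added to the original table but missing from the archive table
select column_name
  from user_tab_columns
 where table_name = 'ORIG_TABLE'
minus
select column_name
  from user_tab_columns
 where table_name = 'ARCH_TABLE';

-- columns whose type, length, precision, or scale no longer match
select o.column_name, o.data_type, o.data_length, o.data_precision, o.data_scale
  from user_tab_columns o
  join user_tab_columns a on a.column_name = o.column_name
 where o.table_name = 'ORIG_TABLE'
   and a.table_name = 'ARCH_TABLE'
   and (   o.data_type <> a.data_type
        or o.data_length <> a.data_length
        or nvl(o.data_precision, -1) <> nvl(a.data_precision, -1)
        or nvl(o.data_scale, -1) <> nvl(a.data_scale, -1));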

Know What You Are Doing

An analyst scheduled a meeting to talk about a problem discovered in production this year. They changed the format of some of the root level flags stored in the database. However, the fields that depend on those flags were still getting set using the old logic. Those fields in turn drive the setting of other fields that the customer uses in their reports.

The game plan was to update the function that sets the root level flags. We would also need to correct all the data previously processed incorrectly. Easy enough. A new guy was assigned to work on the root level flags issue. And I was tasked with updating the customer-facing data.

The only tricky part of my fix was that it depended on the other fix being executed first. I asked the new guy to let me know when his fix was done so I could add a dependency to it. The new guy did contact me. He asked if I knew how to do a data update. I gave him an example script. It is a template I use when we update data in tables using some other tables as source.
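
Roughly, the template looks like the sketch below; the table and column names (target_tab, source_tab, id, flag) are made up for illustration:

-- update the target table from a source table, only touching rows
-- that actually have a matching source record
update target_tab t
   set t.flag = (select s.flag
                   from source_tab s
                  where s.id = t.id)
 where exists (select 1
                 from source_tab s
                where s.id = t.id);

commit;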

The new guy took a shot at his task. He asked me to review his work. Initially I saw some SQL errors in the code. He obviously had not run this code yet. I let him know about those problems. Then I noticed something weird. He was updating data in a table based on other fields in the same table. That did not seem like how we normally do things.

Upon closer inspection, I found that the dude did not understand which tables all the columns were in. I asked him to take a step back and look at the problem from a big picture perspective. I pointed out that normally data flows from table to table, not from a table to itself. You should understand what it is you are doing before trying to construct a technical solution to a problem.

Mysterious Double Instance Hampering Performance

I study the existing code base. Confer with a colleague. Then I determine the optimal plan to change the functionality to load only a slice of all the data. Feeling good about this. Write some helper functions. Make the minimal amount of changes to existing functions. Then I am ready to test.


I run the main stored procedure. Then my VPN drops. Should not be a problem. Just run my stored procedure again. It takes forever. I start to doubt my changes. Maybe I am now keying off a column that does not have an index.


Eventually it is time to leave work. The query is still running. I let it run to see if it will ever complete. Just as I am sitting down to eat dinner, I get a call. My queries are slowing down the whole database. WTF? Now I must point out that this is Greenplum/PostgreSQL and not Oracle.


Yeah. I can kill one query. But everybody says I have two queries running in parallel, bringing everything to a halt. Then it hits me. That run from when my VPN dropped. My client lost the connection. However, the query must have kept running on the server. I left it running for now, since it will not be in contention with the second query.


Let's hope that solves the performance problems. If not, I am sure a DBA can kill my jobs for me. At least I have a solid lead on my own alleged performance problems.
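
For my own reference, a sketch of how to hunt down and kill an orphaned session in PostgreSQL. On older Greenplum releases (based on PostgreSQL 8.x) the pg_stat_activity columns are named procpid and current_query instead of pid and query:

-- list my sessions and what each one is running
select pid, query_start, query
  from pg_stat_activity
 where usename = current_user;

-- ask the server to terminate the orphaned backend
-- (12345 is a placeholder for the pid found above)
select pg_terminate_backend(12345);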