Performance Tricks in Other Databases

I have been working exclusively with the Oracle database for over 15 years. Now I have jumped to a project that uses the Greenplum database. This is a distributed database with PostgresSQL underneath. We are using it to deal with massive amount of data that have a short timeframe for processing.

My new team has informed me that the first rule of thumb is to not do updates. We insert records only. We also do not join too many tables together at once either. Mostly just two tables are joined. Here is a trick they use. The got functions which use temporary tables. They join two tables and stick the result in one temp table. Then they join that temp table with other tables to create a second temp table.

This goes on and on with different functions. They build up a huge table with many columns (1000+) by continually joined two tables together and storing the result in temp tables. Apparently this is very fast in the Greenplum environment. It does not seem intuitive. But hey. If it works, why worry about it?

Roles and Privileges

I have been studying Oracle privileges and roles today. The basics are easy. The details are tedious. There are two types of privileges: system and object. You grant these privileges to either a user or a role. If they are granted to a role, then a user gains access to the privileges when they are granted that role. There is a WITH ADMIN OPTION that allows users that you grant privileges to the ability to themselves grant the privilege to others.

When you drop an object, any privileges on the object that have been granted to roles disappear automatically. You can grant multiple privileges at once if you separate the privileges by commas. You remove privileges by REVOKE FROM. You can remove all privileges with REMOVE ALL PRIVILEGES or simply REVOKE ALL.

PUBLIC is an Oracle account that represents all users. You can grant privileges to PUBLIC. Then all users get that access. If you own an object, you do not need explicit privileges to access it. You get them by default. All grants operate immediately without the need for a COMMIT.

There are a slew of data dictionary views to look at privileges. USER_SYS_PRIVS contains those system privileges related to the user. DBA_SYS_PRIVS has all system privileges.

USER_TAB_PRIVS contains those privileges where you own the object, granted the privilege, or are a grantee of the privilege. ALL_TAB_PRIVS adds privileges through a role or that are granted to PUBLIC and available to you. DBA_TAB_PRIVS has everything.

ALL_TAB_PRIVS_RECD are privileges you have directly, through a role, or that are PUBLIC. There is a SESSION_PRIVS view that I am not too familiar with yet.

There are some old Oracle defined roles that are being phased out. But I think you still need to know them. CONNECT gives you CREATE SESSION. It is for a general user. RESOURCE is a role for application developers. DBA is a role for administrators.

Roles can be granted to roles. There are a number of data dictionary views associated with roles. ROLE_ROLE_PRIVS are the roles that have been granted other roles. ROLE_SYS_PRIVS are system privileges assigned to roles. ROLE_TAV_PRIVS are object privileges granted to roles.

Here is a trivia fact: there are no privileges specifically for indexes. You get the ability to create an index when you have the CREATE TABLE privilege.

Materialized Views

I have been tracing down a customer problem. Sometimes the summary screen shows correct data. Sometimes it is all zeros. Looks like the data comes from one of two materialized views. The views themselves have the right data. What could be wrong? That's the story for another post. Today I want to talk about materialized view technology.

Materialized views (MVs) were previously called snapshots. They are like regular views in that there is some SQL run to get the results. However unlike normal views, MVs store the results in a table. They were introduced in Oracle 8. The query used to populate the MV can go against tables, views, or even other MVs. Since the hard work is done when the MV is created, use of them can really speed up your queries.

Other databases have MVs. In Microsoft SQL Server they are called indexed views. And in DB2 they are called materialized query tables. You get the speed increase when you directly use the MVs. There is also a query rewrite technology where the database can use an MV instead of a base table to get speedup. This query rewrite was introduced in Oracle 8i.

There are different strategies to refresh the data in an MV. You can manually do it. Or you could periodically update. Or you could automatically update them when the underlying tables change. That last change gets captured in materialized view logs. There is a fast refresh option that requires specifics around the underlying query:
  • No aggregate functions
  • No joins other than subqueries
  • No mismatch columns in a UNION
  • Plus some other subquery restrictions
Not only is an MV good for improving query performance, it can help replicate data across databases. You just set up an MV in your database with the underlying tables being remote. Of course any user defined types in the source databases need to be in your own database.

You cannot perform any DDL on an MV. But you can perform DML. It depends on how your MV was set up. You can make it read only. Or you could make it updatable. Do that by specifying FOR UPDATE AS prior to the SELECT. There are even writable MVs. But they are rarely used.

The query under the MV can have an ORDER BY clause. However that only applies to the data from the initial creation. Updates can make the MV out of order. The MV can be built deferred. Then the data can later be added during a refresh. Finally if you have a regular table that acts like an MV, you can turn it into an MV  by using the ON PREBUILT TABLE during MV creation.

Merge and Flashback

I will be honest. I have not used the MERGE statement before. Looking promising though. You can do a lot with just one MERGE. It accomplishes two or more DML operations in one SQL statement. It can do an INSERT, UPDATE, or DELETE.

With flashback, you can look at data at a previous point in time. You can go back to the last DDL that occurred on a table. You are also constrained by the undo retention period specified by the Oracle database parameters. There are a couple types of flashback you can employ:
  1. FQ - Flashback Query
  2. FVQ - Flashback Query Version
  3. FTQ - Flashback Transaction Query
FQ lets you look at data at a specific point in time. An important pseudo column that works with FQ is the System Change Number (SCN). Each commit gets a unique one. You can convert it to time using SCN_TO_TIMESTAMP(). FQ can query prior values of a table with respect to a certain TIMESTAMP. Or you can go back to a specific SCN.

FVQ gives you rows that represent committed versions of data. Note that if you insert/update and then delete rows in a transaction, those operations do not show up in the FVQ. You only see material changes per transaction. You SELECT data whose VERSIONS are between TIMESTAMP values. You can also look between SCN ranges too. FVW does not work on views.

FTQ is running queries against the FLASHBACK_TRANSACTION_QUERY view. Working with this technology involves the global transaction identifier (XID). It is a RAW value tied to the transaction the change was made in. You can look at the value using the RAWTOHEX() function. Basically you want to query the view using some value(s) of the XID.

Subqueries and Such

Today I was studying for my Oracle SQL Expert certification exam. I started out knowing the material I was studying. Then I hit a number of new topics. Some of them I had heard of but did not understand. Others I was in the dark about. Still need to practice up. But I am learning a whole lot.

First up let's talk about the CTAS - Create Table As Select. You create a table and insert data into it all in one SQL statement. Constraints and indexes do not get copied over (except for the NOT NULL constraint). If you SELECT something other than a column, then you need to provide a column alias to define the column name in the new table. Or you could provide the fully qualified list of columns in the CREATE TABLE part.

The CTAS technique can be used to INSERT rows into a table using a subquery. Just SELECT INTO your table. Not need for a VALUES clause. Stick the SELECT subquery right after that.

Multitable insert allows you to, as the name suggests, insert into more than one table with a single SQL statement. This can be an unconditional or conditional insert. For unconditional inserts, you must specify the ALL keyword. Otherwise you can specify the ALL or FIRST keyword for a conditional insert. ALL means all of the WHEN clauses are evaluated. FIRST means only the first WHEN clause to match gets chosen. Multitable inserts do not work on views. Be warned that you cannot put sequences on the subquery for the mutlitable insert. Also watch out because the insert is a single SQL statement, which generates only one sequence value.

The Data Dictionary

Oracle maintains the data dictionary. This is a bunch of tables and views owned by SYS. They contain details on the result of DDL performed. There are over 2000 views provided. You can find information on these views in the DICTIONARY view.

The tables in the data dictionary are named in a certain pattern. The normally start with the prefixes USER, ALL, or DBA. For example, USER_TABLES is all tables owned by the current user. ALL_TABLES is all tables the current user has access to. And DBA_TABLE is all the table in the database.

There are V_$ views which have a public synonym which starts with V$. An example is V$SESSION. These views are dynamic performance views. There are also GV_$ views, with public synonyms that start with GV$. These are global dynamic performance views. If you want to join some of these views together, you should first copy their data to temp tables. Then you can query the temp tables. This is because of the dynamic nature of these views based on what is going on in the system.

USER_SYNOYNMS contains private synonyms for the current user. PUBLIC synonyms are in ALL_SYNONYMS and DBA_SYNONYMS. USER_CATALOG has a bit of information on  tables, views, synonyms and sequences owned by the current user. USER_OBJECTS has more information on objects owned by the current user.

Some of the most important data dictionary views are USER_TABLES and USER_TAB_COLUMNS. These have corresponding synonyms TABS and COLS. You can find out about your tables and columns in those tables with these views.

USER_SYS_PRIVS has the system privileges granted to the current yser. USER_ROLE_PRIVS has the roles granted to the current user. And USER_TAB_PRIVS has privileges granted on objects that the current user is involved with. "Involved" means is the owner, is granted privilege to, or has granted privilege for.

Finally the USER_CONSTRAINTS view has info on constraints owned by the current user. They are divided up by constraint type:
  • P for primary key
  • R for foreign key
  • U for unique
  • C for check and NOT NULL


I have seen some jobs that ask for experience with functions like ROLLUP and CUBE. Previously I had never heard of them. Seems like some sort of geometry. Ha. Well now I know better.

ROLLUP is a subclass of GROUP BY. It produces subaggregate rows, or aggregates of aggregates. It works well with the SUM aggregate function. But you can use it with any aggregate function. For each group set, you get a single line summary. If the output of a normal GROUP BY is called regular rows, then the extra rows from the ROLLUP are called superaggregate rows.

Next up we have the CUBE. This is like a 3D ROLLUP. As such it is a subclass of GROUP BY. You get all the normal output from ROLLUP. You also get subtotals for the groupings.

Something that helps with formatting ROLLUP and CUBE output is GROUPING. This is a function that returns either 0 or 1. If you do not use ROLLUP or CUBE, GROUPING returns a 0. If you are on a superaggregate row of output, GROUPING returns a 1. Otherwise it returns a 0. You can use the GROUPING in your SELECT to transform the output. For example, you can format some value selected. Or you could replace the text entirely.

The final topic of the day is GROUPING SETS. They work with GROUP BY. They control what groups to display in the output. You pass lists in the GROUPING SETS. A NULL in the list indicates you want a single grand total. The result is almost like separate SQL statements GROUP BY each list, and the whole thing UNION ALL together.