Roles and Privileges

I have been studying Oracle privileges and roles today. The basics are easy. The details are tedious. There are two types of privileges: system and object. You grant these privileges to either a user or a role. If they are granted to a role, then a user gains access to the privileges when they are granted that role. There is a WITH ADMIN OPTION that allows users that you grant privileges to the ability to themselves grant the privilege to others.

When you drop an object, any privileges on the object that have been granted to roles disappear automatically. You can grant multiple privileges at once if you separate the privileges by commas. You remove privileges by REVOKE FROM. You can remove all privileges with REMOVE ALL PRIVILEGES or simply REVOKE ALL.

PUBLIC is an Oracle account that represents all users. You can grant privileges to PUBLIC. Then all users get that access. If you own an object, you do not need explicit privileges to access it. You get them by default. All grants operate immediately without the need for a COMMIT.

There are a slew of data dictionary views to look at privileges. USER_SYS_PRIVS contains those system privileges related to the user. DBA_SYS_PRIVS has all system privileges.

USER_TAB_PRIVS contains those privileges where you own the object, granted the privilege, or are a grantee of the privilege. ALL_TAB_PRIVS adds privileges through a role or that are granted to PUBLIC and available to you. DBA_TAB_PRIVS has everything.

ALL_TAB_PRIVS_RECD are privileges you have directly, through a role, or that are PUBLIC. There is a SESSION_PRIVS view that I am not too familiar with yet.

There are some old Oracle defined roles that are being phased out. But I think you still need to know them. CONNECT gives you CREATE SESSION. It is for a general user. RESOURCE is a role for application developers. DBA is a role for administrators.

Roles can be granted to roles. There are a number of data dictionary views associated with roles. ROLE_ROLE_PRIVS are the roles that have been granted other roles. ROLE_SYS_PRIVS are system privileges assigned to roles. ROLE_TAV_PRIVS are object privileges granted to roles.

Here is a trivia fact: there are no privileges specifically for indexes. You get the ability to create an index when you have the CREATE TABLE privilege.

Materialized Views

I have been tracing down a customer problem. Sometimes the summary screen shows correct data. Sometimes it is all zeros. Looks like the data comes from one of two materialized views. The views themselves have the right data. What could be wrong? That's the story for another post. Today I want to talk about materialized view technology.

Materialized views (MVs) were previously called snapshots. They are like regular views in that there is some SQL run to get the results. However unlike normal views, MVs store the results in a table. They were introduced in Oracle 8. The query used to populate the MV can go against tables, views, or even other MVs. Since the hard work is done when the MV is created, use of them can really speed up your queries.

Other databases have MVs. In Microsoft SQL Server they are called indexed views. And in DB2 they are called materialized query tables. You get the speed increase when you directly use the MVs. There is also a query rewrite technology where the database can use an MV instead of a base table to get speedup. This query rewrite was introduced in Oracle 8i.

There are different strategies to refresh the data in an MV. You can manually do it. Or you could periodically update. Or you could automatically update them when the underlying tables change. That last change gets captured in materialized view logs. There is a fast refresh option that requires specifics around the underlying query:
  • No CONNECT BY
  • No INTERSECT, MINUS, or UNION ALL
  • No aggregate functions
  • No joins other than subqueries
  • No mismatch columns in a UNION
  • Plus some other subquery restrictions
Not only is an MV good for improving query performance, it can help replicate data across databases. You just set up an MV in your database with the underlying tables being remote. Of course any user defined types in the source databases need to be in your own database.

You cannot perform any DDL on an MV. But you can perform DML. It depends on how your MV was set up. You can make it read only. Or you could make it updatable. Do that by specifying FOR UPDATE AS prior to the SELECT. There are even writable MVs. But they are rarely used.

The query under the MV can have an ORDER BY clause. However that only applies to the data from the initial creation. Updates can make the MV out of order. The MV can be built deferred. Then the data can later be added during a refresh. Finally if you have a regular table that acts like an MV, you can turn it into an MV  by using the ON PREBUILT TABLE during MV creation.

Merge and Flashback

I will be honest. I have not used the MERGE statement before. Looking promising though. You can do a lot with just one MERGE. It accomplishes two or more DML operations in one SQL statement. It can do an INSERT, UPDATE, or DELETE.

With flashback, you can look at data at a previous point in time. You can go back to the last DDL that occurred on a table. You are also constrained by the undo retention period specified by the Oracle database parameters. There are a couple types of flashback you can employ:
  1. FQ - Flashback Query
  2. FVQ - Flashback Query Version
  3. FTQ - Flashback Transaction Query
FQ lets you look at data at a specific point in time. An important pseudo column that works with FQ is the System Change Number (SCN). Each commit gets a unique one. You can convert it to time using SCN_TO_TIMESTAMP(). FQ can query prior values of a table with respect to a certain TIMESTAMP. Or you can go back to a specific SCN.

FVQ gives you rows that represent committed versions of data. Note that if you insert/update and then delete rows in a transaction, those operations do not show up in the FVQ. You only see material changes per transaction. You SELECT data whose VERSIONS are between TIMESTAMP values. You can also look between SCN ranges too. FVW does not work on views.

FTQ is running queries against the FLASHBACK_TRANSACTION_QUERY view. Working with this technology involves the global transaction identifier (XID). It is a RAW value tied to the transaction the change was made in. You can look at the value using the RAWTOHEX() function. Basically you want to query the view using some value(s) of the XID.

Subqueries and Such

Today I was studying for my Oracle SQL Expert certification exam. I started out knowing the material I was studying. Then I hit a number of new topics. Some of them I had heard of but did not understand. Others I was in the dark about. Still need to practice up. But I am learning a whole lot.

First up let's talk about the CTAS - Create Table As Select. You create a table and insert data into it all in one SQL statement. Constraints and indexes do not get copied over (except for the NOT NULL constraint). If you SELECT something other than a column, then you need to provide a column alias to define the column name in the new table. Or you could provide the fully qualified list of columns in the CREATE TABLE part.

The CTAS technique can be used to INSERT rows into a table using a subquery. Just SELECT INTO your table. Not need for a VALUES clause. Stick the SELECT subquery right after that.

Multitable insert allows you to, as the name suggests, insert into more than one table with a single SQL statement. This can be an unconditional or conditional insert. For unconditional inserts, you must specify the ALL keyword. Otherwise you can specify the ALL or FIRST keyword for a conditional insert. ALL means all of the WHEN clauses are evaluated. FIRST means only the first WHEN clause to match gets chosen. Multitable inserts do not work on views. Be warned that you cannot put sequences on the subquery for the mutlitable insert. Also watch out because the insert is a single SQL statement, which generates only one sequence value.

The Data Dictionary

Oracle maintains the data dictionary. This is a bunch of tables and views owned by SYS. They contain details on the result of DDL performed. There are over 2000 views provided. You can find information on these views in the DICTIONARY view.

The tables in the data dictionary are named in a certain pattern. The normally start with the prefixes USER, ALL, or DBA. For example, USER_TABLES is all tables owned by the current user. ALL_TABLES is all tables the current user has access to. And DBA_TABLE is all the table in the database.

There are V_$ views which have a public synonym which starts with V$. An example is V$SESSION. These views are dynamic performance views. There are also GV_$ views, with public synonyms that start with GV$. These are global dynamic performance views. If you want to join some of these views together, you should first copy their data to temp tables. Then you can query the temp tables. This is because of the dynamic nature of these views based on what is going on in the system.

USER_SYNOYNMS contains private synonyms for the current user. PUBLIC synonyms are in ALL_SYNONYMS and DBA_SYNONYMS. USER_CATALOG has a bit of information on  tables, views, synonyms and sequences owned by the current user. USER_OBJECTS has more information on objects owned by the current user.

Some of the most important data dictionary views are USER_TABLES and USER_TAB_COLUMNS. These have corresponding synonyms TABS and COLS. You can find out about your tables and columns in those tables with these views.

USER_SYS_PRIVS has the system privileges granted to the current yser. USER_ROLE_PRIVS has the roles granted to the current user. And USER_TAB_PRIVS has privileges granted on objects that the current user is involved with. "Involved" means is the owner, is granted privilege to, or has granted privilege for.

Finally the USER_CONSTRAINTS view has info on constraints owned by the current user. They are divided up by constraint type:
  • P for primary key
  • R for foreign key
  • U for unique
  • C for check and NOT NULL

The New GROUP BY

I have seen some jobs that ask for experience with functions like ROLLUP and CUBE. Previously I had never heard of them. Seems like some sort of geometry. Ha. Well now I know better.

ROLLUP is a subclass of GROUP BY. It produces subaggregate rows, or aggregates of aggregates. It works well with the SUM aggregate function. But you can use it with any aggregate function. For each group set, you get a single line summary. If the output of a normal GROUP BY is called regular rows, then the extra rows from the ROLLUP are called superaggregate rows.

Next up we have the CUBE. This is like a 3D ROLLUP. As such it is a subclass of GROUP BY. You get all the normal output from ROLLUP. You also get subtotals for the groupings.

Something that helps with formatting ROLLUP and CUBE output is GROUPING. This is a function that returns either 0 or 1. If you do not use ROLLUP or CUBE, GROUPING returns a 0. If you are on a superaggregate row of output, GROUPING returns a 1. Otherwise it returns a 0. You can use the GROUPING in your SELECT to transform the output. For example, you can format some value selected. Or you could replace the text entirely.

The final topic of the day is GROUPING SETS. They work with GROUP BY. They control what groups to display in the output. You pass lists in the GROUPING SETS. A NULL in the list indicates you want a single grand total. The result is almost like separate SQL statements GROUP BY each list, and the whole thing UNION ALL together.

Set Operators

You can combine two SELECT statements using set operations. Here are the valid choices for the operation:
  1. UNION
  2. UNION ALL
  3. INTERSECT
  4. MINUS
UNION and UNION ALL combine the results from both queries. UNION will remove the duplicates. UNION ALL will not. INTERSECT gives you what is common to both results. And MINUS gives you stuff from the second results that were not in the first results.

I have used UNION and UNION ALL before. Normally I choose UNION unless there is some business restriction that wants me to filter out the duplicates. Here is a trivia fact: INTERSECT will also filter out any duplicates.

You cannot perform set operations on columns of type BLOCK or CLOB. You can only do an ORDER BY at the end of the whole set operation. The ORDER BY can be by position, or reference an alias from the first query.

Database Objects

I need to brush up on my skills to modify columns in tables, create indexes, and create constraints. Just don't get a lot of practice in these tasks. I thought I would write down some notes on these and other Oracle object manipulation commands.

This might be obvious. But you cannot add a NOT NULL constraint to a column on a table with records in it (which don't meet the constraint). When you create a foreign key column, you can set the constraint as DEFERRABLE. It won't be enforced until a COMMIT has been issues, or until you explicitly set the constaint as IMMEDIATE.

You can create an index on a column right when you are creating the table. The command can reference an existing index name. Or more commonly, it can just stick the index creation SQL right in there with the column definition in parentheses.

Function based indexes allows you to store indexes on the values of a function. That way when SQL uses the function, the optimizer will use the index to speed things up.

FLASHBACK lets you recover objects and data. For example you can do this on a TABLE. You could also do this on DML that has transpired. When you do it on a table, it recovers the table from the DROP TABLE command. You can't restore the state prior to when the last ALTER TABLE was issued. You can restore it to a newly named table. Non-bitmap indexes are recovered. Non foreign key constraints are recovered. And privileges are recovered.

The FLASHBACK works because when you DROP TABLE, it gets saved away. See details by querying the USER_RECYCLEBIN view. However if you issue a PURGE, the dropped table will be removed from the recycle bin and you cannot FLASHBACK recover it.

The FLASHBACK can bring a table back to a point in time defined by:
  1. Prior to the DROP
  2. A TIMESTAMP
  3. An SCN
  4. A restore point
External tables are a special kind of table. They are read only. The metadata about the table is stored in the database. The data is stored outside the database. You can SELECT from the external table. But you cannot issue DML against it. There are no indexes, constraints, LOB columns or unused columns allowed on an external table.

To use an external table, you must first CREATE DIRECTORY where the file will be located. Then you grant privileges on the directory. Finally you create the table with a special EXTERNAL syntax. You have to specify how the table will be loaded (i.e. SQL*Loader or DATAPUMP). We actually use external tables on our project when we have a file we need to access from database code.

Views, Sequences, and Such

I will be honest. I don't use views all the much. But it is good to know a little about about them. They are SELECT statements that have a name. You access a view just like you would a table. For example, you can DESCRIBE it. There are some scenarios where you cannot do all insert/update/delete operations on a view:
  • If a required field in a table is not contained in the view
  • SQL to create view has GROUP BY
  • SQL to create view has aggregation
  • SQL to create view uses DISTINCT
  • SQL to create view has more than one table (through a join)
I think I have mentioned before that an inline view is when the table section of a FROM clause is replaced by a SELECT statement. Now let's move on to sequences. They are mainly to create values for a primary key. They have properties such as INCREMENT BY and START WITH. You use the pseudo columns NEXTVAL and CURRVAL to get sequence values. Note that you cannot access CURRVAL in a session without first getting the NEXTVAL.

Indexes are objects to make queries run faster. The WHERE clause and ORDER BY clause will specifically benefit from some indexes. They cannot be placed on columns of type LOB or RAW. You automatically get an index for a primary key or a unique column. The optimizer might use an index for almost everything except for NOT EQUALS comparisons. A composite index has two or more columns in it.

Finally synonyms are aliases for database objects. They can be for tables, views, sequences, or even other synonyms. The object referenced does not even have to exist when you create the synonym. But it must by the itme you use it. There are two types of synoyms:
  1. PUBLIC
  2. PRIVATE
Private synonyms are those created without the PUBLIC keyword. They are owned by the user that creates them. They get precedence over a PUBLIC synonym with the same name. Public synonyms are owned by a used called PUBLIC. The are visible to all users. You cannot ALTER a synonym. But you can CREATE OR REPLACE one.

Subqueries

I use subqueries once in a while. That's why I am not very strong with them. They are going to be on my OCA SQL Expert exam. Time to bone up on my subquery knowledge. There are different classifications (a subquery may fall in more than 1 category):
  • Single row
  • Multiple row
  • Multiple column
  • Correlated
  • Scalar
Most subqueries are categorized by how many rows and columns the subquery returns. The exception is the correlated subquery. It references (correlates with) the outer query. A correlated subquery cannot stand on its own. Here is where you can use a correlated subquery:
  • SELECT
  • UPDATE
  • DELETE
By the way, an inline view is when a subquery takes the place of a table in the FROM clause of a SELECT query. Now let's get back to the categories. A multiple row subquery needs special comparisons in the outer query:
  • IN
  • NOT IN
  • ANY (also known by SOME)
  • ALL
Multiple column subqueries take the form in the WHERE clause as follows - WHERE (col1, col2) = (SELECT col1, col2 FROM ... etc). This one does not seem as intuitive. Then there is the scalar subquery that returns a single row and column. There are a number of places you cannot use a scalar subquery:
  • CHECK constraint
  • GROUP BY
  • HAVING
  • Function based index
  • DEFAULT value
  • RETURNING
  • WHEN (from the CASE)
  • START WITH / CONNECT BY
EXISTS checks whether there are any rows in the subquery. If so, it returns TRUE. If not, it returns FALSE. You can also use the opposite which is NOT EXISTS. If you use a SELECT that employs the EXISTS, it is called a semijoin.

WITH allows you to name a subquery block. The named queries act like inline views or temporary tables. The name is valid in all of the statement except in the subquery itself.

All About Joins

There are multiple ways to classify SQL joins:
  • Inner joins match values in all tables
  • Outer joins match even if one table does not have a match
  • Equijoins do an exact match on values
  • Non-equijoins match with comparisons such as "greater than"
There are three types of outer joins:
  1. Left
  2. Right
  3. Full
A natural join will join multiple tables that have the same column names. You cannot provide aliases for the joined columns. This produces an inner join.

Another way to join tables with same column names is to use USING(same_col_name). You cannot prefix the column name with an alias or even a table name. This is similar to the natural join, except that this allows outer joins as well as inner joins.

The self join will join a table with itself. It can be an inner join or an outer join. It can be an equijoin or a non-equijoin.

Finally there is the Cartesian product. It is also called the cross join. This is not very useful. You invoke it with the CROSS JOIN syntax, or SELECT FROM multiple tables without specifying any joins. Be warned that you might get a whole lot of rows with this option.

Scalar Functions and Timezones

I continue studying up for my OCA SQL Expert exam. Today I want to talk about scalar functions. These are also called single row functions. But before we dive into them, I want to revisit DUAL. This is a table specific to Oracle. It only has one column DUMMY of type VARCHAR2(1). That column has the value 'X'. You can use dual when you want to issue a select but do not want to go against any other table.

Now back to scalar functions. An example math scalar function is MOD. It stands for modulus. It is the remainder left over when you divide integers. A similar function is REMAINDER. However REMAINDER will find the closet divisor, possibly going past when evenly divides, and might return a negative number.

How about we talk about timezones? Here are three types:
  1. UTC - Universal timezone, previously GMT
  2. Database - the timezone on the server returned by DBTIMEZONE()
  3. Session - the timezone of the user returned by SESSIONTIMEZONE()
Recall that a TIMESTAMP is a date plus fractional seconds. Well you can also have TIMESTAMP WITH TIMEZONE. This has a TIMESTAMP plus details on the timezone. The timezone info is stored in a UTC offset or a region name. There is also TIMESTAMP WIT?H LOCAL TIMEZONE. This is the time normalized to the database timezone.

The CURRENT_DATE() and CURRENT_TIMESTAMP() functions returns values in session timezone. The FROM_TZ() produces a TIMESTAMP WITH TIMEZONE. So does TO_TIMESTAMP_TZ(). You can use CAST() to change the type. You can EXTRACT() details of a timezone. And you can use AT TIME ZONE to transform a time to a different timezone.

Certification Review of SELECT

Okay. I am reading up every night to study for the SQL Expert certification exam. When you put DISTINCT in your SELECT statement, it operates on sets of columns that you use. It produces the same result as UNIQUE. Here are the different operations that are in play with your SQL statement:
  1. Projection
  2. Selection
  3. Joining
Projection is choosing a subset of columns. Selection is choosing a subset of rows. And joining is combining two or more tables.

A view acts like a table. It looks like a table. Behind the scenes, it runs a SELECT statement to produce its results.

Synonyms are aliases for existing objects. Private synonyms are owned by a schema. And public synonyms are owned by a user named PUBLIC.

The WHERE clause has access to all columns in a table, regardless of whether they exists in the SELECT list. The order of precedence, highest to lowest, of logical operators is NOT, AND, then OR.

If BETWEEN is in a WHERE clause, it is inclusive of the two end points given.

The ORDER BY also has access to all columns in the table. It is always the final part in a SQL statement. It can use alias names for sorting. You can also do positional ordering, the first column being number 1. NULL values always have the highest values.

I really have only one interesting fact about LIKE. The % is only expanded after the LIKE keyword. So don't use it on the left side!

More Certification Review

I took a peek at the different kinds of certification offered by Oracle. There is an advanced PL/SQL certification that might be in my future. However I might also want to get some database administration in there. And I am curious about APEX certification.

The beginner database administration certification seems easy. You just pass a test. But to get to the next level, you need to jump through some hoops. One such hoop is that you need to take a class from Oracle. All formats of training cost the same amount. So if I go this route, I will probably attend a class in person. The problem is that cool classes like hard core performance tuning last five days. And that means they cost $3500. I guess I could get my company to pay. However the price seems a bit high.

Anyway, I am reading up my study guide to pass the SQL Expert certification exam. I learned some details about certain types that cannot be UNIQUE, PRIMARY KEY, or FOREIGN KEY. They are:
  • TIMESTAMP WITH TIME ZONE
  • BLOB
  • CLOB
Those don't make sense to be a primary key. Not sure why they could not be unique, unless their size makes checking for uniqueness prohibitive. Oh well. That's the rule.

Next up I learned that there are six types of SQL statements:
  1. DDL
  2. DML
  3. TCL
  4. Session control
  5. System Control
  6. Embedded
That last one does not feel like a separate type. You can embed DDL or DML in another language. But I digress. I know DDL - things like CREATE TABLE. However I was not as familiar with FLASHBACK and PURGE. That's why I need to study up. FLASHBACK apparently reverts the table or DB to a prior version. And PURGE gets rid of items from the recycle bin. What? Is this Microsoft Windows?

DML is the old INSERT/UPDATE/DELETE. It also encompasses MERGE. I don't use MERGE. It is some kind of combination of the other types of DML, all in one statement. I guess I will be studying that too. TCL is things like COMMIT. I know about ALTER SESSION and ALTER SYSTEM. And I got plenty of experience with embeded SQL by using Pro*C.

Okay. I am up to Chapter 4 in my study guide. 14 more chapters to go and I shall be ready to take the certification test by the end of the month. I will let you know how I fare.

Certified SQL Expert

I saw a perfect position for me at my company. It seemed to be just the kind of PL/SQL developer position I have been searching for. There were a few problems though. One was that the location was on the other side of town. And the other was that they wanted a lot of different certifications.

I am an Oracle PL/SQL Developer Certified Associate. The main reason I got certified was that my company said they wanted people to get certified, and they were willing to pay. So I think I bought some training materials. Went in for an in person class. Then took and passed the test.

The certifications desired for my dream job were ones I had not heard of before. Maybe they are new. I really don't pay too much attention to certifications any more. I think it is time for that to change. So I have been studying the OCA Oracle Database SQL Certified Expert Exam Guide. Got an online copy. I am encouraged that I get most of the answers right at the end of each chapter.

There were a lot of basics that I did not have a strong grasp on. For instance I knew object names were capped at 30 characters. I found that limit the hard way. However I did not know objects need to start with a letter. Also did not know the only special characters allowed in the name are $, _, and #.

Speaking of names, there are objects that are grouped together in namespaces. That is, certain objects cannot have the same name in the same namespace. Here are those namespaces:
  • user, role, public synonym
  • table, view, sequence, private synonym
  • index
  • constraint
We have naming conventions that make sure these things don't cross paths even if they are in different namespaces. But it is good to know the exact rules, especially if you are going to be taking a certification test.

Job Applicant Statistics

I applied to a job a little over a week ago. This was through a site that I normally do not visit. The site tried real hard to get me to pay for a subscription. No thanks. There are plenty of free job boards out there. If they don't work, I still do not think I will pay to find a job.

One strange thing about this site was that it provided statistics on my competition. It did not name any names. But it informed me about the details of the other people who applied for the job I was interested in. Weird.

Here were the different current salaries of the competition:
  • $60k
  • $70k
  • $80k
  • $100k
  • $140k
  • $150k
They were all over the place. Should I be concerned that there were some low ball salaries in there? Nah. Those applicants are probably underpaid. Most applicants had a bachelors degree. But some had a masters.Here are their current titles:
  • Analyst
  • DBA
  • ETL Developer
  • Oracle Developer
  • Project Lead
  • Senior Architect
  • Senior Developer
Those were some nice titles. Finally here are some of the areas of expertise the competition possessed:
  • Banking
  • Consulting
  • Databases
  • Front end development
  • Project Management
Well the competition is fierce as usual. I had better make sure I have enough goodies to set myself apart. I am Oracle certified. Got a lot of years experience. Also have a lot of schooling. I think I know my stuff.

What else can I do to improve my rank? Get some more certifications. I am working on that. Expand my areas of expertise. There are a lot of hot technologies in the Oracle Database universe. The real winner might be reaching out to my network. I have worked so long that I know tons of people out there with an Oracle database background.

Group Functions

A group function will return one row for possibly many matching rows in the table(s) you select from. These are also called multirow functions or aggregate functions. One common example is COUNT(). It returns a single row. Even if there are no rows to select from, COUNT() returns zero.

Then there is the RANK() group function. It will order specified column(s) from first to last. Ties get the same rank, but subsequent numbers are skipped based on how many rows tie.

GROUP BY will put sets together to be treated as a whole. Group functions then operate on these sets. You can combine the ORDER BY with GROUP BY. ORDER BY can contain any expression in the GROUP BY clause, and expression in the SELECT clause, group functions, and USER/SYSDATE/UID.

The HAVING clause will restrict sets of rows from a GROUP BY. It requires GROUP BY. You can use any expression from the GROUP BY, or any group functions in the HAVING clause.

Little Bit of Large Objects

The system I work on makes small use of large objects in the Oracle database. We send letters for faxing companies. The letters are signed with an image of the sender's signature. Those signatures are bitmap images stored as LOBs in the Oracle database. Let's see what the latest book I am reading has to say about large objects.

Prior to Oracle 8, large objects were represented by the LONG and LONG RAW types in the database. There could be at most one column of this type per table. The data was stored inline with the other columns in the table.

Now we have two general types of large objects (LOBs): internal and external. The internal variety are stored in the database in tables. Examples of this are the CLOB, NCLOB, and BLOB. I am most familiar with the BLOB. You can manage these objects with the DBMS_LOB built in package.

Then there are external LOBs. An example of this is a BFILE. The data is stored on disk. And the access to the data is read only. I don't recall ever working with external LOBs. Maybe it is time for a small practice project with them?

Collections

I am still reading "Oracle Advanced PL/SQL Professional Guide". This book really stresses the different types of collections. So let's get down to business. Collections contain elements of the same type. They are stored in memory in either the PGA or SGA. Some types can also be stored in the database. Here are the collections Oracle has to offer:
  1. Associative Arrays
  2. Nested Tables
  3. VARRAYs
Associative arrays were originally called PL/SQL tables. Then they were called index by tables. Now I guess they are associative arrays. These I used to use a lot in data correction scripts. Not so much any more. You index the array by either a number or string. It can be a sparse collection. You define a type for the associative array like TYPE my_assoc_type IS TABLE OF VARCHAR2(4) INDEX ?BY BINARY_INTEGER.

Next you have nested tables. Indexes are numeric and start with 1. These start out as dense collections. But they could get sparse if you DELETE an item. This type of collection can be stored in a database. When you do that, the data for columns of this type is stored out of line. You can initialize the collection with a constructor. And you can define a collection of this type like TYPE my_nested_type IS TABLE OF VARCHAR2(4).

The VARRAY is like the nested table. It also has numeric indexes which start at 1. You can store collections of this type in a table. Unlike nested arrays, columns of this type in a database table have the collection stored inline. The result is better performance than nested tables in database tables. This collection type is always dense. While you can DELETE from a VARRAY, you must DELETE all items in the VARRAY. You can define a collection of this type like TYPE my_varray_type IS VARRAY(10) OF VARCHAR2(4).

Here are some functions that work on collections:
  • EXISTS
  • COUNT
  • FIRST
  • LAST
  • PRIOR
  • NEXT
  • EXTEND (nested table or VARRAY only)
  • TRIM (nested table or VARRAY only)
  • DELETE

All About Cursors

I have been studying my "Oracle Advanced PL/SQL Professional Guide" to prepare for a technical interview. There was a lot of information on cursors in the book. That is good since I use cursor a lot. But I normally only use a specific style of cursor.

To start with, you can define a cursor in the declaration section. If it needs any parameters, you pass it a list in parentheses almost like you do a function. Then you OPEN the cursor, passing in any parameters in parentheses. You can then FETCH the cursor into your variables. And before doing a CLOSE on the cursor, you might be checking any of the cursor attributes:
  • %ROWCOUNT
  • %ISOPEN
  • %FOUND
  • %NOTFOUND
Normally you will check %NOTFOUND to see if you are at the end of the result set. You can always access the %ISOPEN attribute on a cursor. However the others will throw an exception unless you are accessing them between the OPEN and CLOSE of the cursor.

When using a FORALL cursor, there are the additional attributes %BULK_ROWCOUNT and %BULK_EXCEPTIONS. Most of the time when you define an explicit cursor, you have the cursor name that you prepend to the attributes. However you could do an implicit cursor by executing a SELECT statement. In that scenario, you can still check the attribute, but you use the term SQL instead of the cursor name. For example, you can check SQL%ROWCOUNT.

Then there is my good friend the ref cursor. I actually don't use these very frequently. My cursors are created and used in a procedure or function. I don't pass cursors around to other procedures. But if you had to, you could use a ref cursor. It is, as the name implies, a reference to a cursor. There are two types: strong and weak. Strong ref cursors have a fixed return type in the prototype. Weak ref cursors can be used for any SELECT statement. There is also a generic SYS_REFCURSOR which is a weak ref cursor.

The book goes on at length about collections. However that warrants its own separate post.

Back to Basics

I had a technical interview set up for today. It was for a senior PL/SQL developer position. Figured I would brush up on some Oracle knowledge. So I pulled out a copy of my eBook "Oracle Advanced PL/SQL Professional Guide". Strangely enough, there was a lot of basic information in there that was good to review.

For example, DUAL is actually a table owned by SYS. I just considered it a virtual scratch pad table area. But it is a real table. It only has one column called DUMMY of type VARCHAR2(1). Furthermore, that table has exactly one row with a value of "X".

I reviewed the syntax to create a package. There is no BEGIN keyword in declaring the specification. Just an END. The package body has an optional BEGIN-END pair. But that is for a one time initialization set of code. It executes once the first time your session accesses the package. I have actually used this feature a time or two.

Now a bit about types. I know that NUMBER is an Oracle type. INTEGER on the other hand, is an Oracle subtype of NUMBER. You can define your own subtypes. Not that I do that much.

There were a bunch of tips on getting better performance. Some were not intuitive. Try to do your operations in a single SQL statement.Avoid the use of implicit conversion. I am all behind that. Implicit conversion just feels like sloppy program to begin with. Do not define your variables as NOT NULL.

Huh? You would think limiting a variable to NOT NULL might help performance. Nope. For those variables, Oracle has to do extra work to keep checking if the thing is getting set to NULL. Use PLS_INTEGER for math operations. And of course, use BULK COLLECT and FORALL when dealing with collections.

Thinking About Triggers

I got my Oracle 12c database up and running. Time to start developing. Put together a small script to create a few tables. Also created sequences to populate the IDs in the tables. I thought I would get automated and have insert triggers on the tables setting the ID values. Sweet I thought.

Went through a couple iterations until I got the basic table definitions the way I wanted. Then I coded up a procedure to insert records into the tables. Something strange was happening though. The IDs were not getting set. I thought I manually tested the triggers from the SQL*Plus command prompt.

Turns out my trigger became invalid when I kept dropped and recreating the tables. Easy enough to fix. Just recompile the triggers. Then I did some philosophical thinking about my setup. I insert a new row of data. The trigger populates the ID. How do I then find out what that ID is if I want to insert it as a foreign key in some table?

In the old days, I would let my procedures explicitly generate the IDs. That way I had the ID before the INSERT, and I could then use the value I stored in a variable to use as foreign keys. Now that the IDs are set automatically in triggers, I got to query the database after the fact. There has to be a better way to do this. Any ideas?

Pluggable Databases

I had installed my Oracle 12c database. Then I logged into Enterprise Manager with the SYSTEM account. I tried creating a user. This is what I got back:
SQL Error
Error signaled by database server
ORA-65096: invalid common user or role name
ORA-06512: at "SYS.PRVTEMX_ADMIN", line 8565
ORA-06512: at "SYS.WRI$_REPT_SECURITY", line 288
ORA-06512: at "SYS.DBMS_REPORT", line 898
ORA-06512: at line 1
What? At first I thought something was broken with user creation. Then I paid closer attention to the ORA-65096. Okay. I need a more complicated username. Nope. That was not the problem. Finally I broke down and Googled the subject.

Apparently there is some new magic going on with Oracle 12c. The username needs to start with C##. I should have known something was up when the default username in Enterprise Manager when you create a new user is C##.

This has something to do with the Oracle 12c multitenant architecture. That is somehow connected to pluggable databases as well. Those are not terms I am familiar with yet. I did see online that there were some hacks I could apply that would let me name usernames the way I want. Time to get hacking.

Apex Redux

As I was getting ready to install Oracle 12c, I pondered whether I should try to install APEX. Did a bit more research into what APEX is. Obviously it stands for application express. It is a rapid web app development tool. You need an Oracle database running application express. It is normally installed along with the database.

APEX is data in a bunch of tables plus some PL/SQL code. Since it is web based, you specify a URL which gets translated to an APEX PL/SQL call. Results are sent back in the form of HTML. Each request uses a separate database session.

There are two models that APEX can run under: the three tier model and the two tier model. Here is the three tier model:
  1. Web browser
  2. Oracle HTTP server with mod_plsql
  3. Oracle database with Oracle APEX
The second item could be replaced by an APEX listener. Then there is the two tier model:
  1. Web browser
  2. Oracle database with embedded PLSQL gate plus APEX
In APEX, a workspace is where you develop apps. It is a virtual private database. If you are not ready to install APEX on your machine, Oracle does have a hosted environment. When you first log into APEX, you arrive at your workspace home page. Application Builder is used to put your HTML interface together.

There are some sample objects available. One is Human Resources (HR) and another is Order Entry (OE). I think I will just stick with my Oracle database 12c for now. There will be time to play with APEX in the future.

Users Galore

I had installed Oracle 12c on my laptop. It informed me that Enterprise Manager was accessible using this link: https://localhost:5500/em.

Before I could run over and log in with Enterprise Manger, I decided to click the Password Management button at the end of the database install and configuration. There were a ton of accounts that got created for free with the database.

The only two accounts that were unlocked by default were SYS and SYSTEM. I always wondered what the difference between these two were. They are both real users with schemas. SYS owns the data dictionary tables. I guess SYSTEM is one I should log onto first to create my real daily user?

One thing I do know is that these are unlike SYSDBA and SYSOPER, which are actually roles and not users. The real surprise was the tons of other users that got created by default. I left all those user accounts locked by default. Here are the multitude of accounts:
  • OUTLN
  • LBACSYS
  • OLAPSYS
  • SI_INFORMTN_SCHEMA
  • DVSYS
  • ORDPLUGINS
  • XDB
  • ANONYMOUS
  • CYXSYS
  • ORDDATA
  • GSMADMIN_INTERVAL
  • APPQOSSYS
  • APEX_040200
  • WMSYS
  • DBSNMP
  • ORDSYS
  • MDSYS
  • DVF
  • FLOWS_FILES
  • AUDSYS
  • GSMUSER
  • SPATIAL_WFS_ADMIN_USR
  • SPATIAL_CSW_ADMIN_USR
  • APEX_PUBLIC_USER
  •  SYSDG
  • DIP
  • SYSBACKUP
  • MDDATA
  • GSMCATUSER
  • SYSKM
  • ORACL_OCM
Those were a lot of users. Some of the names were chosen well so I have an idea of what they are used for. I will have to do some Internet research to find out more. For now I tried logging into the database. At first it kept complaining that my username/password were wrong. Took a while to figure out that I had the caps lock on when I typed in the passwords for SYS and SYSTEM.

In Enterprise Manager, I went over to Security|Users to create my first user account. Now we are cooking with gas. I think I am going to have some fun with this database. Already got a few applications that I need written.

Oracle Install

So I downloaded this 3G Oracle Database 12c installation. And the Universal Installer would not run. I was not going to run around trying to install the required Java Runtime Environemt. I already have that on my machine. Maybe the location could not be detected. Therefore I did what any anxious developer would do. I restarted the install.

This time around it worked. That's when I arrived at Step 1 of 10. Had to decide between a desktop and server class. Well my little laptop is definitely the desktop variety. Then I was given the chance to specify the Windows user for the Oracle home user. I just chose a built-in account. That might be a security risk in the production world. But this is just my throw away database.

There was a drop list choice for the installation type. However the only option in it was the Enterprise edition. Good enough for me. The global database name was defaulted to "orcl.home" which I accepted as is. For some reason I decided to save my response file. You never know when I might want to do an unattended install with the same set of options.

Here were the steps for the database install:
  1. Prepare
  2. Copy files
  3. Setup
  • Setup Oracle base
  • Oracle Database configuration
Yeah. Steps one, two and three. Then some other steps. Somebody has their numbering mixed up. I swear it is not me. Finally after leaving the thing run all night, my database installation was complete. My SID was orcl. That was the default name I chose. I got a URL to run Enterprise Manager Database Express. Watch out world. Here I come.

Orale 12c Baby

Yesterday I decided it was time to install Oracle on my laptop. I downloaded Oracle Database 12c Release 12.1.0.2.0 from Oracle. The 64-bit Windows edition requires two zip files to be downloaded. They were 1.5G each in size.

If you want to program with Oracle components, you will need one of the following versions of Microsoft Visual Studio:
  • Visual C++ 2010 SP1
  • Visual C++ 2012
  • Visual C++ 2013 Update 2
Note that the VC++ 2013 does not work with Pro*C yet. Here are some components that you can use if you have the right development environment from Microsoft:
  • Oracle Call Interface
  • Pro*C/C++
  • XDK
Somewhere I read that I would only get a 30 day evaluation. What? I want a free version of Oracle that I can play around with indefinitely. But for now, 30 days will work. I ran the setup.exe program. It launches a command prompt style window. That in turn launches the Oracle Universal Installer.

Oops. The Universal Installer would not run on my machine. It complained that the Java run-time was not found at "bin\jawaw.exe". I was directed to go to http://www.javasoft.com to install the JRE version 1.4 or higher. Umm I thought the installer was supposed to install Oracle's own version of the JRE.

Oracle 12c New PL/SQL Features

Now let's see what Oracle 12c has in store for PL/SQL and database changes. The RESULT_CACHE has been improved. You can now use it with functions that specify invoker's rights. The WITH clause can define a PL/SQL function that you can call directly from SQL without a PL/SQL context switch. It can also be used to get access to a constant from a package.

ACCESSIBLE BY will allow you to specify packages that can access your newly defined package. You cannot call this newly defined package directly. But the packages on the "white list" can. This allows you to define helper functions that have restricted usage.

You can now grant roles to program units. That is, grant them to packages, procedures, and functions. Then you can use invoker's rights on them. They can get the privileges they need through the roles. You can bind PL/SQL types such as BOOLEAN to SQL. FETCH FIRST let's you limit the number of rows that return from a query.

BEQUEATH_CURRENT_USER let's you specify a view to have invoker's rights. Invisible columns can be created when you specify INVISIBLE during the DDL. They can only be seen if you explicitly name them. DESCRIBE does not see them. Neither does SELECT *. However %ROWTYPE knows about them. They have no column order. Strangely enough, that can be mandatory columns.

The UTL_CALLSTACK package contains routines previously found in DBMS_UTILITY. Then there is the EXPAND_SQL_TEXT in the DMBS_UTILITY package that expands view queries to their base tables. You can ENABLE_DDL_LOGGING. This is an init param. All DDL will now be logged in an XML file. It can be activated at the database or session level.

Temporary undo can now be stored in a temporary table instead of the undo tablespace. You can now issue a TRUNCATE TABLE CASCADE to get rid of tables plus those that depend on the table you are truncating.

Oracle 11g New PL/SQL Features

All right. We are now at the point where I am learning features of the current Oracle database we run in production. There is a DBMS_UPDATE_PARALLEL that let's you group sets of rows into chunks. Then you can update the chunk for performance gains. Regular expressions get a new REGEXP_COUNT method.

You can use CURRVAL and NEXTVAL in PL/SQL without having to SELECT them from DUAL. You get all kinds of options for specifying parameters in function/procedure calls. You can still do the old positional method. But you can also do the named method. Or you could combine positional and named in what we call a mixed style.

Big news. There is a RESULT_CACHE directive you can put at the end of a function declaration. Results of the query get cached in the SGA. This can speed things up considerably. There are some limitations:
  • Does not work in an anonymous block
  • Does not work with invoker's rights
  • There can be no OUT parameters
  • The function cannot access a data dictionary
  • The function cannot access a temporary table
  • The function cannot access a sequence
  • The function cannot access a SQL function
  • No parameters can be any type of LOB
  • No parameters can be a REF CURSOR
  • No parameters can be a collection
  • No parameters can be an object
  • No parameters can be a record
  • Return type cannot be any of these prohibited parameter types either
Whew. That was a lot of restrictions. I am sure there are good reasons for them.  Now back to the new stuff. FOLLOWS can be used when defining triggers to set the order of triggers on the same table at the same firing point. PRAGMA_INLINE can inline a subprogram go speed.

New types SIMPLE_INTEGER, SIMPLE_FLOAT, and SIMPLE_DOUBLE are what you would expect. They have the added restriction that they cannot be NULL. CONTINUE can be used to move to the next iteration in a loop. Compound triggers can put triggers for more than one firing point together. This let's you share code and data between them. They all have to be DML triggers. You can only GOTO code in your own section. Other restriction that make sense are that :NEW can only be changed BEFORE EACH ROW. And :OLD and :NEW cannot be referenced in the before and after statement triggers.

There is a new syntax to specify the parent you inherited a method from. It looks like (SELF AS my_parent).my_method(). FORALL can not reference specific elements in collections. This can be done in the SET and WHERE clauses. LISAGG is a new built in function that does string aggregation. That is, it can turn values from multiple rows into one big concatenated row.

COLLECTION improvements have been made. You can CAST to a specific type of collection. You can order the elements in a COLLECTION. You can specify that the collection has only DISTINCT values in it.

Virtual columns can be added to tables. This let's you store an expression in something that behaves like a column. Syntax involves GENERATE ALWAYS AS ... VIRTUAL during column definition. It can be index. It can be part of a constraint. This makes it act like a view.

Hey. PIVOT is a new keyword to transpose rows to columns. This is good for applications such as reports. It is an aggregate operation. There is a reverse UNPIVOT that transposes columns to rows instead.

Oracle 10g New PL/SQL Features

I am on a mission to learn what new features were added to PL/SQL over the years. Now onto what was new in Oracle 10g. There are new BINARY_FLOAT and BINARY_DOUBLE types. They adhere to the IEEE definitions. FORALL let's you transfer data in sparse collections to the database. Use the keywords INDICES OF or VALUES OF.

Regular expressions can now be used in SQL queries. Here are the functions:
  • REGEXP_LIKE
  • REGEXP_INSTR
  • REGEXP_SUBSTR
  • REGEXP_REPLACE
Nested tables have some improvements. They support multiple set operations. For example, you can use EQUALS, UNION, INTERSECT, and EXCEPT on them. Collections can also be assigned to other collections. There is a user defined quote character. You choose the character. It cannot appear in your string. When you use it, single quotes can be in the middle of your string.

 A new UTL_MAIL package sends mail programmatically without having to know the low level details of the SMTP protocol. A new UTL_COMPRESS package lets you zip and unzip data. It works with RAW/BLOB/BFILE types.

There is now conditional compilation of PL/SQL. You can get the current version with DBMS_DB_VERSION. Then you can use conditional $IF expression $THEN statement syntax. There is also $ERROR handling.

DBMS_DDL allows you to hide PL/SQL source code from users. Use the CREATE_WRAPPED keyword. The PLS_INTEGER grows in size to match BINARY_INTEGER. Implicit conversion can now be done between CLOB and NCLOB. Watch out for performance implications though.

DBMS_UTILITY gets the FORMAT_ERROR_BACKTRACE method to print out the call stack during exception processing. COLLECT is a new group function that gathers GROUP records and rturns them in a collection. You can write and read external tables.

Oracle 9i New PL/SQL Features

In my day to day job, I usually employ PL/SQL features that have been around since the old days. I am talking about Oracle 8 here. It is not that I don't occasionally see features recently added. But I just don't have them in my repertoire. I think I got Oracle certified around the time that Oracle 9i came out. Specifics for 9i were not on my test. So I probably never studied them.

Now I am looking around for my next gig. I want it to be a database development job. Specifically I want to be an Oracle developer. So I had better know what is new. I use the term "new" loosely. Well let's pick up where I left off. What was new way back when Oracle 9i came out?

There is a new DBMS_METADATA package that can extract object definition. It will output XML or DDL. Your choice. Then there are the ANSI-standard CASE statements and expressions. The UTL_RAW package can convert to and from NUMBER and BINARY_INTEGER. There is a web version of SQL*Plus called iSQL*Plus.

You can insert a full row into a PL/SQL table now. You can create collections that are indexed by a VARCHAR2. They become hash tables. You can override an object's default constructor with your own code. The TREAT keyword allows you to specify which method you have inherited gets executed.

TIMESTAMP is a new type with fractional seconds. "Table Functions" produce a collection of rows. The result can be queried like you query with SQL. An example of the product is a nested table or VARRAY. Speaking of nested, you can nest collection types now. For example, you can have a PL/SQL table of PL/SQL tables.

You can do bulk fetches with native dynamic SQL (EXECUTE IMMEDIATE). There is a MERGE command which combines insert and update. This is something that is allegedly important for data warehousing.

Storing XML in the Database

Previously you could store XML in a CLOB data type. Then you could use DBMS_LOB to get at parts of the XML. This had the advantage that the original XML format was saved. However, the database did not know anything about the structure of the XML document in the CLOB.

The XMLTYPE was introduced in Oracle 9iR1. This allows the database to natively support XML. The type is treated as a user defined type with built in functions that operate on it. Therefore you can use it in SQL just like any other user defined type. The SQL can query parts of the XML document. You can use the type for column data types. You can use it in PL/SQL as types. It supports XML Schema.

There are different ways that XMLTYPE can internally store the data. However it is stored, an XML DOM tree structure can be constructed on demand when the need arises. Here are some options on how the XMLTYPE can internally store the data:
  1. As a CLOB
  2. Object Relational
  3. Binary XML
CLOB format keeps the original XML document format. Performance is decreased. Object relational format will shred the XML. White spaces get removed. Performance is increased. The binary XML storage option was added in Oracle 11g. The XML gets parsed first then stored in binary form.

If you want to get a some parts of the XML, you can use the extract() function. Pass it an XPath string. It will give you back an XMLTYPE. Or you can use the extractvalue() function. It too takes an XPath string to specify the piece of XML you want. It however returns a scalar value.

OEM and Grid Control

First there was Oracle Enterprise Manager Database Control. This is known as OEM. It controlled a database. You view it in a singe window. It is web based. Was historically good for new Oracle features. It is easy to set up.

Next there was Oracle Enterprise Manager Grid Control. This allows you to manage many databases, application servers, and other products. The Grid Control resides on a separate server and has its own database. You can monitor Oracle infrastructure. You can also monitor non-Oracle items such as:
  • Other databases
  • Web servers
  • Application servers
  • Firewalls
  • Operating systems
Grid Control reduces your costs. It reduces complexity. There are a couple components that make up the Grid Control:
  1. Oracle Management Servive
  2. Oracle Management Agent
  3. Oracle Management Repository.
The agents run on the things being managed. There is a separate server that manages all these agents. And the information is stored in a database (repository). Grid control lets you perform system monitoring. You can view server alerts. You can set up notification rules. This allows you to detect performance problems.

To simply monitoring, you can combine single components into groups. Then you can monitor the whole group as one. You can automate administrative tasks. You can publish customized reports. These reports can be HTML based. You can manage Oracle patches. The Grid Control is extensible through management plugins.

Data Masking

We test the application on our project using a development database created by scrubbing production data. That original scrub happened over 10 years ago. Our development data is starting to get a bit stale. It gets updated every year to try to keep pace with the new dates. But formats and structures change. This sounds like a job for Oracle Data Masking,

Data Masking is an add on pack for Enterprise Manager. It allows you to meet data privacy and data protection mandates. It replaces sensitive values with realistic replacements. Thus you can use production like data for non-production uses.

Data Masking can actually scan for sensitive data. It can make sure sensitive data does not leave the production environment. Its operation is also called scrambling or anonymization. The scrubbing is based on rules that you set up or choose from some preset formats.

Here are the data type supported for scrubbing:
  • Numeric
  • String
  • Date
  • LOB (requires Grid 11gR1)
These are the overall steps you take to perform the data masking:
  1. Clone production to a staging environment
  2. Select mask definition in staging
  3. Clone the masked staging data to a test environment
The predefined mask formats allow you to generate random numbers for replacements. You can also do some post processing to ensure the data is of believable format. Masking is controlled by the DM_FMTLIB package. Here are some presets for well known data types that are supported:
  • Credit card numbers
  • ISBN numbers
  • UPC codes
  • Social Security Numbers
  • Phone numbers
 Here are the roles you need to perform a masking operation:
  • Select catalog role
  • Select any dictionary
  • Execute privileges on DBMS_CRYPTO
Be aware that masking partitioned tables might cause some partition movement if you mask the partition key. There are some types that are not supported by masking:
  • Clustered tables
  • Object tables
  • XML tables
  • Virtual columns
Masking can happen quickly because of the following optimiations:
  • Bulk operations
  • Disabled logging
  • Running in parallel

Oracle Spatial

Oracle has support to location enable application. This is called Spatial. It was named Oracle Spatial since Oracle 8. And it is now called Oracle Spatial and Graph since Oracle 10g. Spatial consists of functions and procedures for quick spatial data access and analysis. The code is available in the MDSYS schema.

Spatial defines a number of geometry types:
  • Point
  • Line
  • Polygon
  • Arc
  • Circle
  • Rectangle
Spatial allows you to deal with height and depth data. Or it can represent longitude and lattitude data. This is application to a Geographic Information System (GIS) application. It allows you to project this data onto a two dimensional surface.

The basic building block in Spatial is an element. Types, like mentioned before, are point, line, or polygon. Many elements make up a geometry. And a collection of geometries is a layer.

Spatial allows you to perform an area of interest query. There are two tiers to its query model: (1) primary filter, and (2) secondary filter. To help speed querying, a concept of a spatial index has been created. This is a logical index. There are two types. The first, which is the default, is the R-tree. It uses a minimum bounding box. The second is the quadtree. It uses a technique called tessellation. Each rectangle is chopped into smaller rectangles.

Oracle Label Security

There are many ways to lock down data access in Oracle. A low granularity technique is one we use on my project. That would be the Discretionary Access Control (DAV). With this you pretty much determine whether users have SELECT/INSERT/UPDATE/DELETE privilege to any given table. The users can either do those operations, or they cannot.

The Virtual Private Database (VPD) technology takes security to another level. You get more control. Specifically you decide which rows any type of user is allowed to see. I have written about VPD before. Now I want to talk about Oracle Label Security (OLS). OLS is built upon VPD. But unlike VPD, which requires you to write an access function in PL/SQL, OLS runs "out of the box."

Out of the box means you don't need to do any programming to use OLS. You can set up what you need using Oracle Enterprise Manager. You can create labels. Assign labels to rows. And also assign labels to users. Also you can use Oracle's policy manager tool.

As you can imagine, OLS was originally created to server government applications. Specifically it was meant to address Department of Defense security needs. However OLS must have been a hit. Because now it is also being used in the commercial sector.

The high level idea is that users can have labels associated with them. Specific rows of specific tables can also have labels applied to them. When a table is accessed by a user, the labels of the user and rows are compared. If the labels match, the user gets access. This label information is stored in a new column added to the table for access purposes.

Although I have heard it different ways, more people report that OLS does not come installed by default in the Oracle database. If so, you must use the Universal Installer to add it in. You also need to configure the LBACSYS account that gets installed with OLS. You must unlock it, set its password, and grant it SELECT ANY DICTIONARY.


Recovery Manager

Oracle has had the Recovery Manager (RMAN) since Oracle 8. It is installed along with the database. RMAN is a database client. The client can be run from the command line or Enterprise Manager. One main feature is cross platform backup and restore. RMAN automates a lot of the administration of backups. It can do point in time table recovery. It also can detect corruption in backups.

RMAN backups are stored as backup sets. It can do incremental as well as full backups. RMAN works with a media manager. That would be the system that controls the devices where backups are stored. For example, one media manager might control your tape drive.

An important facet of RMAN operation is the recovery catalog. This is a database schema where RMAN activity is recorded. It tracks the file names of all database files. It stores the location of redo logs. It can be used if the control file is lost.

RMAN is started by running the rman executble. You CONNECT TARGET to log into a database. The user needs SYSDBA privileges, although you do not explicitly state AS SYSDBA. The BACKUP command will start the backup. Unique file names are generated for backup files. The files are proprietary and specific to RMAN only.

You can do a hot backup while the system is running. The database must be in ARCHIVELOG mode. This results in what they call an inconsistent backup because it will require redo logs. Backups can be full or incremental. Incremental backups start at level 0 which is equivalent to a full backup. The VALIDATE command checks whether there is any backup corruption.

You can put RMAN commands in a file. Run it with the @ sign, just like SQL*Plus. Use the LIST BACKUPS for a report. Or use the REPORT command for a more comprehensive report. DELETE will explicitly get rid of old backups. Any backups older than the retention period are automatically deleted. Recovery involves a RESTORE DATABASE along with a RECOVER DATABASE command.

Virtual Private Database

Oracle has a security feature introduced in Oracle 8i called the Virtual Private Database (VPD). It allows you to specify policies at the row and column level. Under the hood, it adds a WHERE clause to SQL statements to limit what data you can see. Normally there is no way to bypass this security.

The VPD restrictions apply to select/insert/update/index/delete statements. They do no affect DDL. The security will have an impact on security. There are some types of VPD you can use to lessen the performance hit:
  • Static policy - applied once per query
  • Context sensitive - applied when context changes
  • Dynamic policy - reevaluated every run
VPD runs with definers rights. You write a function that generates the WHERE clause to restrict access. This function must take a schema and object names as inputs. It must return a VARCHAR2 which is the WHERE clause addition. This special function cannot access the table it is providing security for.

You manage the VPD security with calls to the DBMS_RLS package. RLS stands for row level security. There is also a column masking type of VPD. That style will return all rows int he object. But sensitive columns will be returned as NULL values. The column masking technique only applies to SELECT statements.

VPD policies can be managed in groups. One should not perform a SELECT FOR UPDATE on an object managed with VPD. There can be multiple polices on an object. There WHERE clauses implementing the VPD will be joined via AND.

Earlier I said that VPD cannot be bypassed. VPD is not applied in these circumstances:
  • DIRECT path exports
  • Operations on SYS owned objects
  • Operations run by SYS
  • Operations run AS SYSDBA
  • The user has the EXEMPT ACCESS POLICY privilege
VPD goes by other names. It is also known as Fine Grained Access Control. And it is also sometimes referred to as Row Level Security. Thus the DMMS_RLS package.

Bulk Processing

I usually keep performance in mind when I write my PL/SQL. Just being smart about the SQL usually is good enough. But sometimes you got a lot of data to process. In those scenarios, you might need to bulk up. Do some bulk processing that is.

Bulk processing is the term to describe BULK COLLECT and FORALL. The big idea is that there is a PL/SQL engine and a SQL engine in the database. Switching between the two costs you speed due to overhead. So it would be best to make one handoff from the PL/SQL engine to the SQL one.

BULK COLLECT will cause a query to make a single switch into the SQL engine and fill up your colection. FORALL works the other way around, allowing you to do one DML operation for all the items in your collection with just one context switch into the SQL engine.

There is a warning though. Don't do unbounded BULK COLLECT operations. That will eat up a lot of memory in the PGA (Program Global Area). Collections are stored in memory there. You don't want a huge one taking up too much memory. Use the LIMIT statement to cap how many records are grabbed at once in a BULK COLLECT. Experience says you should use a LIMIT of at least 25. It would be preferable to use a LIMIT of around 100. Oracle uses 100 as the default optimization for cursor loops in Oracle 10g and above.

After you do the BULK COLLECT, if there are records, they will start with index 1 in your collection. The rest of the records will be densely populated in the collection. If there are no records, the collection will be empty. The COUNT on your cursor will be zero. There will not be a NO_DATA_FOUND exception raised in this scenario. Similarly you cannot count on %NOTFOUND. Use the COUNT.

Here is a tip for the FORALL. It is not a loop. You don't need a loop command. By default the collection you are doing DML on must be dense. If not, you must use the IN INDICES OF with the FORALL. That only works with Oracle 10g and above. Normally you would not want exceptions in the middle of your FORALL to stop the flow. To prevent that, use the SAVE EXCEPTIONS clause. Then errors will be stored up in the SQL%BULK_EXCEPTIONS pseudo collection.

Data Modeler

Oracle has a free Data Modeler tool. There is a stand alone version of this tool. Another version is bundled with SQL Developer v3 and beyond. Both versions have the same functionality. The stand alone version gives you more screen real estate and easier access to the menus. The tool itself was developed in Java. As such, it requires the JDK.

Data Modeler lets you create, browse, and edit models. Here are some model types it supports:
  • Logical
  • Relational
  • Physcial
  • Multi-dimensional
  • Data Type
You can do forward and reverse engineering with the tool. The targets it supports are Oracle, SQL Server, and DB2. Although I would think you would want to have the tool produce DDL, it cause also spit out other formats like CSV. You can perform collaborative development with this tool.

Why go through the trouble of using a Data Modeler? Some say it is much easier than maintaining scripts manually. It can serve as a documentation tool. Or it can just help you get a handle on an existing database system.

Entity Relationship Diagram

I drew my first Entity Relationship Diagram (ERD) over 20 years ago. Used some drawing package to make the pictures. Not a whole lot has changed since then. The ERD was invented by Peter Chen back in 1976. It looks a lot like a flowchart. The ERD repsents an information system. It shows the logical structure of a database.

The ERD deals with entities and relationships. It does not show a specific entity. Instead it deals with entity sets. Drawing up an ERD is part of a data modeling technique. In the end, it is used to create a relational database. An ERD is not well suited for unstructured data.

There are different style to draw the ERD:
  • Information engineering
  • Peter Chen
  • Charles Bachman
  • James Martin
Of course there are tons of tools that can help you draw an ERD. Here are some commercial ones I have heard about:
  • ERwin
  • MagicDraw
  • Oracle Designer
  • Rational Rose
  • System Architect
  • TOAD Data Modeler
 I mentioned before that ERDs have entities and their cardinality. Entities are the objects that have data. Cardinality is the relationship between these objects in terms of numbers. The main types of such relationships are:
  1. One-to-one
  2. One-to-many
  3. Many-to-many

CA ERwin

I have been seeing some jobs that want you to have experience with ERwin. I don't think I have used that tool before. Heck. I first started drawing ER diagrams with a generic graphics program like Visio. The last time I used any formal programs for design, it must have been Rational Rose.

CA ERwin is a data modeler. The name stands for "Entity Relationship" and Microsoft "Windows". The tool was created by Logic Works. It allows you to also do requirements analysis. Of course it lets you do database design. ERwin will let you create a custom information system, a transactional database system, or a datamart. There are four main editions of the software:
  1. Community - free, subset of features, max 25 objects
  2. Navigator - read only
  3. Standard - an enterprise edition
  4. Workgroups - collaborative
 ERwin lets you do both logical and physical modeling. Once you have the physical model, ERwin can automatically generate the DDL to create your schema. The physical model supports the following targets:
  • DB2
  • Informix
  • MySQL
  • Oracle
  • SAS
  • Azure
  • SQL Server
  • Sybase
The one targeting Microsoft Azure is its own edition. I tried downloading the community edition. Be warned. The thing takes a long time to install. It first wants to install the Microsoft dot net framework. Then it uses an Installshield wizard to guide you through the options. The default options require 814M of disk space.

Once installed, I found the IDE to be very intuitive. It felt like using Microsoft's Visual Studio, where I feel at home. I started a logical modeling project. Generated a bunch of entities chock full of attributes. Save my file off with an "*.etwin" extension. Later I plan to create a physical model and deploy it to an Oracle database. Oh my.

Oracle Warehouse Builder

I remember my first encounter with Oracle Warehouse Builder (OWB). We were moving to a new database design. We were also moving to a web based system from a client/server one. The scope was huge. The timeline was short. This is the story of my life.

But back to OWB. It is a data warehouse integration solution. The goal is to migrate data from legacy systems and put it into a data warehouse. Sounds like it was just what we needed. OWB has some secondary goal: modeling, profiling, cleansing, and auditing of data.

It was first released in January 2000. OWB 11gR2 is the last released. It will be patched to work with Oracle 12c. But it scheduled to be integrated in Oracle Data Integrator (ODI).

OWB is a repository driven tool. That is, the metadata that describes the ETL is put into a schema. You develop mappings and design process flows with it. It deploys into PL/SQL packages. It runs under Control Center and the OWB runtime. Oracle has to be the target.

The system we were replacing using OWB and a web version of our application went live briefly. It had a bunch of problems. It was slow. It did not implement all of the requirements of the old system. I can't blame these problems on OWB per se. Others have tried to rewrite the huge legacy system without success, and they did not use OWB in their solutions.

E-Business Suite R12

I am looking for my next gig. A human resources person in my company asked what I did for a living. I said I was an Oracle developer. She told me they had some hard to fill positions for me. Unfortunately they required EBS experience. Turns out I have none.

EBS stands for E-Business Suite. It is also know as e-BS, EB-Suite, and Applications/Apps. They are a bunch of Oracle business applications. They are each licensed separately. The latest release is Release 12, also known as R12. EBS is huge. It is second only to SAP in sales.

Here are the main groups of apps in EBS:
  1. Enterprise Resource Planning (ERP)
  2. Customer Relationship Management (CRM)
  3. Supply Chain Management (SCM)
 The EBS is built on top of the following stack of technologies:
  • Oracle Forms Server
  • Oracle Reports Server
  • Apache Web Server
  • Oracle Discoverer
  • JInitiator
  • Java

OLAP

Heard the term OLAP a lot? It stands for online analytical processing. You would use it to analyze multidimensional data. OLAP is part of business intelligence. For example, it can be used for business reporting of sales. It can also be used for data mining. The data is organized in cubes, not tables. This data is queried and becomes parts of reports. There are not as many transactions involved. The database does not have to be huge like a data warehouse.

There are three main operations in OLAP:
  1. Consolidation
  2. Drill down
  3. Slicing and dicing
Consolidation, also know as roll up, is aggregation of data. Drill down is the opposite. It is going into the details. Slicing and dicing let's you view data from different viewpoints. Those viewpoints are called dimensions.

There is a certain term in OLAP called the OLAP Cube. It is also known as a multidimensional cube or hypercube. It contains numeric facts (called measures) categorized by dimensions. This cube is created from data warehouse schemas. The speed in accessing the cube is from the aggregation of data. For some complex queries, accessing data can be 1000 times faster than querying a normal database.

There are different types of popular OLAP systems:
  • MOLAP
  • ROLAP
  • HOLAP
The MOLAP is the multidimensional one. This is the classic type of OLAP. Data is in special array storage hardware. It is fast on queries, and requires less disk space. Loading data into it can be time consuming though. ROLAP is a relational database OLAP. Slicing and dicing translate into WHERE clauses of SQL statements in ROLAP. You can ask this system any question since it has all the non-aggregated data. HOLAP is a hybrid between MOLAP and HOLAP.

More Data Warehouse Schemas

Previously I had written about the different types of schemas for data warehousing. Now I want to write about some more details. The schemas are made up of fact tables and dimension tables. Fact tables contain a bunch of numeric values called measures. They have foreign keys into the dimension tables. In fact, their primary keys are usually a composite of all the foreign keys.

The dimension table categorizes data. It has less rows than a fact table. But it will probably have a lot of attributes. These attributes are descriptive text values that help answer business questions. They can contain data collected at the lowest level. The data can be aggregated into hierarchies. These tables might take along time to load.

Here are some misc terms. A level is a position within a dimension table hierarchy. A stars query is a join between a fact table and a dimension table. A centipede schema is a star schema with a lot of dimensions.

An entity is a chunk of information. That are things of importance. Entities usually map to a database table. Attributes are components of an entity. They define the unique of the entity. Logical design is the process of identifying entities and attributes. You could use a tool such as Oracle Warehouse Builder or Oracle Designer to conduct logical design.

Data Warehouse Schemas

Lots of jobs require some data warehouse experience. So it is time to study up on data warehouse schemas. Here are the common ones:
  • Star
  • Snowflake
  • Fact Constellation
  • Galaxy
The star schema is the simplest. It is also the most common. It is supported by business intelligence tools. OLAP systems build cubes with them. It represents multidimensional data. In the center of the schema is a fact table, usually in the third normal form. The points of the stars are denormalized dimension tables.  It is good for simple and fast queries.

The snowflake schema is like a star schema. However the dimensional tables are normalized. The fact constellation schema has multiple fact tables. It has shared dimension tables which are large. It is more complicated. The galaxy schema has many fact tables. They have common dimensions. It is the combination of many data marts.