Virtual Private Database

Oracle has a security feature introduced in Oracle 8i called the Virtual Private Database (VPD). It allows you to specify policies at the row and column level. Under the hood, it adds a WHERE clause to SQL statements to limit what data you can see. Normally there is no way to bypass this security.

The VPD restrictions apply to SELECT, INSERT, UPDATE, INDEX, and DELETE statements. They do not affect DDL. The security check comes with a performance cost. There are some types of VPD policy you can use to lessen the performance hit:
  • Static policy - applied once per query
  • Context sensitive - applied when context changes
  • Dynamic policy - reevaluated every run
VPD runs with definer's rights. You write a function that generates the WHERE clause to restrict access. This function must take a schema name and an object name as inputs. It must return a VARCHAR2 containing the WHERE clause addition. This special function cannot access the table it is providing security for.

You manage the VPD security with calls to the DBMS_RLS package. RLS stands for row level security. There is also a column masking type of VPD. That style will return all rows in the object, but sensitive columns will be returned as NULL values. The column masking technique only applies to SELECT statements.
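
To make that concrete, here is a minimal sketch of a policy function and its registration with DBMS_RLS. All of the schema, table, context, and policy names are made up for illustration.

  -- Hypothetical policy function: only show rows for the caller's department.
  -- The department comes from a made-up application context (HR_CTX).
  CREATE OR REPLACE FUNCTION dept_policy (
    p_schema IN VARCHAR2,
    p_object IN VARCHAR2
  ) RETURN VARCHAR2
  AS
  BEGIN
    -- The returned string is appended to statements as the WHERE clause predicate.
    RETURN 'department_id = SYS_CONTEXT(''HR_CTX'', ''DEPT_ID'')';
  END dept_policy;
  /

  -- Attach the policy to a (hypothetical) HR.EMPLOYEES table.
  BEGIN
    DBMS_RLS.ADD_POLICY(
      object_schema   => 'HR',
      object_name     => 'EMPLOYEES',
      policy_name     => 'emp_dept_policy',
      function_schema => 'SEC_ADMIN',
      policy_function => 'dept_policy',
      statement_types => 'SELECT, INSERT, UPDATE, DELETE');
  END;
  /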

VPD policies can be managed in groups. One should not perform a SELECT FOR UPDATE on an object managed with VPD. There can be multiple policies on an object. Their WHERE clauses will be joined with AND.

Earlier I said that VPD cannot be bypassed. VPD is not applied in these circumstances:
  • DIRECT path exports
  • Operations on SYS owned objects
  • Operations run by SYS
  • Operations run AS SYSDBA
  • The user has the EXEMPT ACCESS POLICY privilege
VPD goes by other names. It is also known as Fine Grained Access Control, and it is sometimes referred to as Row Level Security. Hence the name of the DBMS_RLS package.

Bulk Processing

I usually keep performance in mind when I write my PL/SQL. Just being smart about the SQL is usually good enough. But sometimes you have a lot of data to process. In those scenarios, you might need to bulk up. Do some bulk processing, that is.

Bulk processing is the term to describe BULK COLLECT and FORALL. The big idea is that there is a PL/SQL engine and a SQL engine in the database. Switching between the two costs you speed due to overhead. So it would be best to make one handoff from the PL/SQL engine to the SQL one.

BULK COLLECT will cause a query to make a single switch into the SQL engine and fill up your collection. FORALL works the other way around, allowing you to do one DML operation for all the items in your collection with just one context switch into the SQL engine.

There is a warning though. Don't do unbounded BULK COLLECT operations. That will eat up a lot of memory in the PGA (Program Global Area). Collections are stored in memory there. You don't want a huge one taking up too much memory. Use the LIMIT clause to cap how many records are grabbed at once in a BULK COLLECT. Experience says you should use a LIMIT of at least 25. It would be preferable to use a LIMIT of around 100. Oracle uses 100 as the default array size when it optimizes cursor FOR loops in Oracle 10g and above.

After you do the BULK COLLECT, if there are records, they will start with index 1 in your collection. The rest of the records will be densely populated in the collection. If there are no records, the collection will be empty. The COUNT on your collection will be zero. There will not be a NO_DATA_FOUND exception raised in this scenario. Similarly, you cannot count on %NOTFOUND. Use the COUNT.
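
Here is a minimal sketch of a bounded BULK COLLECT loop, assuming a hypothetical EMPLOYEES table. Note the LIMIT and the COUNT-based exit.

  DECLARE
    TYPE t_emp_tab IS TABLE OF employees%ROWTYPE;  -- employees is a made-up table
    l_emps t_emp_tab;
    CURSOR c_emps IS SELECT * FROM employees;
  BEGIN
    OPEN c_emps;
    LOOP
      -- One context switch fetches up to 100 rows into the collection.
      FETCH c_emps BULK COLLECT INTO l_emps LIMIT 100;
      EXIT WHEN l_emps.COUNT = 0;  -- rely on COUNT, not %NOTFOUND
      FOR i IN 1 .. l_emps.COUNT LOOP
        NULL;  -- process l_emps(i) here
      END LOOP;
    END LOOP;
    CLOSE c_emps;
  END;
  /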

Here is a tip for the FORALL. It is not a loop. You don't need a loop command. By default the collection you are doing DML on must be dense. If not, you must use the INDICES OF clause with the FORALL. That only works with Oracle 10g and above. Normally you would not want an exception in the middle of your FORALL to stop the flow. To prevent that, use the SAVE EXCEPTIONS clause. Then errors will be stored up in the SQL%BULK_EXCEPTIONS pseudo collection.
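
And a minimal FORALL sketch with SAVE EXCEPTIONS, again against a made-up EMPLOYEES table. ORA-24381 is the error raised when some of the array DML rows fail.

  DECLARE
    TYPE t_id_tab IS TABLE OF NUMBER;
    l_ids       t_id_tab := t_id_tab(100, 101, 102);  -- hypothetical key values
    bulk_errors EXCEPTION;
    PRAGMA EXCEPTION_INIT(bulk_errors, -24381);
  BEGIN
    FORALL i IN 1 .. l_ids.COUNT SAVE EXCEPTIONS
      UPDATE employees
      SET    salary = salary * 1.1
      WHERE  employee_id = l_ids(i);
  EXCEPTION
    WHEN bulk_errors THEN
      -- Each failed row is recorded in the SQL%BULK_EXCEPTIONS pseudo collection.
      FOR j IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP
        DBMS_OUTPUT.PUT_LINE('Row ' || SQL%BULK_EXCEPTIONS(j).ERROR_INDEX ||
                             ' failed: ' || SQLERRM(-SQL%BULK_EXCEPTIONS(j).ERROR_CODE));
      END LOOP;
  END;
  /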

Data Modeler

Oracle has a free Data Modeler tool. There is a stand alone version of this tool. Another version is bundled with SQL Developer v3 and beyond. Both versions have the same functionality. The stand alone version gives you more screen real estate and easier access to the menus. The tool itself was developed in Java. As such, it requires the JDK.

Data Modeler lets you create, browse, and edit models. Here are some model types it supports:
  • Logical
  • Relational
  • Physical
  • Multi-dimensional
  • Data Type
You can do forward and reverse engineering with the tool. The targets it supports are Oracle, SQL Server, and DB2. Although I would think you would mostly want the tool to produce DDL, it can also spit out other formats like CSV. You can perform collaborative development with this tool.

Why go through the trouble of using a Data Modeler? Some say it is much easier than maintaining scripts manually. It can serve as a documentation tool. Or it can just help you get a handle on an existing database system.

Entity Relationship Diagram

I drew my first Entity Relationship Diagram (ERD) over 20 years ago. Used some drawing package to make the pictures. Not a whole lot has changed since then. The ERD was invented by Peter Chen back in 1976. It looks a lot like a flowchart. The ERD represents an information system. It shows the logical structure of a database.

The ERD deals with entities and relationships. It does not show a specific entity. Instead it deals with entity sets. Drawing up an ERD is part of a data modeling technique. In the end, it is used to create a relational database. An ERD is not well suited for unstructured data.

There are different styles for drawing an ERD:
  • Information engineering
  • Peter Chen
  • Charles Bachman
  • James Martin
Of course there are tons of tools that can help you draw an ERD. Here are some commercial ones I have heard about:
  • ERwin
  • MagicDraw
  • Oracle Designer
  • Rational Rose
  • System Architect
  • TOAD Data Modeler
I mentioned before that ERDs deal with entities and their cardinality. Entities are the objects that have data. Cardinality is the relationship between these objects in terms of numbers. The main types of such relationships are:
  1. One-to-one
  2. One-to-many
  3. Many-to-many

CA ERwin

I have been seeing some jobs that want you to have experience with ERwin. I don't think I have used that tool before. Heck. I first started drawing ER diagrams with a generic graphics program like Visio. The last time I used any formal programs for design, it must have been Rational Rose.

CA ERwin is a data modeler. The name stands for "Entity Relationship" and Microsoft "Windows". The tool was created by Logic Works. It also allows you to do requirements analysis. Of course it lets you do database design. ERwin will let you create a custom information system, a transactional database system, or a datamart. There are four main editions of the software:
  1. Community - free, subset of features, max 25 objects
  2. Navigator - read only
  3. Standard - an enterprise edition
  4. Workgroups - collaborative
ERwin lets you do both logical and physical modeling. Once you have the physical model, ERwin can automatically generate the DDL to create your schema. The physical model supports the following targets:
  • DB2
  • Informix
  • MySQL
  • Oracle
  • SAS
  • Azure
  • SQL Server
  • Sybase
The one targeting Microsoft Azure is its own edition. I tried downloading the Community edition. Be warned. The thing takes a long time to install. It first wants to install the Microsoft .NET Framework. Then it uses an InstallShield wizard to guide you through the options. The default options require 814M of disk space.

Once installed, I found the IDE to be very intuitive. It felt like using Microsoft's Visual Studio, where I feel at home. I started a logical modeling project. Generated a bunch of entities chock full of attributes. Saved my file off with an "*.erwin" extension. Later I plan to create a physical model and deploy it to an Oracle database. Oh my.

Oracle Warehouse Builder

I remember my first encounter with Oracle Warehouse Builder (OWB). We were moving to a new database design. We were also moving to a web based system from a client/server one. The scope was huge. The timeline was short. This is the story of my life.

But back to OWB. It is a data warehouse integration solution. The goal is to migrate data from legacy systems and put it into a data warehouse. Sounds like it was just what we needed. OWB has some secondary goals: modeling, profiling, cleansing, and auditing of data.

It was first released in January 2000. OWB 11gR2 is the last release. It will be patched to work with Oracle 12c. But it is scheduled to be integrated into Oracle Data Integrator (ODI).

OWB is a repository driven tool. That is, the metadata that describes the ETL is put into a schema. You develop mappings and design process flows with it. It deploys into PL/SQL packages. It runs under Control Center and the OWB runtime. Oracle has to be the target.

The replacement system, built with OWB and a web version of our application, went live briefly. It had a bunch of problems. It was slow. It did not implement all of the requirements of the old system. I can't blame these problems on OWB per se. Others have tried to rewrite the huge legacy system without success, and they did not use OWB in their solutions.

E-Business Suite R12

I am looking for my next gig. A human resources person in my company asked what I did for a living. I said I was an Oracle developer. She told me they had some hard to fill positions for me. Unfortunately they required EBS experience. Turns out I have none.

EBS stands for E-Business Suite. It is also known as e-BS, EB-Suite, and Applications/Apps. It is a bunch of Oracle business applications, each licensed separately. The latest release is Release 12, also known as R12. EBS is huge. It is second only to SAP in sales.

Here are the main groups of apps in EBS:
  1. Enterprise Resource Planning (ERP)
  2. Customer Relationship Management (CRM)
  3. Supply Chain Management (SCM)
The EBS is built on top of the following stack of technologies:
  • Oracle Forms Server
  • Oracle Reports Server
  • Apache Web Server
  • Oracle Discoverer
  • JInitiator
  • Java

OLAP

Heard the term OLAP a lot? It stands for online analytical processing. You would use it to analyze multidimensional data. OLAP is part of business intelligence. For example, it can be used for business reporting of sales. It can also be used for data mining. The data is organized in cubes, not tables. This data is queried and becomes part of reports. There are not as many transactions involved. The database does not have to be huge like a data warehouse.

There are three main operations in OLAP:
  1. Consolidation
  2. Drill down
  3. Slicing and dicing
Consolidation, also known as roll up, is aggregation of data. Drill down is the opposite. It is going into the details. Slicing and dicing lets you view data from different viewpoints. Those viewpoints are called dimensions.
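
In a ROLAP system those operations end up as plain SQL. Here is a minimal sketch of a roll up using GROUP BY ROLLUP, against a made-up SALES_FACT table:

  -- Consolidation (roll up): subtotals per month, per year, plus a grand total.
  SELECT year, month, SUM(amount) AS total_sales
  FROM   sales_fact
  GROUP  BY ROLLUP (year, month)
  ORDER  BY year, month;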

There is a key term in OLAP called the OLAP cube. It is also known as a multidimensional cube or hypercube. It contains numeric facts (called measures) categorized by dimensions. This cube is created from data warehouse schemas. The speed in accessing the cube comes from the pre-aggregation of data. For some complex queries, accessing data can be 1000 times faster than querying a normal database.

There are different types of popular OLAP systems:
  • MOLAP
  • ROLAP
  • HOLAP
MOLAP is the multidimensional one. This is the classic type of OLAP. Data is stored in specialized multidimensional array storage. It is fast on queries and requires less disk space. Loading data into it can be time consuming though. ROLAP is relational database OLAP. Slicing and dicing translate into WHERE clauses of SQL statements in ROLAP. You can ask this system any question since it has all the non-aggregated data. HOLAP is a hybrid between MOLAP and ROLAP.

More Data Warehouse Schemas

Previously I had written about the different types of schemas for data warehousing. Now I want to write about some more details. The schemas are made up of fact tables and dimension tables. Fact tables contain a bunch of numeric values called measures. They have foreign keys into the dimension tables. In fact, their primary keys are usually a composite of all the foreign keys.

The dimension table categorizes data. It has fewer rows than a fact table. But it will probably have a lot of attributes. These attributes are descriptive text values that help answer business questions. They can contain data collected at the lowest level. The data can be aggregated into hierarchies. These tables might take a long time to load.
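
Here is a minimal DDL sketch of a couple of dimension tables and one fact table, with made-up names, showing the composite primary key built out of the foreign keys:

  CREATE TABLE time_dim (
    time_id       NUMBER PRIMARY KEY,
    calendar_date DATE,
    month_name    VARCHAR2(20),
    year_num      NUMBER
  );

  CREATE TABLE product_dim (
    product_id NUMBER PRIMARY KEY,
    name       VARCHAR2(100),
    category   VARCHAR2(50)
  );

  CREATE TABLE sales_fact (
    time_id    NUMBER REFERENCES time_dim (time_id),
    product_id NUMBER REFERENCES product_dim (product_id),
    quantity   NUMBER,        -- measure
    amount     NUMBER(12,2),  -- measure
    CONSTRAINT sales_fact_pk PRIMARY KEY (time_id, product_id)
  );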

Here are some misc terms. A level is a position within a dimension table hierarchy. A star query is a join between a fact table and one or more dimension tables. A centipede schema is a star schema with a lot of dimensions.

An entity is a chunk of information. Entities are things of importance. They usually map to a database table. Attributes are components of an entity. They help define the uniqueness of the entity. Logical design is the process of identifying entities and attributes. You could use a tool such as Oracle Warehouse Builder or Oracle Designer to conduct logical design.

Data Warehouse Schemas

Lots of jobs require some data warehouse experience. So it is time to study up on data warehouse schemas. Here are the common ones:
  • Star
  • Snowflake
  • Fact Constellation
  • Galaxy
The star schema is the simplest. It is also the most common. It is supported by business intelligence tools. OLAP systems build cubes with it. It represents multidimensional data. In the center of the schema is a fact table, usually in third normal form. The points of the star are denormalized dimension tables. It is good for simple and fast queries.

The snowflake schema is like a star schema. However the dimension tables are normalized. The fact constellation schema has multiple fact tables. It has shared dimension tables which are large. It is more complicated. The galaxy schema has many fact tables. They have common dimensions. It is the combination of many data marts.

APEX

I had briefly played with Oracle Application Express (APEX) before. But I am now coming back and familiarizing myself with the big picture of the tool. It allows you to build applications using a browser based user interface. The IDE has a code editor, supports drag and drop, does not require recompilations, and allows you to use jQuery Mobile. The IDE also supports languages/standards like SQL, PL/SQL, JavaScript, REST, and SOAP.

You don't have to manually write your CSS and JavaScript. APEX produces a responsive user interface. You can deploy your APEX apps to the Oracle Cloud. APEX runs with different versions of the Oracle database from the free XE to the Enterprise Edition.

APEX comes with a number of sample applications such as bug tracking, survey builder, and a customer tracker. APEX incurs no additional cost. It runs wherever the Oracle database can run. It was first released in 2004. The latest version is 5.0.1 released on July 15, 2015. The product has gone through a number of names:
  1. Flows
  2. Oracle Platform
  3. Project Marvel
  4. HTML DB
  5. Application Express (APEX)
APEX is used to build web apps. It was created by Mike Hichwa. He is the same guy who developed Web DB. It is built on top of mod_plsql. The tool is used by Oracle itself in a number of sites such as AskTom Knowledgebase and the Oracle online store.

APEX is easy to deploy. You can host your demo apps on Oracle's hosted site for free. It supports themes. The install is heavy, coming in at 147M for the English-only version. There are some limitations on the tool. Primary keys can contain at most two fields. There is a max of 200 items per page. Forms can handle a max of 200 database items.

APEX is not supported by many webhosts. There is no built in version control. The base server side language is PL/SQL. In summary, it is a rapid web application development tool. It requires at least Oracle 9iR2. Since Oracle 11g, it has been included with the database install.

Oracle XML DB

I know XML. But I don't know XDB, which is Oracle's XML DB. It is actually a set of technologies paired with an Oracle database. The two main pieces are (1) XMLType tables/views, and (2) the XML DB Repository. The repository holds resources such as folders and files.

XML DB is a native XML storage technology. The tech supports XML standards such as XML schemas. You register the schemas, and it creates some tables to hold the info. You can access the XML data through HTTP, SQL, JDBC, WebDAV, or FTP.

Messaging is done with the Oracle Streams AQ. XMLType can be used as the payload type. You can queue up XMLType messages. There are a number of APIs to use XML DB. These include XMLType (surprise), DBMS_XDB, and a number of other PL/SQL packages that start with DBMS_XDB*.
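
Here is a minimal sketch of XMLType in action, using a made-up table. XMLQuery and XMLCast are standard SQL/XML functions in recent releases:

  -- Hypothetical table with an XMLType column.
  CREATE TABLE purchase_orders (
    po_id  NUMBER PRIMARY KEY,
    po_doc XMLTYPE
  );

  INSERT INTO purchase_orders VALUES (
    1,
    XMLTYPE('<po><customer>Acme</customer><total>42</total></po>')
  );

  -- Pull a scalar value back out of the document.
  SELECT XMLCAST(
           XMLQUERY('/po/customer/text()' PASSING p.po_doc RETURNING CONTENT)
           AS VARCHAR2(100)) AS customer
  FROM   purchase_orders p;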

There are a bunch of new views to check on XML DB info. Here are the DBA views:
  • DBA_XML_SCHEMAS
  • DBA_XML_TABLES
  • DBA_XML_TAB_COLS
  • DBA_XML_VIEWS
  • DBA_XML_VIEW_COLS
You get the corresponding USER_* and ALL_* views as well. There are two ways to handle storage: (1) unstructured storage, where the XML text is stored in CLOBs, and (2) structured storage. Unstructured is fast. Structured uses up less space but increases overhead during processing.

XML DB supports the SQL/XML standard. It requires at least Oracle 9iR2. Altova, the people who make XMLSpy, have an editor for XML DB. There is a lot of info out there on how to install XML DB. If you are upgrading an existing DB, you should install XML DB manually. Otherwise you can use the Database Configuration Assistant (DBCA).

XML DB can be managed with Oracle Enterprise Manager. The internal configuration is done via the "xdbconfig.xml" file.

Data Pump

I might need to take the contents of my development database with me. How should I do that? Glad you asked. Since Oracle 10g, the answer is most likely Data Pump. This feature consists of three parts:
  1. The clients expdp and impdp
  2. The Data Pump API - DBMS_DATAPUMP
  3. The MetaData API - DBMS_METADATA
The clients are similar to the original exp and imp utilities, but they are not compatible with them. They just supersede them. The clients themselves use the Data Pump API internally. The programs are located in the $ORACLE_HOME/bin directory. They can take a parameter file. You don't need to use these clients. You can use the API directly via the PL/SQL package.
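
Here is a minimal sketch of a schema export through the PL/SQL API. The directory object, schema, and file names are made up:

  DECLARE
    l_handle NUMBER;
    l_state  VARCHAR2(30);
  BEGIN
    -- Open a schema-mode export job.
    l_handle := DBMS_DATAPUMP.OPEN(operation => 'EXPORT', job_mode => 'SCHEMA');

    -- The dump file goes to a directory object the DBA has already created.
    DBMS_DATAPUMP.ADD_FILE(handle    => l_handle,
                           filename  => 'hr_export.dmp',
                           directory => 'DATA_PUMP_DIR');

    -- Limit the job to a single schema.
    DBMS_DATAPUMP.METADATA_FILTER(handle => l_handle,
                                  name   => 'SCHEMA_EXPR',
                                  value  => q'[IN ('HR')]');

    DBMS_DATAPUMP.START_JOB(l_handle);
    DBMS_DATAPUMP.WAIT_FOR_JOB(l_handle, l_state);
    DBMS_OUTPUT.PUT_LINE('Job finished with state ' || l_state);
  END;
  /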

A DBA must create the directory objects used by Data Pump. You will need the following database privileges to take full advantage of the utility:
  • EXP_FULL_DATABASE
  • IMP_FULL_DATABASE  
Data Pump will use either direct path or external tables to handle the data. It decides which based on the data you are trying to move. Sounds a bit like sqlldr to me. These utilities do not support XML schemas. The export and import occur on the server, unlike the predecessors exp and imp.

One master process is created per job. That process controls the worker processes. There is also a master table which tracks progress. It can be used to restart a job that did not complete. This table gets dropped after the run is successful. You can check on the progress with the V$SESSION_LONGOPS view.

Here are the types of files managed by Data Pump:
  • Dump files - the data
  • Log files - messages
  • SQL files - DDL to reconstruct
impdp is the Data Pump Import. expdp is the Data Pump Export. They both use a dump file set, which is a set of binary files containing the data. Do not run these as SYSDBA. Here are the modes in which these programs can operate:
  • Full
  • Schema
  • Table
  • Tablespace
  • Transportable tablespace

Advanced Queuing (AQ)

Do you know what? I have never used Oracle's Advanced Queuing (AQ). That's no excuse for not knowing what it is. I did a little research this week. AQ is Oracle's version of message oriented middleware. Yeah. I know. Sounds like a good buzzword. In English, this is an asynchronous publish/subscribe model. One use is for web apps to communicate.

AQ was first released in Oracle 8. It became free of charge in Oracle 9.2. Later, it was folded into Oracle Streams in Oracle 10.1. The functionality is now known as Oracle Streams AQ. That is because it is built on top of Oracle Streams. Actually, I think the low level blocks it is built on are Oracle database tables.

Here is the order of doing things in AQ. You create a type. Then you create a queue table. Next you create the actual queue. Once all that is done, you can enqueue and dequeue all you want. You should be aware of one piece of terminology. The messages are called the payload.
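
Here is a minimal sketch of that sequence, with made-up names for the payload type, queue table, and queue (the queue also has to be started before use):

  -- 1. Create a payload type.
  CREATE TYPE order_msg_t AS OBJECT (
    order_id NUMBER,
    status   VARCHAR2(20)
  );
  /

  -- 2. Create the queue table, 3. create the queue, then start it.
  BEGIN
    DBMS_AQADM.CREATE_QUEUE_TABLE(queue_table        => 'order_msg_qt',
                                  queue_payload_type => 'order_msg_t');
    DBMS_AQADM.CREATE_QUEUE(queue_name  => 'order_msg_q',
                            queue_table => 'order_msg_qt');
    DBMS_AQADM.START_QUEUE(queue_name => 'order_msg_q');
  END;
  /

  -- 4. Enqueue a message (the payload).
  DECLARE
    l_enq_opts  DBMS_AQ.ENQUEUE_OPTIONS_T;
    l_msg_props DBMS_AQ.MESSAGE_PROPERTIES_T;
    l_msg_id    RAW(16);
  BEGIN
    DBMS_AQ.ENQUEUE(queue_name         => 'order_msg_q',
                    enqueue_options    => l_enq_opts,
                    message_properties => l_msg_props,
                    payload            => order_msg_t(1001, 'NEW'),
                    msgid              => l_msg_id);
    COMMIT;
  END;
  /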

You can asynchronously enqueue and dequeue. Or you could use notifications to automatically dequeue. To do that you need to create a callback procedure, add a subscriber, and register the subscriber. The roles specific to AQ are AQ_ADMINISTRATOR_ROLE and AQ_USER_ROLE.

DBMS_SCHEDULER

I am old fashioned. I use UNIX cron to kick off jobs. They might sqlplus into the database to run some SQL scripts. However Oracle has ways within the database to run scheduled jobs. I am just not that familiar with them. I think the old way was using DBMS_JOB. Now we have a replacement (or at least an extension) in DBMS_SCHEDULER.

The package gives you procedures to create a job, run a job, drop a job, enable/disable jobs, or create a schedule. You can provide a single job to these procs. Or you can use a list of jobs. Be warned that if you chain jobs, any error halts the whole chain.

One way to do scheduling is to specify a frequency like DAILY or WEEKLY. You don't need to commit changes to have the jobs actually start. So they operate kind of like DDL in that respect. Some good views for job info are DBA_SCHEDULER_JOBS and DBA_SCHEDULER_JOB_LOG.
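
Here is a minimal sketch of creating a daily job; the job name and the procedure it calls are made up:

  BEGIN
    DBMS_SCHEDULER.CREATE_JOB(
      job_name        => 'NIGHTLY_CLEANUP_JOB',          -- hypothetical job
      job_type        => 'STORED_PROCEDURE',
      job_action      => 'CLEANUP_PKG.PURGE_OLD_ROWS',   -- hypothetical procedure
      start_date      => SYSTIMESTAMP,
      repeat_interval => 'FREQ=DAILY; BYHOUR=2',
      enabled         => TRUE);
  END;
  /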

The main privilege you need to run a job is CREATE JOB. The super priv is SCHEDULER_ADMIN. But you should grant that one sparingly. The package was first released in Oracle 10g. Other notable enhancements were the ability to run jobs on external servers in 11gR1. And we got email notifications in 11gR2.

I still am a newbie in the scheduling of jobs within the database. But I am starting to learn the landscape.

SQL*Loader Refresher

I have started searching for an Oracle development job. Optimally it will be one where I can do a lot of PL/SQL coding. Some jobs out there require experience with SQL*Loader. I have used the tool in the past. But not much. Time to refresh my memory.

SQL*Loader will take a data file as input and put the data in your database tables. The main driver of reading in that file is the control file. The control file can specify a fixed datafile format. That has the best performance. Or it can specify a variable length format, where the length of each line can vary, with the size being set in the first couple of characters of the line.

You run SQL*Loader with the sqlldr command. You pass it a username and password. Plus you give it the name of the control file. It will generate a log file with the same name as the control file, but with a log extension, by default.

SQL*Loader can also generate a bad file. This will contain records rejected by SQL*Loader. It can also generate a discard file. That file has input records from the datafile that were not selected by the constraints specified in the control file. These are optional files to be generated.

There are a few methods that SQL*Loader can use to load data. The first is conventional path loading, where data is copied to a bind array, then inserted into the database table. The second is direct path loading, where blocks are built and written directly to the database. It is fast. The third is external table loading.

If you want to load an Excel file, which I sometimes do, you need to export it to CSV format first. There is a way to put the actual data being loaded right in the control file. This eliminates the need for a separate datafile. You just give the command "INFILE *" in the control file. Then you start your data section with the keyword BEGINDATA.
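
Here is a minimal control file sketch with inline data; the table and field names are made up:

  LOAD DATA
  INFILE *
  APPEND INTO TABLE employees
  FIELDS TERMINATED BY ','
  (employee_id, last_name, salary)
  BEGINDATA
  100,Smith,50000
  101,Jones,60000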

Selective loading of records can be accomplished with the WHEN clause. Just note that you can only combine conditions with AND in that clause. OR is not allowed. You can specify the batch size using the ROWS parameter. You tell what table the data is going into using the INTO TABLE clause. That is followed by the fields, and optionally their types.

Although I have never done it before, SQL*Loader can read data into collections such as nested tables or VARRAYs.

LISTAGG

I read this month's edition of Oracle Magazine, skimming over most articles. But I always stop and read Steven Feuerstein's column on PL/SQL intently. His listing 1 this time around made use of something called LISTAGG. It had a weird syntax. I honestly had never heard of it.

Turns out LISTAGG is new in Oracle 11g release 2. It does something called "string aggregation." That is a fancy way of saying concatenating values from multiple rows into one big string. Hmm. Sounds like it could be useful in certain scenarios.

So you give LISTAGG a measure column. That would be the column whose values are getting concatenated into a string. You also provide an ORDER BY inside a WITHIN GROUP clause. This will be the order in which the values are put into the concatenated string. You can specify the delimiter which separates values in the concatenated string.
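
Here is a minimal sketch against the classic SCOTT.EMP demo table (assuming you have it installed):

  -- One row per department, with employee names rolled up into a comma separated string.
  SELECT deptno,
         LISTAGG(ename, ', ') WITHIN GROUP (ORDER BY ename) AS employees
  FROM   emp
  GROUP  BY deptno;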

I have got to try harder to keep up with new stuff from later Oracle releases.