Setting up Wallet Using Orapki

I have a stored procedure that is using utl_http to retrieve the contents of a web page. Basically I am doing web scraping for information. I have previously set up my ACLs so that Oracle will not block my web access. However I encounter a ORA-29024: Certificate validation failure. What do I got to do to get web access?

Turns out I was going to a site that uses Secure HTTP (https). And you just cannot access such a site without some setup. Okay. I know Internet Explorer can get to the site. So I click in my browser to access the root certificate the browser uses to access the https site. That root server is VeriSign.

I export the certificate to a file. Then I use my new friend orapki. This is a command line program used to set up my wallet. I first create the wallet. Give it a password plus a location on my file system where the wallet will reside. Then I add the VeriSign certificate to the wallet. I have got to be good to go now.

Well there is good new and bad. The good news is that Oracle let's me go out to the Internet and download web content. The bad news is that the web site thought I was a bot (I am). So I could not get the actual content I desired. Now I might need to figure out how to trick this web site into believing I am just another web browser.

Access Control Lists

I have a stored procedure that is trying to use utl_http go scrape a web page. Initially I was encountering an ORA-24247: network access error. Apparently the user running the procedure must be explicitly given access to the network before Oracle will let this web request go through.

So I create an ACL by calling the CREATE_ACL procedure. I actually need to name the XML file where the details will be stored. I provide the user I will use as a parameter to the procedure. Now that the ACL has been created.

I also need to call the ASSIGN_ACL procedure. This is where I specify the host and ports. I am not choosing any specifics here. Just want to punch through the network using Oracle. Luckily I can use wildcards and NULL to be generic.

After crossing my fingers, I run my custom procedure to access a web page. Now I get an ORA-29024: Certificate validation failure. Does it really have to be this hard? The browsers make it look really easy to retrieve content on the web.

Package utl_http to the Rescue

I have a table full of data that I extracted out of an XML file. That was a good start. However the data is minimal. I need more info. Where can I get it? From the Internet. Of course. I have written scrapers before. But they were client programs. Now I want to do this from inside my Oracle 12c database. Can you say utl_http? That is the package to use.

Now I am going to be doing a bunch of scraping. Don't want the web site to detect that this is being done from code. So I have to use web proxies in the middle. Not a problem. I just call the set_proxy function. Someone has thought about this need before.

This is the typical web request/response business. So I have to formulate a request. Then I just read the response, line by line. It is a bit strange that when I read past the end of the HTML body, an exception is thrown. But I can catch that fine. This almost seems too easy.

When I run my code, I get an ORA-24247: network access error. What? Note that this is a fresh install of Oracle 12c enterprise edition. Apparently it comes out of the box with Internet access disabled. Darn. How can I punch through this limitation? I need to set up some ACLs. Give my user permission to go out to the web site through the ports I am using. Okay.

Have not figured the details out yet. I read a bit and got the feeling that it is a two step process. More on that when I get it working.

Using XML DB

All right. I got my Oracle 12c enterprise database installed. Now I got a big XML file I want to parse. Extract out the data. Stick everything in a database table or two. But how the heck do you do that? I am used to writing client/server programs that could do this. However this time around, I want to do it in the database.

Well I can declare an XMLType variable. Then for a quick and dirty, put the XML text in the constructor. Use the Extract function to dig down into a certain XPath in the data. Then use the ExtracValue function to get fields. Iterate over the whole XML document. Insert data into a real table. Done. Right?

This hack worked for a small data set of three records. Happy with that success, I jammed 5k records into the XMLType variable. Ran the code. It bombed. Not even sure what the error was. Because the whole sqlplus program blew up. Obviously I was doing something wrong here.

Okay. Let's declare a CLOB variable instead. Read the XML file into the CLOB. Then parse this XML. This feels better. It feels right. I can call selectNodes. That gives me a DOMNodeList. Iterating through that list gives me the data I want. No trouble throwing 5k records at that. I now have a table full of data.

Next up, I need to work that data.