Lost Connection

Our testing team reported that all application would error out after 30 minutes of non-use. They got an error message stating the connection had been lost. Initially I picked up this problem and researched it. Most people immediately discounted this as a database server issue. However I had some evidence that this might not be so. This year we moved from the Oracle 8 client to the Oracle 10g client. Thread based connections seemed to behave differently in 10g. So I found some instances where you needed to do some explicit connection management between threads using database contexts. If not, I recall getting a lot of connection lost error messages.

I was able to duplicate the problem easy enough. Just log in and wait 30 minutes without doing anything. Sure enough I made the problem occur. However after reviewing the code involved with just logging in, I determined this was not a thread coding issue. I hit Google search to see if I could find any knowledge on the subject. Most of the web sites I visited recommended that you modify the timeout parameter on the server’s sqlnet.ora file. Armed with this information, I forwarded the problem to the DBA Team.

The next time I saw the lead DBA, I asked him about our timeout problem. He said that he encountered the problem as well. It was happening even in SQL*Plus. I told him I read a lot of articles about the timeout value in sqlnet.ora. However the DBA insisted that the timeout was not set on our server. Instead he believed this to be a network issue. So he forwarded the problem to the network administration team. This did not immediately resolve the problem. But it was due to the problem not receiving the priority it required. A couple calls later from the DBA, and it seems the problems have been resolved.

A lot of my experience with this problem is that you cannot just trust your instinct with any given problem. Nor can you entirely defer to the knowledge gained by Oracle. You need to dig in to the specifics of the problem and methodically identify the true cause. This is not to say that you need to discount any hunch you have. However you need to identify the hunches as theories, and test out those theories. Agent Moulder said it best when he recommend you trust nobody.