Proving database connectivity

April 25, 2007 at 3:23 pm | Posted in Architecture, Documentum and Oracle, Troubleshooting

I was faced with the problem of a Content Server not starting following a server reboot. I brought up Documentum Server Manager, which had both the Start and Stop buttons greyed out. This usually means the service has been asked to start but is still in the startup process. After 10 minutes with no CPU or IO activity I concluded that the connection to the database had hung. Killing the documentum.exe process and starting the Content Server service again had no effect, so I opened a command prompt and typed:

c:\>sqlplus /nolog
SQL>connect system/******

Connected

No problem with connecting to the database.

After scratching my head for a few minutes I thought I’d check the Oracle TNS Listener:

c:\>lsnrctl
LSNRCTL>status
Connecting to (ADDRESS=(PROTOCOL=tcp)(PORT=1521))

The command hung. Usually there’s a load of status information after the ‘Connecting to’ line, so the problem seemed to lie with the listener. After restarting the Oracle TNS Listener service the docbase started up successfully.
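Incidentally, the listener can also be bounced from the command line rather than from the Services applet:

c:\>lsnrctl stop
c:\>lsnrctl start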

But hold on! How come SQL*Plus could connect? Doesn’t that require the listener to reach the database? In some cases it doesn’t. The simple connect system/****** form connects you to the default database on the local machine and doesn’t require the listener. However, if I had used the connect user/password@connect_string format:

SQL>connect system/******@dctm3

I would have got the same hanging behaviour I got from the content server. In this case dctm3 is a net service name defined in the tnsnames.ora file.
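For reference, a net service name is just an alias for a connect descriptor. A typical tnsnames.ora entry for dctm3 would look something like this (the host name here is made up):

# hypothetical entry - the host name is illustrative
dctm3 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbserver01)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = dctm3))
  )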

It’s always useful to have simple troubleshooting tools to check whether all the components in a system are up and running. Using SQL*Plus to test connectivity to the database is a good idea, but the test needs to be as close as possible to the calls the Content Server would have been making. In future my SQL*Plus troubleshooting test will be:

  1. Open server.ini for the content server and retrieve the database connection string contained in the database_conn key
  2. Test the connection from SQL*Plus using the connection string format connect user/password@connect_string
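Putting it together, with illustrative values for the account and connect string (server.ini normally lives under %DOCUMENTUM%\dba\config\<docbase_name>\):

c:\>sqlplus /nolog
SQL> REM docbase_owner and dctm3 below are illustrative values
SQL> connect docbase_owner/******@dctm3

If this connect hangs, the listener (or the network path to it) is the prime suspect; if it succeeds, the database side of the Content Server’s connection is proven.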

Assessing Query Performance

April 18, 2007 at 8:18 pm | Posted in Documentum and Oracle, Performance, Troubleshooting, Xense Profiler

In yesterday’s post I talked about the potential performance problems with a particular query. I noticed the problem whilst reviewing a trace file for a batch process I was testing. Since this was a log4j trace, each line was output with detailed timings. The trace file consistently had two lines that were separated by 8 or more seconds, and I wondered what was happening in that gap: if it was all time spent running database queries, that’s an awful lot of processing going on.

I captured a dmcl trace whilst running the process and put the output through the Xense Profiler (a DMCL trace profiling tool). The top 10 queries report showed a single query consuming most of the time (our friend the ORed Folder query from yesterday), so I derived the SQL and set to work in SQL*Plus to look at the performance of the query.
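As an aside, if you don’t have a dmcl trace to hand, the generated SQL can also be recovered directly from the server: run the DQL in idql and then call the GET_LAST_SQL administration method (a sketch; this assumes GET_LAST_SQL is available in your Content Server version):

1> select r_object_id from dm_document where folder('/Temp/Folder1',descend)
2> go
1> execute get_last_sql
2> go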

SQL*Plus has a very useful feature called ‘autotrace’: run a query with autotrace switched on and SQL*Plus outputs the query execution plan along with some very useful execution statistics.
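Enabling it is a single command (note that the statistics section requires the PLUSTRACE role, created by the plustrce.sql script that ships with SQL*Plus):

SQL> set autotrace on

With that set you just run the query as normal. The execution statistics from running the problem query were: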

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     655992  consistent gets
          0  physical reads
          0  redo size
       3528  bytes sent via SQL*Net to client
        558  bytes received via SQL*Net from client
          7  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        132  rows processed

The key statistic here is the 655,992 consistent gets. Each consistent get is a fetch of a database buffer from the database buffer cache (AKA a logical IO). That is a large amount of work to retrieve just 132 rows, and it explained why the query was taking so long: as a very rough rule of thumb, Oracle running on a reasonably fast processor might be able to perform between 50,000 and 100,000 logical IOs per second. Notice that there were no ‘costly’ physical reads; the time was all spent thrashing through cached data buffers.
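As a rough sanity check, taking the middle of that range, say 80,000 logical IOs per second:

    655,992 consistent gets / 80,000 per second ≈ 8.2 seconds

which tallies nicely with the 8-second gap between the log4j trace lines.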

Still, this is only 8 seconds, and it’s only a batch process. Why worry?

Three reasons:

  • There were no physical reads in the execution statistics because the only query I had been running on the test system was this one. In a real-life system there are likely to be many other queries running, all utilising the buffer cache, so there is a good chance that in production many of the logical IOs will turn into physical IOs. Even if only 5% become physical reads (i.e. we have a ‘good’ buffer cache hit ratio for this query) that’s over 30,000 physical IOs, and the query suddenly starts taking a lot longer to execute (see the arithmetic after this list).
  • The system I was testing on had a relatively small number of objects (about 150,000). Since a production system is likely to be much bigger and probably growing, the number of logical IOs required to complete the query will grow in proportion. This is a consequence of the execution plan for the query; with dm_sysobject_r or dm_sysobject_s as the first table in the join we are effectively scanning through a table that grows in proportion to the size of the docbase.
  • It is always a good idea to seek to reduce the workload placed on the database. This improves the scalability and general responsiveness of the system.
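To put some numbers on the first point (assuming a typical single-block disk read time of around 5 ms for disks of this era):

    655,992 logical IOs x 5% ≈ 32,800 physical reads
    32,800 physical reads x 5 ms ≈ 164 seconds

So a query that takes 8 seconds when fully cached could plausibly take minutes under production cache pressure.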

So what happened after restructuring the query? SQL*Plus with autotrace gave:

Statistics
----------------------------------------------------------
         14  recursive calls
          0  db block gets
       1032  consistent gets
          0  physical reads
          0  redo size
       3528  bytes sent via SQL*Net to client
        558  bytes received via SQL*Net from client
          7  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        132  rows processed

Only 1,032 logical IOs. And since the restructured query can drive through an index on dm_folder_r, performance is largely unaffected by increases in the size of the docbase.

Performance Tip: Avoid ORed Folder predicates

April 17, 2007 at 8:48 pm | Posted in Documentum and Oracle, Performance

Consider the following query that contains a ‘Folder’ predicate:

select 	r_object_id 
from 	dm_document 
where 	folder('/Temp/Folder1',descend) 

The query selects all documents that are in /Temp/Folder1 or any sub-folder underneath. This sort of query generally performs quite well. However, if you want to select documents from two or more sets of folders, it is best to avoid ORing the folder predicates together. E.g. this query may scale poorly as the number of dm_document objects in the system increases:

select 	r_object_id 
from 	dm_document 
where 	folder('/Temp/Folder1',descend) 
OR 	folder('/Temp/Folder2',descend)

Why is this? The SQL generated by the first query looks like this:

select 	all 
	dm_document.r_object_id 
from 	dm_document_sp dm_document, 
	dm_folder_r dm_folder_r1, 
	dm_sysobject_r dm_sysobject_r2 
where 	(    dm_document.r_object_id = dm_sysobject_r2.r_object_id 
	 and dm_sysobject_r2.i_folder_id = dm_folder_r1.r_object_id 
	 and dm_folder_r1.i_ancestor_id = '0b0000038000e990'
	) 
and 	(    dm_document.i_has_folder = 1 
	 and dm_document.i_is_deleted = 0
	)

The folder predicate is implemented by adding the dm_folder_r table into the join and filtering on the i_ancestor_id field. The database optimiser can evaluate different join orders and pick the best one. If there is a relatively small number of objects under the folder /Temp/Folder1 then probably the best table to join first is dm_folder_r; the subsequent joins to dm_sysobject_r and dm_document_sp will then involve only a small number of rows.

On the other hand the SQL for the ORed predicate DQL looks like this:

select 	all 	
	dm_document.r_object_id 
from 	dm_document_sp dm_document 
where 	(    dm_document.r_object_id in (	
		select 	dm_sysobject_r2.r_object_id 
		from 	dm_sysobject_r dm_sysobject_r2, 
			dm_folder_r dm_folder_r1 
		where 	dm_sysobject_r2.i_folder_id 
				= dm_folder_r1.r_object_id 
		and 	dm_folder_r1.i_ancestor_id 
				= '0b0000038000e990')  
	 or  dm_document.r_object_id in (	
		select 	dm_sysobject_r2.r_object_id 
		from 	dm_sysobject_r dm_sysobject_r2, 
			dm_folder_r dm_folder_r1 
		where 	dm_sysobject_r2.i_folder_id 
				= dm_folder_r1.r_object_id 
		and 	dm_folder_r1.i_ancestor_id 
				= '0b0000038000e995') 
	) 
and 	(    dm_document.i_has_folder = 1 
	 and dm_document.i_is_deleted = 0
	)

The SQL is completely different. In this case each folder predicate is implemented as an IN sub-query. The database is unable to find a query plan with the optimal join order, the one where dm_folder_r is joined first. The only way the database can execute the query is to select every current dm_document object and test it against the dm_folder_r sub-queries. In a large database the work to select every dm_document object could be huge.

So what are the alternatives? I can think of a few, but a UNION ALL would be top of my list in this particular case:

select 	r_object_id 
from 	dm_document 
where 	folder('/Temp/Folder1',descend) 
UNION ALL
select 	r_object_id 
from 	dm_document 
where 	folder('/Temp/Folder2',descend)

By the way, this performance tip applies whether or not the folder predicate includes the ‘descend’ keyword.
