Query Formatting

September 19, 2008 at 2:12 pm | Posted in Performance, Troubleshooting | 4 Comments

What does this query do?

SELECT ALL 'y' AS notnavigatable,o.r_object_type AS r_object_type,
o.r_lock_owner AS r_lock_owner,o.r_object_id AS r_object_id,'y' AS 
selectable,'0' AS isfolder,upper(o.object_name) AS objname,
o.owner_name AS owner_name,o.r_version_label AS r_version_label,
o.i_is_reference AS i_is_reference,o.a_content_type AS a_content_type,
o.object_name AS object_name,100 AS idunion,o.r_is_virtual_doc AS 
r_is_virtual_doc,'' AS navigatable,o.r_link_cnt AS r_link_cnt,
o.r_full_content_size AS r_content_size FROM dm_process o 
WHERE ( r_definition_state = 2) AND NOT 
((FOLDER('/Resources', DESCEND) OR FOLDER('/System')) OR 
FOLDER('/System/DistributionList Templates', DESCEND) AND 
o.object_name LIKE 'dmSendTo%') ORDER BY 13 ASC,7 ASC,
4 ASC,9 ASC

I don’t really have an idea either until I format it into something readable:

SELECT ALL 	'y' AS notnavigatable,
		o.r_object_type AS r_object_type,
		o.r_lock_owner AS r_lock_owner,
		o.r_object_id AS r_object_id,
		'y' AS selectable,'0' AS isfolder,
		upper(o.object_name) AS objname,
		o.owner_name AS owner_name,
		o.r_version_label AS r_version_label,
		o.i_is_reference AS i_is_reference,
		o.a_content_type AS a_content_type,
		o.object_name AS object_name,
		100 AS idunion,
		o.r_is_virtual_doc AS r_is_virtual_doc,
		'' AS navigatable,
		o.r_link_cnt AS r_link_cnt,
		o.r_full_content_size AS r_content_size 
FROM 		dm_process o 
WHERE 		( r_definition_state = 2) 
AND 		NOT (
			(   FOLDER('/Resources', DESCEND) 
			 OR FOLDER('/System')
			) 
		     OR FOLDER('/System/DistributionList Templates', DESCEND) 
		     AND o.object_name LIKE 'dmSendTo%'
		) 
ORDER BY 13 ASC,7 ASC,4 ASC,9 ASC

It is now immediately apparent which object types or registered tables are being queried (dm_process in this case), which attributes are in the select list and what the query predicates are. I generally use tabs to space out each block (e.g. the attributes in the select list are all aligned to a tab) and to show nesting of predicates and sub-queries.

Another benefit of formatting the query is that, generally, you are not that interested in the attribute select list when tuning queries. As a rule you look first at the object types/tables and predicates involved, along with any additional clauses that affect how queries will be processed like ORDER BY and GROUP BY. By formatting the query you can immediately spot the important FROM, WHERE, ORDER BY and GROUP BY clauses.

Here’s another example of a properly formatted query with multiple object types to illustrate another point:

select 	p.retainer_root_id as retainer_root_id, 
	t.r_object_id as retained_object_id, 
	p.event_date as event_date, 
	p.r_object_id, 
	p.object_name as retention_name, 
	p.r_policy_id, 
	p.retention_policy_id, 
	p.r_retention_status, 
	p.entry_date, 
	p.qualification_date, 
	p.r_object_type, 
	p.phase_name, 
	q.object_name as policy_name, 
	s.object_name as retention_policy_name 
from 	dm_sysobject (all) t, 
	dmc_rps_retainer p, 
	dm_policy q, 
	dmc_rps_retention_policy s 
where 	q.r_object_id=p.r_policy_id
and 	s.r_object_id=p.retention_policy_id 
and 	p.r_object_id in (	
		select 	i_retainer_id 
		from 	dm_sysobject (all) 
		where 	(r_object_id = '0900379780155e1b')
	) 
and 	t.r_object_id  ='0900379780155e1b'
and	p.retainer_root_id is not null 

Where there are multiple types or tables in the FROM clause I list them one per line, one after the other. Then in the predicate (WHERE) clause I ensure that the join conditions come first, followed by the other, filtering, clauses.

You can immediately see in this example that dmc_rps_retainer, dm_policy and dmc_rps_retention_policy all have join conditions but dm_sysobject (all) does not. This should raise alarm bells: if there is no join condition for a table then every row of that table will be joined to the rowset formed by the other table joins, which could mean a massive amount of work for the database engine.

In this case there is no need to worry, as further down there is a filtering condition t.r_object_id = '0900379780155e1b' which ensures that only a single row will be output for joining to the other tables.

It is impossible to effectively tune queries without understanding what they do and it is very difficult to understand what a query does without formatting it. With only a little practice it is easy to format even very large queries in a few minutes.

Troubleshooting agent_exec garbage collection

January 17, 2008 at 3:32 pm | Posted in Troubleshooting | 1 Comment

There seem to be more and more posts on the forums about jobs 'stuck' in the Running state, and I have been investigating this problem for a client recently, so I thought I would summarise some of the troubleshooting techniques I use. This posting expands on the article I wrote a few years ago about agent_exec.

The problem is usually expressed in the form of 'DA shows my job is running but I know it's not'. First of all, DA shows a job as 'Running' whenever it finds a job whose a_special_app attribute is set to 'agentexec'. Since agent_exec sets this attribute when it starts a job and clears it when the job has finished, under normal circumstances this is quite an accurate reflection of whether a job is running or not.

However, if the agent_exec processes are interrupted before clearing the attribute (if the box is rebooted or the content server hangs, for instance), then the job object can be left with a_special_app = 'agentexec' and DA shows the job as running.
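
To see exactly which jobs DA will currently report as 'Running', a query along these lines does the trick (a sketch using the standard dm_job attributes):

SELECT  r_object_id, object_name, a_special_app,
        a_last_invocation, a_last_completion
FROM    dm_job
WHERE   a_special_app = 'agentexec'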

Of course the agent_exec attempts to deal with such a situation. Every time it wakes up to perform some processing it first runs a 'garbage_collect_jobs' routine. You won't see much evidence of this in the logs unless you turn on agent_exec tracing (see my job scheduler article for details on how to do this). You will get the following lines when there is nothing to garbage collect:

Thu Jan 17 13:39:41 2008 [AGENTEXEC 283604] garbage_collect_jobs
Thu Jan 17 13:39:41 2008 [AGENTEXEC 283604] do_exec:  execquery,s0,F,
     SELECT ALL   r_object_id, a_last_invocation, ...
Thu Jan 17 13:39:41 2008 [AGENTEXEC 283604] do_get:  getlastcoll,s0
Thu Jan 17 13:39:41 2008 [AGENTEXEC 283604] do_next:  next,s0,q0
Thu Jan 17 13:39:41 2008 [AGENTEXEC 283604] do_exec:  close,s0,q0

Basically agent_exec runs the following query:

SELECT ALL       
              r_object_id, a_last_invocation,   
              a_last_completion, a_special_app 
FROM          dm_job 
WHERE ( (     (a_last_invocation IS NOT NULLDATE) 
             AND (a_last_completion IS NULLDATE))  
             OR  (a_special_app = 'agentexec')) 
AND     (i_is_reference = 0 OR i_is_reference is NULL) 
AND     (i_is_replica = 0 OR i_is_replica is NULL)

If jobs are returned from this query and agent_exec cannot match a job with an existing running job, it will clean up the job object, unsetting a_special_app and setting a_last_invocation to the current time.
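
Incidentally, if you are certain a job is not really running, you can do the same clean-up by hand with a DQL update; something like this (a sketch only, using dm_LogPurge as an example, and best tried on a test repository first):

UPDATE  dm_job OBJECT
SET     a_special_app = ''
WHERE   object_name = 'dm_LogPurge'
AND     a_special_app = 'agentexec'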

Here is some typical trace output in the agentexec.log file when I set the dm_LogPurge job's a_special_app attribute to 'agentexec'.

This output shows that the garbage collection routine is the source of the infamous message:
Detected while processing dead job dm_LogPurge: The job object indicated the job was in progress, but the job was not actually running. It is likely that the dm_agent_exec utility was stopped while the job was in progress.

DMCL tracing dm_agent_exec

Examining the agentexec trace is usually enough to figure out where the problem lies; however, in extreme cases it is useful to look at the dmcl trace for the agentexec process to troubleshoot further. In principle you can do this by setting the dmcl.ini trace_file parameter to an existing directory on the Content Server. However, this has the disadvantage of turning on tracing for all dmcl processes on the content server, i.e. all jobs and methods.

What we really want to do is isolate the agentexec process from all the others, and in this section I show you how. I present the steps, along with explanations, for a typical Windows server. The same principle applies to *nix servers, usually with a suitable change of folder paths.

First force the agent exec to stop. You can do this by killing the main agent_exec process repeatedly. The Content Server will detect that the agent exec has died and try to restart it; however, there is a limit to the number of times this will happen (it seems to be 5 by default). Eventually you get the following message in the content server log and the dm_agent_exec stays dead:

Thu Jan 17 13:35:37 2008 984000 [DM_SESSION_W_AGENT_EXEC_FAILURE_EXCEED]warning: "The failure limit of the agent exec program has exceeded. It will not be restarted again. Please correct the problem and restart the server."
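
On Windows you can kill the process from the command line; something along these lines (a sketch, and the exact image name may differ, so check Task Manager first):

taskkill /F /IM dm_agent_exec.exe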

Copy the agent_exec executable to a separate directory. Copy the program file %DM_HOME%\bin\dm_agent_exec.exe to a new directory, e.g. c:\Documentum\agentexec.

Copy the dmcl.ini. Copy the main dmcl.ini file in c:\windows to c:\Documentum\agentexec. Now edit the file and add the following lines:


trace_level = 10
trace_file = c:\Documentum\agentexec

We are going to take advantage of the fact that the first place the dmcl looks for the dmcl.ini is in the current working directory.

Start the agent_exec from the command line. Use the following syntax:

dm_agent_exec -docbase_name docbase  -docbase_owner dmadmin -trace_level 1
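
Since the dmcl picks up the dmcl.ini from the current working directory (as noted above), make sure you change to the new directory before running the command, e.g.:

cd /d c:\Documentum\agentexec
dm_agent_exec -docbase_name docbase -docbase_owner dmadmin -trace_level 1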

Agent exec logging and trace output will continue to appear in %DOCUMENTUM%\dba\log\agentexec\agentexec.log; however, a number of dmcl trace files will also be created in the c:\Documentum\agentexec directory. One of these (probably the largest) will be the dmcl trace for the main agent_exec process. Remember that agent_exec works by forking off a new dm_agent_exec process to manage each running job, and each of these processes will have its own dmcl trace file.

When you have finished tracing the agentexec you will need to kill the command line process and restart the Content Server (if anyone knows how to force the content server to restart the agentexec after the failure limit has been reached I’d love to know).

Conclusion

With a clear understanding of how agent_exec works and with the trace output available it should be possible to troubleshoot and resolve just about any job scheduler related problem.

Bad Citizens of the (Documentum) World

December 14, 2007 at 4:31 pm | Posted in Troubleshooting | 4 Comments

One of the enduring problems with some Documentum application installers is that they can be very inconsiderate to other Documentum programs. Ever tried to install Content Server, Web Publisher and Documentum Application Builder on the same machine? Usually at least one app doesn't work. This is a very typical scenario for a developer or consultant, and is also a typical architecture for a small production Documentum installation. The 'solution' is to reinstall the app that isn't working, but usually one of the other apps then stops working (some solution).

The problem seems to be the way each application amends dctm.jar. dctm.jar contains a single manifest file listing the other Documentum jar files needed by Documentum applications. Simply including dctm.jar on the classpath serves to include all the jar files listed in the manifest. When the application installer for a product updates the dctm.jar manifest file to add its own application-specific jar files, the effect is often to remove the application-specific files for another application. So you install DAB and Web Publisher stops working. You reinstall Web Publisher and DAB stops working.
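
For illustration, the manifest inside dctm.jar contains a Class-Path entry something like this (the jar names here are made up; the real list depends on which products are installed):

Manifest-Version: 1.0
Class-Path: dfc.jar dfcbase.jar log4j.jar workflow.jar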

Now one way to deal with this is simply to edit the manifest file to include the missing files. But I must admit I have found the manifest file to be very picky about how it is updated, and in frustration I came up with another way that works well for DAB:

  • First install DAB and make a copy of dctm.jar, call it dctm2.jar
  • Now install the other applications
  • Create a batch file containing 1) a new classpath definition to override the system classpath and 2) a command-line invocation of DAB

Here’s the contents of the batch file I use to start DAB (I locate it in C:\Program Files\Documentum\Application Builder):

set ClassPath=C:\Program Files\Documentum\dctm2.jar;C:\Documentum\config
bin\Dab.exe

Similarly Documentum Application Installer can be started using the following batch file located in C:\Program Files\Documentum\Application Installer:

set ClassPath=C:\Program Files\Documentum\dctm2.jar;C:\Documentum\config
"Application Installer.exe"

Momentum Monaco

November 8, 2007 at 3:58 pm | Posted in D6, Momentum, Performance, Troubleshooting, Xense Profiler | Leave a comment

I have been at the Momentum Conference in Monaco all week seeing all the exciting new stuff in D6 (and beyond) and I also presented on Performance Tuning in D6. I have uploaded the presentation here: Momentum 2007 Monaco – Inside EMC Documentum.

The presentation talks about the implications of removing the native-code DMCL from DFC. There is no longer a dmcl trace facility, which has been the cornerstone of performance tuning Documentum in the past; I suggest that DFC tracing can be used in a similar way. I then go on to discuss how a software tool could analyse the trace, and I suggest there are 3 useful views of the performance data:

  • Top long running DFC calls
  • Top long running queries
  • Call profile

Update: Xense now has a beta version of a tool that does this, Xense Profiler for DFC.

I demonstrate 4 performance tuning example scenarios and show how the different views can be used to interpret the trace. Each of these example scenarios illustrates a performance problem pattern.

I should point out something: One of the things I say in the presentation is that there are not yet any performance tuning analysis scripts for D6 available from EMC. A subsequent discussion with Chase Harris revealed that there are some new scripts and they should be generally available. I’ll post some information about how to get hold of them when I find out.

Update: The scripts are here

Grouping Repeating Attributes

November 1, 2007 at 2:30 pm | Posted in Troubleshooting | 1 Comment

One of the useful features of DQL is its ability to group repeating attributes for a single object into a single row of a result set. For example the following query on my system just returns the repeating authors attribute:

1> select authors from dm_document where any 
r_version_label = 'flag1'
2> go
authors
------------------------------------------------
tom
dick
harry
tom
dick
harry
tom
dick
harry
(9 rows affected)

A listing of authors isn't that interesting, but if we add r_object_id to the select list:

1> select r_object_id,authors from dm_document 
where any r_version_label = 'flag1' order by r_object_id
2> go
r_object_id       authors
----------------  ------------------------------------------------
0900022b80010513  tom
                  dick
                  harry
0900022b80010514  tom
                  dick
                  harry
0900022b80010515  tom
                  dick
                  harry
(3 rows affected)

This time there are only 3 “rows”, one for each of the objects found in the query, and each repeating attribute for an object is returned with all the other repeating attributes for that object.

Notice that I added an "order by r_object_id". This is essential to ensure that the grouping works as expected. This point is not particularly clear in the documentation, and it is somewhat limiting, as it prevents the user from ordering the objects in some other useful way, like by object_name (or UPPER(object_name), as so beloved of most of Documentum's user interfaces these days).
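
For example, a query like the following (hypothetical) gives the kind of ordering a user interface would want, but it does not guarantee the r_object_id ordering that the grouping implementation relies on:

1> select r_object_id,authors from dm_document 
   where any r_version_label = 'flag1' 
   order by upper(object_name)
2> go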

The implementation of repeating attribute grouping assumes that the rows will be returned from the database in r_object_id order. However, it is easy to forget this and not even notice in development or testing, as often the database will naturally return rows in r_object_id order. With a little effort, though, it is possible to produce a test case like the following:

1> select r_object_id,authors from dm_document 
   where any r_version_label = 'flag1'
2> go

r_object_id       authors
----------------  ------------------------------------------------
0900022b80010513  tom
                  dick
0900022b80010514  tom
                  dick
0900022b80010515  tom
                  dick
0900022b80010513  harry
0900022b80010514  harry
0900022b80010515  harry
(6 rows affected)

All I have done is remove the "order by" clause from the query. In this particular case we now have 6 rows, with each object reported twice! I'll save the details of what is going on for the next post, but here is a little puzzle: I can "fix" this particular query by applying some hints:

1> select r_object_id,authors from dm_document 
   where any r_version_label = 'flag1' 
   enable(oracle('use_nl(dm_repeating.WE_) use_nl(dm_document.VE_)'))
2> go

r_object_id       authors
----------------  ------------------------------------------------
0900022b80010513  tom
                  dick
                  harry
0900022b80010514  tom
                  dick
                  harry
0900022b80010515  tom
                  dick
                  harry
(3 rows affected)

Why does this work?

Note that I'm not suggesting this as a reliable way to solve the problem (in general there isn't one, short of having r_object_id as the first column in the ORDER BY), but it gives a clue as to how Documentum implements this feature.

Content Server 6 Installation Problems

September 28, 2007 at 11:12 am | Posted in D6, Troubleshooting | Leave a comment

It seems that a few people (myself included) have had the inevitable problems with installing this initial release of D6. Many of the problems seem to relate to the packaging of weblogic as the new app server for the JMS (previously Tomcat).

It is useful to look for a few log files when trying to troubleshoot this problem. The logs for the weblogic install go in %DM_HOME%\logs\install (at least on a Windows install) and the file structure for a successful installation looks like this:

[Screenshot: D6 install weblogic logs, top-level directory structure]

If you click on the weblogic folder you should get the following structure:

[Screenshot: D6 install weblogic logs, weblogic folder contents]

I would first of all check the file install.log in the weblogic folder. It should show something like:

STDOUT: Extracting 0%…………………………………………………………………………………………..100%

Clearly the first step the installer takes is to extract/unzip the weblogic installation package. The extraction appears to occur in the current working directory. When I tried invoking the installation from a network drive without write permission, the installation proceeded happily until the weblogic step. Then it just hung around for ages, showing a progress bar that crept up to 100% and then went no further. Viewing the weblogic/install.log showed only:

STDOUT: Extracting 0%

which is what makes me think that the extraction was targeted at the non-writable network drive, resulting in failure.

Assuming the installation gets this far but you still have problems, it is worth examining the other log files, probably in the date/time order you see on the directories.

Content Server 6 Server Tracing

September 5, 2007 at 10:37 am | Posted in D6, Troubleshooting, Xense Profiler | 3 Comments

Loads of D6 documentation is appearing on Powerlink. I’m going through the Content Server Admin Guide over the next few days and I’ll pick out some of the stuff that looks interesting. The first thing that caught my eye was that Tracing and Logging has its own chapter. Previously this was all lumped into the Tools chapter. A very timely revision.

Two important changes here. First, sqltrace is now on by default. This means that all the SQL generated from a DQL command (very often there is a 1-to-many relationship between DQL and SQL) is output to the content server session log. I think this is a great improvement from a troubleshooting and performance tuning perspective.

Secondly, there is a new server tracing option called rpctrace. The documentation states 'Turns on tracing of RPC calls'. I will be having a look at the tracing format and the information dumped by this trace option as, again, it is potentially very useful.
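
For reference, server trace options like these can be switched at runtime with the SET_OPTIONS administration method. Something like the following DQL should do it (a sketch; I am assuming rpctrace is toggled in the same way as established options such as sqltrace):

EXECUTE set_options WITH option = 'sqltrace', value = TRUE
EXECUTE set_options WITH option = 'rpctrace', value = TRUE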

Running queries in IAPI

August 29, 2007 at 11:16 am | Posted in Troubleshooting | 3 Comments

Just noticed this post in the EMC Forums. The user wanted to know how to run a query in IAPI. Many of the answers talk about using execquery and readquery and so on, which is all well and good; however, there is a really nifty IAPI command that runs the query and displays the results. If you read this blog regularly you will notice I use it a lot.
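
For comparison, the long-winded way with execquery looks something like this (a sketch using the same query as below):

API> execquery,c,F,select r_object_id from dm_document where object_name = 'template 1'
...
q0
API> next,c,q0
...
OK
API> get,c,q0,r_object_id
...
0900022b80003bfd
API> close,c,q0
...
OK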

Using the '?' command will run a query, e.g.

?,c,select r_object_id,r_version_label from dm_document where object_name = 'template 1'

produces output like this:

r_object_id       r_version_label
----------------  --------------------------------
0900022b80003bfd  CURRENT
                  1.0
(1 row affected)

No need for next, get and close commands; it's all done for you by IAPI.

The Art of Troubleshooting part deux

August 7, 2007 at 8:50 pm | Posted in Troubleshooting | Leave a comment

Jonathan Lewis blogged about another way of looking at troubleshooting. Whilst much of Jonathan's blog is deeply technical, he also has the ability to translate a rather dry subject into a colourful and memorable metaphor.

If you are in the mood for light-hearted reading then you could try another of Lewis' more colourful posts. According to my wife, if you find that funny you're a geek. That's a good thing in my book.

The Art of Troubleshooting

August 6, 2007 at 11:32 am | Posted in Troubleshooting | Leave a comment

Laurence has a neat trick to solve an LDAP Synchronization problem. I've used this a few times myself in the past. The really interesting thing about the post is that it's a good illustration of how to go about troubleshooting a problem. I suspect I went through a similar process when I discovered the trick myself.

It's not just about the technical details but about the attitude you take in solving the problem. First of all you start off with a general understanding of how Documentum implements jobs: you know what is being called and when. As a result of reading round the problem (docs, forums, blogs, etc.) you have an idea of what structures are possibly being updated. You understand in broad terms how the code is doing the job it is supposed to do. From all this you are usually left with a number of possibilities as to why you get the results you see. A little testing narrows it down to (hopefully) just one thing.

Finally, it's good to see someone discuss the issue of when and where you can test out theories about how the system works. Usually you need to do this in an isolated, controlled setup, both to avoid messing up the production system for real users and to ensure that tests are controlled and repeatable. As Laurence points out, removing a user wasn't a big deal if that user couldn't access the system in the first place, but you still need to be careful.

