A simple test

January 30, 2007 at 2:56 pm | Posted in Troubleshooting | Leave a comment

One of the first things I do after deploying content server on to a new system is the following quick and simple test:

1. Open up a command prompt or shell
2. Start iapi/iapi32 and run a script such as the following:


create,c,dm_document
set,c,l,object_name
quick test
link,c,l,/Temp
setcontent,c,l,crtext
some test content
save,c,l
getfile,c,l
query,c,select * from dm_document search document contains 'test'
next,c,q0
close,c,q0
quit

This test will quickly identify problems with the main components of documentum:

  1. Is is possible to even login?
  2. Can a document be created?
  3. Is both read and write access to the content store available?
  4. Is the full-text querying component properly instantiated?

I have seen several systems where everything appeared to be working well until a certain API (creating new content or running a full-text search) was invoked.

Error Patterns

January 27, 2007 at 2:19 pm | Posted in Development, Troubleshooting | Leave a comment

Consider these 3 seemingly unrelated problems:

  • I am getting [DM_API_E_NO_COLLECTION]error, should I increase max_collection_count?
  • My query runs but only returns 20 rows when I know there is over 100. Increasing batch_hint_size enables me to process all the rows, but what should I set it to to ensure this always works?
  • I raised a bug with Documentum support that LDAP Synchronization against AD only synced a maximum of 1000 users when we had 1500 users (this was back in the days of 5.2). Support provided a fix that synced a maximum of 2000 users. The error reappeared when we extended our testing to more than 2000 users.

All the proposed solutions probably weren’t the correct ones. In the first case the problem is most likely code that fails to close a collection. With java you should usually embed the close() call inside a finally block to ensure that close is called even after an exception. Other programming languages require similar protective measures. If your code really requires more than the default 10 collections you probably have an application design problem.

In the 2nd case the user is almost certainly running the query with DF_READ_QUERY/DF_EXECREAD_QUERY flag (readquery or execquery,c,T,… when using the API). The solution would be to run the code with the DF_EXEC_QUERY flag. batch_hint_size is a performance parameter, it should never be used to ‘fix’ functional problems.

The last one is a case of incorrect implementation of the Active Directory sync code. The Active Directory api uses a paging mechanism to optimise the transfer of query results across the network. The idea is to set the page to a suitable size and then loop on each fetch until there is no more data (The same idea as batch_hint_size in fact). The original simplistic implementation simply exited after the first batch.

What all these cases have in common is that the developer failed to correctly understand the programming APIs involved. Most APIs are more than a simple collection of function calls. They often have subtle nuances and a failure to appreciate and understand these nuances can lead to runtime bugs. It is often difficult to detect these bugs with testing as they will depend on specific scenarios not often met in development or testing environments. The most effective way to avoid these bugs is ensure that code is thoroughly reviewed by an appropriately experienced coder.

Upgrade performance

January 25, 2007 at 1:45 pm | Posted in Documentum and Oracle, Performance, Troubleshooting | 3 Comments

Today I was testing an upgrade from Content Server 5.2.5 sp3 to 5.2.5 sp5. All was going well until I started applying the upgrade script (headstart.ebs, etc). The process seemed to get stuck on the dd_populate script. After an hour the system was still grinding away. Time to start finding why it was taking so long.

Whilst the CPU on the machine was averaging a reasonable 20% (this was a single CPU machine), I could see from the disk lights that the disk drives were being driven very hard. As the major contributor to the 20% CPU was Oracle my initial thought was that Oracle was also responsible for the disk IO as well. So I opened Oracle Enterprise Manager, opened the Instance node and clicked on Sessions. Sorting by Physical IO and refreshing the display a couple of times quickly lead me to the problem session that was generating all the IO.

Double-clicking on the session and selecting the SQL tab showed me the SQL being run:

select count(*) from dm_relation_s where relation_name = 'DM_fk_dmr_contai_component_'

The execution plan involved a full tablescan of the dm_relation_s table. Since this system used a large number of dm_relation objects this table was quite large. Scanning it was taking several minutes at a time causing the excessive physical IO.

Still, I reasoned that this query should only take a couple of minutes rather than several hours. To make it easier to see what was going on I started sqlplus in a command prompt and executed the following query every 30 seconds:


select username, time_remaining, sql_hash_value,sql_text
from v$session_longops l,v$sql q
where l.sql_hash_value = q.hash_value
and time_remaining > 0;

The v$session_longops view contains details of long-running operations. This showed me that numerous similar queries were being run one after the other which explained why dd_populate was taking so long.

It’s pretty obvious that these queries would benefit from an index on dm_relation_s.relation_name. A quick check showed that the index did not exist so I created one in DQL:

execute make_index with type = ‘dm_relation’, attribute = ‘relation_name’
go

The MAKE_INDEX method doesn’t update the database statistics so it’s a good idea to do this now. The easiest way for just a single index is to use Oracle Enterprise Manager. Select the Schema, Open the indexes node and select the index. Right-click and select analyze. The dialog box will guide you through the process.

Once I had updated the statistics I stopped the dd_populate script and restarted it and this time it completed in minutes.

Where is my content stored?

January 24, 2007 at 10:25 am | Posted in Architecture, Troubleshooting | 5 Comments

There have been a number of threads on the Documentum Support Forum recently regarding content and where it is stored. I thought it would be useful to put all the ideas together in one place. I’ll only deal with the standard filestore setup here which covers the majority of systems.

Documentum stores your content files in one or more file storage areas, typically with a structure like:

<root of the storage area>/18/ff/0f/44.xls

Usually it is not necessary to know what this storage path is. Standard DMCL and DFC apis allow you to retrieve and store content for an object without having to know the exact location of the storage. However, especially for troubleshooting purposes, it is useful to be able to see where content has been stored.

To retrieve the storage location given the r_object_id of a dm_sysobject use the getpath API method. For example in IAPI:

API> getpath,c,090000038006a902
...
C:\Documentum\data\dm3\content_storage_01\
0000003\80\2f\8e\22.txt

If you already have the dmr_content object id then you can use the GET_PATH method in dql:

1> execute GET_PATH for '0600000380056500'
2> go
result
C:\Documentum\data\dm3\content_storage_01\
0000003\80\2f\8e\22.txt

Finally if you really want to locate a dmr_content object based on the path to a particular piece of content object (reverse lookup) then you can follow this rather convoluted procedure:

1) Turn the filepath into an 8 digit Hex data ticket (i.e. using the file path after the root of the storage path), so for the example above this would be 802f8e22
2) Start Windows calculator (or any other hex to decimal converter)
3) Subtract 0x80000000 from your hex data ticket.
4) Now subtract the result from 0x80000000
3) Convert to decimal and put a minus sign in front.

This gives you the data_ticket value and you can now find the dmr_content object using the DQL:

select r_object_id from dmr_content where data_ticket = <value>

The reason for this is that Documentum turns the hex number (802f8e22) into a signed 32-bit integer. All numbers over 0x80000000 turn into negative numbers, which is why all data_tickets are appear as negative decimal values.

References
Content Server Administration Guide, Path Specifications for Content in File Stores
Content Server API Reference
Content Server DQL Reference, Administrative Methods
Posting on documentum-users, from content storage area to object id – reverse lookup

Dealing with yacc stack overflow

January 23, 2007 at 9:41 pm | Posted in Development | 6 Comments

The ‘yacc stack overflow’ error usually occurs when sections of a DQL query contain too many tokens. The most typical example is something like the following:

select * from dm_document where object_name in ('first token','second token',...,'500th token')

If the number of items in the in list exceeds about 490 items a yacc stack overflow occurs. The solution is to split up the long list of tokens into a number of separate constructs. In the case of the example above:

select * from dm_document where object_name in ('first token','second token',...,'485th token') or
object_name in ('486th token',...,'500th token')

DF_READ_QUERY/readquery and the 255 characters limit

January 22, 2007 at 11:22 pm | Posted in Performance | Leave a comment

Documentum, like many other software systems, has many myths. One of them is that queries using DF_READ_QUERY flag (or the API readquery) can’t be used for queries longer than 255 characters.

Like most myths there is a grain of truth. The 255 character limit only occurs when the communication protocol used between the client and the Content Server is DDE (Dynamic Data Exchange). DDE is an ancient protocol and no system I know off still uses it.

So really there is no limit – try it with the following carefully crafted query:

select * from dm_document where object_name = 'one potato' union select * from dm_document where object_name = 'two potato' union select * from dm_document where object_name = 'three potato' union select * from dm_document where object_name = 'four potato' union select * from dm_document where object_name = 'five potato' union select * from dm_document where object_name like '%test%'

Run this in DFC using the DF_READ_QUERY flag or in IDQL – IDQL uses readquery API under the covers!

Performance Tip: Use Function-based indexes

January 19, 2007 at 8:29 pm | Posted in Documentum and Oracle, Performance | Leave a comment

Why does a query like

select * from dm_sysobject where upper(object_name) like 'A%'

seem to take longer and longer as the number of objects increase whereas

select * from dm_sysobject where object_name like 'A%'

continues to perform well?

The answer is probably that the database execution plan for the 1st query uses a tablescan to access the table dm_sysobject_s. The 2nd query will probably use an index on dm_sysobject_s.object_name. If a function (in this case UPPER) is applied to the column in the predicate (where clause) it is not possible to use the index on the column.

If you are using Oracle there is a very useful feature called ‘Function-based indexes’ that allows Oracle to use an index even when a function has been applied to the column. You have to create the index at the database level rather than using the Documentum MAKE_INDEX API. So in this case you would do the following:

  1. Open SQLPlus and login as the docbase owner
  2. create index fn_idx_upper_object_name on dm_sysobject_s(upper(object_name));

This command creates a function-based index called fn_idx_upper_object_name.

Once you have done this it is necessary to ensure the database statistics include the new index. Do this by running the Documentum admin job dm_UpdateStats; alternatively to update the stats for just this index use the following command in SQLPlus:

execute dbms_stats.gather_index_stats(ownname=>USER,indname=>'FN_IDX_UPPER_OBJECT_NAME');

You should note that when this feature was first introduced sometime in Oracle 8 it was necessary to set some init.ora parameters QUERY_REWRITE_ENABLED and QUERY_REWRITE_INTEGRITY. At some point this requirement was dropped and certainly it is not necessary when using function-based indexes in Oracle 9.2 or 10. This is despite the fact that the Oracle Performance Tuning Guide for 9.2 still says it is necessary!

What type of query does IDQL(32) use?

January 18, 2007 at 11:32 am | Posted in Performance | Leave a comment

At the API/DMCL level there are a number of different commands that run a query:

  • query
  • execquery
  • readquery
  • cachequery
  • query_cmd

So which one(s) does IDQL/IDQL32 use?

The answer is readquery. readquery is used for select statements as well as update, delete, insert and create DQLs!

Don’t be confused by the name. readquery can be used with queries that update the state of the repository.

Object Replication Performance

January 8, 2007 at 5:48 pm | Posted in Object Replication, Performance | 3 Comments

In a recent post on the Documentum Support Forums I alluded to some testing I had done on Documentum Object Replication. I give the test results here to demonstrate some important design points when deciding how (or if) to deploy object replication.

The test involved timing a replication run from start to finish (source and target) using different sizes of data set. In each run no changes had been made to the replicated data so you might expect that:

  1. Each replication run is quite quick
  2. The time to replicate is related to the number of changes made to objects under the replication folder

The replication used the following options:

  • Full Refresh – false
  • Fast Replication – false

Here are the results:

Objects       Duration
-------       --------
  1,000           64s
 10,000          344s
 50,000         2032s
100,000         4213s

As you can see it takes a substantial amount of time to replicate no changes! In fact the time the replication takes is proportional to the number of objects under the replication folder (nb. Don’t take these figures as some sort of guideline for how long a replication will take as this figure will depend crucially on the number of relationships between objects and the speed of the processor).

The problem is related to the way Documentum’s dump API tries to check each object and its related objects to see if they have changed; even with a very simple dataset with no user-created relationships there are usually 6 or 7 queries to run for each object and this is a substantial amout of processing when aggregated over a large number of objects to check. For more complicated datasets with multiple relationships between objects the processing times will increase markedly.

If you think this type of behaviour is unsatisfactory for your particular requirement there are a couple of things you can look at.

  1. Rather than use a single replication folder, try to split up the data across multiple folder and create a replication job for each one. Since each replication job is a single-threaded process multiple CPUs do not benefit a single replication job however multiple Replication Jobs can take advantage of multi-processor machines
  2. Consider using the Fast Replication option. Fast Replication reduces the amount of checking for Related Objects and, particularly for datasets with multiple relationships between objects, can be considerably faster for no-change replication runs. Make sure you test thoroughly though particularly if you are relying on changes to Related Objects being automatically replicated

DF_EXEC_QUERY and Documentum Sessions

January 4, 2007 at 12:13 pm | Posted in Architecture, Documentum and Oracle | 2 Comments

I recently posted an article explaining how Content Server sessions work and the role of the client_session_timeout parameter (Understanding Documentum Sessions). In order to concentrate on the concepts that article dealt with a rather simplistic scenario, so I’d like to extend that discussion to look at how things change when you use the DF_EXEC_QUERY option when running a query.

Documentum has 2 basic query modes, a read-only mode and an exec mode. You can set these modes at the DFC level by including the DF_READ_QUERY or DF_EXEC_QUERY flags respectively. At the DMCL level this translates to one of the query apis like readquery, execquery or query_cmd. Some of these DMCL query APIs have a boolean parameter that allows selection of the mode but others like readquery only operate in one mode.

When you use read-only mode session behaviour is exactly the same as discussed in the article – the Content Server session thread runs the query using the established database session. However when you use exec mode Content Server will start a new database session to run the query. For the duration of the query there will be 2 database sessions associated with the Content Server session. In fact if you run additional queries at the same time in exec mode you get a new database session each time, so a single Content Server session could easily create 1+max_collection_count database sessions!

2 final points if you are using Oracle. First database sessions are an important resource that should carefully managed – make sure that you are using DF_EXEC_QUERY only when you need to. Second when you are trying to configure Oracle you may need to set the PROCESS parameter higher than the number of concurrent users as some of those users may be creating additional database sessions.

Blog at WordPress.com.
Entries and comments feeds.