Tags: DFCprof, performance, xense profiler
I’m really excited to announce that, as of today, Xense Profiler for DFC – the Documentum 6.x performance profiling tool – is an open source project (DFCprof).
The DFCprof project is hosted on Sourceforge where you can download fully-functional binaries or build your own copy from the source code. The software is totally free – that’s free as in ‘free beer’ as well as ‘free speech’!
DFCprof can be used in a number of different ways. Most people will be interested in using it as a standalone application to process a DFC trace file and create a performance analysis report.
Just download the application from sourceforge, extract the files and you are ready to go.
Alternatively you can embed the dfcprof-x.x.x.jar library into your java project and use the trace parsing facility from there. I’ll be posting more details on the DFCprof parser API in due course. I’ll also be talking about the roadmap for future DFCprof features. Feel free to drop me a line in the comments if there are particular things you would like the project to do.
Tags: content server, documentum, install, performance
Over the years I have, like many of you, spent quite a lot of time installing Documentum Content Server and creating a repository. Quite a lot of that ‘installation’ time is spent waiting for various scripts to run.
In fact I’m creating a new 6.6 repository right now. It’s taking quite a while to run (the documentation claims it could be 50% slower than a 5.3 installation). So naturally, once the coffee is made, the email checked and a few important calls done, my mind turns to how it could be faster. Here are my thoughts.
The install/repository creation splits into two phases: first you install the content server software (if this is the first repository you are creating on the box), then you create one or more repositories. Usually the install is fairly quick as it’s a case of specifying a few parameters, after which the installer copies some files and possibly starts some services.
The repository creation is where the time seems to get longer and longer. Again the process seems to split into the following:
1. Define a few parameters, user accounts and passwords
2. Create the database schema and start up relevant services
3. Run the Headstart and other scripts including the DAR installations
Again it’s this 3rd step that really takes the time. In essence, by the time you’ve finished step 2 you have a working repository – it just doesn’t have a schema, or in fact any support for the things you will need to do via DFC, DFS or any of the other client APIs. The scripts in step 3 use the API or DFC to create all of that, starting with the basic schema and format objects and going right up to creating all the necessary application objects via the DARs. Since more and more functionality is being packed into a basic content server, this is the part that just keeps taking longer and longer to complete – the number of scripts that are now run compared to, say, 4i is just one indication of that.
I wonder if EMC have considered a better way of doing things. All those API/DFC calls made in the step 3 scripts simply result in database rows being created or updated and, occasionally, files being written to the content store. I would suggest that overwhelmingly the same row values are created every time, so a quicker approach would be to pre-create a standard set of database tables via much faster database-specific bulk-load facilities (how about using imp for Oracle databases?) and then change any of the values that need to be specific to the installation. This is somewhat similar in principle to how I believe sysprep works for creating multiple Microsoft Windows installations.
To do this successfully EMC would need some sort of tool that could quickly identify differences in the basic database from version to version, but it could mean a large number of customer/partner/EMC consulting hours saved. If EMC ever change the database backend to xDB maybe this would become naturally easier – they already have XML differencing tools. Just a thought.
Tags: documentum, DQL tuning, performance, query tuning
What does this query do?
SELECT ALL 'y' AS notnavigatable,o.r_object_type AS r_object_type, o.r_lock_owner AS r_lock_owner,o.r_object_id AS r_object_id,'y' AS selectable,'0' AS isfolder,upper(o.object_name) AS objname, o.owner_name AS owner_name,o.r_version_label AS r_version_label, o.i_is_reference AS i_is_reference,o.a_content_type AS a_content_type, o.object_name AS object_name,100 AS idunion,o.r_is_virtual_doc AS r_is_virtual_doc,'' AS navigatable,o.r_link_cnt AS r_link_cnt, o.r_full_content_size AS r_content_size FROM dm_process o WHERE ( r_definition_state = 2) AND NOT ((FOLDER('/Resources', DESCEND) OR FOLDER('/System')) OR FOLDER('/System/DistributionList Templates', DESCEND) AND o.object_name LIKE 'dmSendTo%') ORDER BY 13 ASC,7 ASC, 4 ASC,9 ASC
I don’t really have an idea either until I format it into something readable:
SELECT ALL 'y' AS notnavigatable,
	o.r_object_type AS r_object_type,
	o.r_lock_owner AS r_lock_owner,
	o.r_object_id AS r_object_id,
	'y' AS selectable,
	'0' AS isfolder,
	upper(o.object_name) AS objname,
	o.owner_name AS owner_name,
	o.r_version_label AS r_version_label,
	o.i_is_reference AS i_is_reference,
	o.a_content_type AS a_content_type,
	o.object_name AS object_name,
	100 AS idunion,
	o.r_is_virtual_doc AS r_is_virtual_doc,
	'' AS navigatable,
	o.r_link_cnt AS r_link_cnt,
	o.r_full_content_size AS r_content_size
FROM dm_process o
WHERE (r_definition_state = 2)
	AND NOT (
		(
			FOLDER('/Resources', DESCEND)
			OR FOLDER('/System')
		)
		OR FOLDER('/System/DistributionList Templates', DESCEND)
		AND o.object_name LIKE 'dmSendTo%'
	)
ORDER BY 13 ASC, 7 ASC, 4 ASC, 9 ASC
It is now immediately apparent which object types or registered tables are being queried (dm_process in this case), which attributes are in the select list and what the query predicates are. I generally use tabs to space out each block (e.g. the attributes in the select list are all aligned to a tab) and to show nesting of predicates and sub-queries.
Another benefit of formatting the query is that, generally, you are not that interested in the attribute select list when tuning queries. As a rule you look first at the object types/tables and predicates involved, along with any additional clauses that affect how queries will be processed like ORDER BY and GROUP BY. By formatting the query you can immediately spot the important FROM, WHERE, ORDER BY and GROUP BY clauses.
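As an illustration of how mechanical this transformation is, here is a minimal sketch of a naive formatter in Python. The keyword list and splitting rules are my own illustration, not part of any Documentum tooling, and the select-list split is deliberately simplistic – it will break on commas inside function calls such as upper(o.object_name), so treat it as a starting point rather than a finished formatter:

```python
import re

# Clauses that should each start a new line in the formatted query
MAJOR_CLAUSES = ["FROM", "WHERE", "ORDER BY", "GROUP BY", "HAVING", "UNION"]

def format_query(query: str) -> str:
    """Naive DQL/SQL formatter: select-list items one per line,
    major clauses on their own lines, predicates split on AND/OR."""
    q = re.sub(r"\s+", " ", query).strip()
    # Put each major clause on its own line with its content indented
    for kw in MAJOR_CLAUSES:
        pattern = r"\s+" + kw.replace(" ", r"\s+") + r"\s+"
        q = re.sub(pattern, "\n" + kw + "\n\t", q, flags=re.IGNORECASE)
    # Split the select list on commas (naive: also splits commas
    # inside function calls, which a real tokenizer would not)
    head, _, rest = q.partition("\n")
    head = head.replace(", ", ",\n\t")
    # Put each AND/OR predicate on its own line
    rest = re.sub(r"\s+(AND|OR)\s+", r"\n\t\1 ", rest, flags=re.IGNORECASE)
    return head + "\n" + rest
```

Running print(format_query(...)) over a one-line query blob produces the kind of tab-indented layout shown above, with FROM, WHERE and ORDER BY immediately visible.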
Here’s another example of a properly formatted query with multiple object types to illustrate another point:
select p.retainer_root_id as retainer_root_id,
	t.r_object_id as retained_object_id,
	p.event_date as event_date,
	p.r_object_id,
	p.object_name as retention_name,
	p.r_policy_id,
	p.retention_policy_id,
	p.r_retention_status,
	p.entry_date,
	p.qualification_date,
	p.r_object_type,
	p.phase_name,
	q.object_name as policy_name,
	s.object_name as retention_policy_name
from dm_sysobject (all) t,
	dmc_rps_retainer p,
	dm_policy q,
	dmc_rps_retention_policy s
where q.r_object_id = p.r_policy_id
	and s.r_object_id = p.retention_policy_id
	and p.r_object_id in (
		select i_retainer_id
		from dm_sysobject (all)
		where (r_object_id = '0900379780155e1b')
	)
	and t.r_object_id = '0900379780155e1b'
	and p.retainer_root_id is not null
Where there are multiple types or tables in the FROM clause I put them one per line, one after the other. Then in the predicate (WHERE) clause I ensure that the join conditions come first, followed by the other, filtering, clauses.
You can immediately see in this example that dmc_rps_retainer, dm_policy and dmc_rps_retention_policy all have join conditions but dm_sysobject (all) does not. This should raise alarm bells: if there is no join condition for a table then every row of that table will be joined to the rowset formed by the other table joins – potentially a massive amount of work for the database engine.
In this case there is no need to worry, as we can see further down that there is a filtering condition t.r_object_id = '0900379780155e1b' which ensures that only a single row will be output for joining to the other tables.
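The missing-join check can even be mechanised. Below is a rough Python sketch (again my own illustration, not part of any Documentum tooling) that lists the FROM-clause correlation names that never appear in an alias.attr = alias.attr join condition; anything it flags deserves a closer look for a compensating filter predicate:

```python
import re

def tables_without_joins(query: str) -> list:
    """Return FROM-clause correlation names (aliases) that never
    appear in a join condition of the form alias.attr = alias.attr."""
    q = re.sub(r"\s+", " ", query)
    # Grab the outer FROM and WHERE clauses. The non-greedy match
    # stops at the first WHERE, so subquery text lands in where_clause.
    m = re.search(r"\bfrom\b(.*?)\bwhere\b(.*)", q, flags=re.IGNORECASE)
    if not m:
        return []
    from_clause, where_clause = m.group(1), m.group(2)
    # The alias is the last word of each comma-separated FROM entry
    aliases = [entry.split()[-1] for entry in from_clause.split(",")]
    joined = set()
    for lhs, rhs in re.findall(r"(\w+)\.\w+\s*=\s*(\w+)\.\w+", where_clause):
        if lhs != rhs:          # ignore self-comparisons
            joined.update([lhs, rhs])
    return [a for a in aliases if a not in joined]
```

On the retention query above this flags t – exactly the alias with no join condition – and the manual check then clears it because of the literal filter on t.r_object_id.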
It is impossible to effectively tune queries without understanding what they do and it is very difficult to understand what a query does without formatting it. With only a little practice it is easy to format even very large queries in a few minutes.
Tags: documentum, performance, profiler, xense, xense profiler
An updated set of release notes has now been published on the website. In amongst a number of smaller cosmetic changes is some work I have been doing on making the profiler less sensitive to the specific tracing properties used to generate the trace.
One of the great virtues of the DMCL trace was the simplicity of invocation. There are only two properties, trace_level and trace_name, with the latter being optional and defaulting to api.log. These could be set in the dmcl.ini file or specified using an API call (e.g. trace,c,10,mytrace.log would start a level 10 trace to mytrace.log).
However, DFC tracing has a lot of different options, many of which have very significant implications for the formatting and content of the trace file. Suffice it to say this makes reliable parsing of the DFC trace a challenge. To manage this complexity we took the decision to mandate the setting of certain parameters when generating the trace. Whilst this makes the development task easier, it is not very user-centric: there is a risk that users will generate traces ‘incorrectly’ and then find that the profiler either doesn’t produce any results or, even worse, produces erroneous results.
The latest version, Xense Profiler for DFC v1.1, removes the requirement to have the dfc.tracing.include_rpcs flag set correctly; the profiler will correctly process files whatever setting of include_rpcs was used to generate the trace. There are still some restrictions on the trace properties that can be used to generate the trace file – the details are in the release notes.
Future versions will aim to remove these restrictions further. In addition, we will probably include a catalogue of trace formats that we can test against, so that we can at least warn the user if an unsupported trace format is being used.
Tags: documentum, performance, profiler, xense, xense profiler
The first beta for Xense Profiler for DFC finished on 31st July, but we have decided to extend the programme for another three months. If you are an existing beta group member you should have received an email containing a new licence key and a link to the updated beta (if you didn’t get the email, let us know). If you haven’t already signed up for the beta then you can do so here.
When we sent out the renewal emails to the existing beta group members we also asked for some feedback. I’ll repeat the questions here as we would really like to hear from anyone who has used (or tried to use) the software; any feedback helps us improve the product:
- Was it easy to download and install the software?
- Was the license process easy to follow and complete? How could it be improved?
- Did the software work first time or was there configuration that you had to perform?
- Was the invocation of the software and the command line syntax intuitive and easy to understand?
- DFC tracing has a confusing array of options and the initial beta required certain tracing parameters to be set. Was this clear or did this cause you problems? Did you resolve any problems or did you leave it as ‘requiring too much of my time’?
- Did the HTML reports display adequately in your browser? If not, what browser and OS were you using?
- Was the meaning and interpretation of the various reports clear? Would it be useful if the documentation could cover the theory and approach to tuning? Would motivational examples help your tuning work and the use of the software?
- The Xense Profiler for Documentum 5 systems includes a CSV output facility that allows the raw trace data to be converted to CSV for import into (for example) Excel. Would such a feature be useful in the Xense Profiler for DFC?
- If we could change or improve one thing about Xense Profiler for DFC what would it be?