Tags: DFCprof, performance, xense profiler
I’m really excited to announce that Xense Profiler for DFC – the Documentum 6.x performance profiling tool – is now an open source project (DFCprof).
The DFCprof project is hosted on Sourceforge where you can download fully-functional binaries or build your own copy from the source code. The software is totally free – that’s free as in ‘free beer’ as well as ‘free speech’!
DFCprof can be used in a number of different ways. Most people will be interested in using it as a standalone application to process a DFC trace file and create a performance analysis report.
Just download the application from sourceforge, extract the files and you are ready to go.
Alternatively you can embed the dfcprof-x.x.x.jar library into your Java project and use the trace parsing facility from there. I’ll be posting more details on the DFCprof parser API in due course. I’ll also be talking about the roadmap for future DFCprof features. Feel free to drop me a line in the comments if there are particular things you would like the project to do.
Tags: centerstage, documentum, ECM, ECM vision, EMC World, xCP
Like Lee I wasn’t able to get to EMC World. Interestingly, however, I did experience much of it through Twitter. Of course I didn’t get the first-class, you-had-to-be-there type of experience, but it was a significant experience nonetheless. Many people were tweeting during sessions and bloggers were putting up summaries of sessions almost immediately afterwards. This meant that not only the facts came through but also some of the emotional reaction to the announcements.
ECM vision required
I’ve watched (most of) the Mark Lewis keynote and I’ve read most of the blog summaries of the keynotes and other sessions. I have certainly been left with the following impressions:
- EMC appears to be retreating from core content management as a selling point
- As a corollary of the first point, CenterStage is not getting the resources or attention it could
- Case Management seems to have become an overriding priority
That’s the impression – it may not be what Mark Lewis intended but that is certainly what comes across. Given the above it is hardly surprising that EMC don’t have a particularly inspiring Enterprise Content Management vision.
So what should, or could, an Enterprise Content Management vision look like? First off, I don’t like the idea of buying a Content Management platform, so the vision has to be more than ‘you have lots of information to manage so buy our software to solve your problems’. It certainly seems that core content management functionality has been commoditised: you can get content metadata, versioning, renditions, full-text and metadata querying and basic workflow from anywhere.
But content management functionality is not Enterprise Content Management. ECM needs arise when an organisation scales (in terms of people, numbers of teams or document volumes) such that additional problems or obstacles arise. Some of these problems, such as archiving or large-scale ingestion, fit well for EMC as a primarily hardware company.
Other problems seem to require more finesse. They would include things like:
- discoverability – getting the right information to the right people
- rich content – going beyond mere content and metadata
- analytics – mining the information for enhanced value
- building knowledge communities – to turn data and information into knowledge
- incentives – providing some way of encouraging people to go to the trouble of making content available, e.g. by tagging, writing blogs, contributing to wikis and so on
I would like to see EMC come out with something that shows how EMC might be the solution. Their software won’t solve all of these right now, but I’d like to know what, 3-5 years down the line, it might enable us to do.
One product that should be clearly at the centre (sic) of this strategy is CenterStage. For some reason this product seems to have lost management focus. It seems to have taken ages to get a GA release shipped and we are still waiting for some features that really should be there. However I think EMC should be proud of the type of product that is embodied in CenterStage and should be looking to push this as a major ECM product. I think it is much more than a simple Sharepoint competitor although that is how the marketing comes across.
One of the features of CenterStage that is not well sold is facets and in particular facets generated from analytical processing of content and comments. A facet is essentially a drill-down capability that allows the user to narrow down the results of a search. Obvious examples are the format of the document or the content size. This type of drill-down – based on author-supplied intrinsic metadata collected by any self-respecting content management system – seems so obvious you wonder why this type of feature hasn’t been standard in Content Management search for years.
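The drill-down mechanics behind metadata facets are simple enough to sketch in a few lines of Java (an illustrative sketch only, not CenterStage’s actual API): counting documents per metadata value gives you the facet buckets a search UI can offer as narrowing links.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetDemo {

    // Count documents per value of one metadata field ("facet"); each
    // bucket becomes a drill-down link the search UI can offer.
    static Map<String, Integer> facetCounts(List<Map<String, String>> docs, String facet) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            counts.merge(doc.getOrDefault(facet, "(none)"), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("name", "report.pdf", "format", "pdf"),
                Map.of("name", "notes.doc", "format", "msw8"),
                Map.of("name", "spec.pdf", "format", "pdf"));
        // Two pdf documents and one msw8 document
        System.out.println(facetCounts(docs, "format")); // {msw8=1, pdf=2}
    }
}
```

The point is how little machinery intrinsic-metadata facets require – which is why it is surprising the feature took so long to become standard.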
However, three other facets are available with CenterStage.
These facets are not based on metadata recorded by content authors, they are generated from a textual analysis performed on each piece of content by Content Intelligence Services (which utilises Temis Luxid as the text analysis engine). Since discoverability – getting the right information to the right people – is one of the key issues/problems in effective information management, enhancing content in this way is important.
This kind of content enrichment is not something that is provided out of the box by Sharepoint. It never came across in any presentation I have seen, and I only really got it after downloading and playing around with CenterStage. Of course it needs some further development to make this feature truly great, but I can’t understand why EMC aren’t shouting about it from the roof-tops.
xCP and Case Management
I really want to believe that EMC don’t think that ECM and Case Management are one and the same. My initial impression from Momentum Athens (Nov 2009) was that xCP was a way of developing EMC content-based applications using more configuration and less coding. Case Management was simply the first application area to get the xCP treatment.
I liked the implementation of ‘configure not code’ and it also appeared that a lot of effort and thought had gone into how to market this idea. It’s clear that a lot of resource has gone into Case Management, possibly at some expense to CenterStage, but I’d like to think that the xCP treatment will be passed on to CenterStage and other applications. I’d like EMC to show me this vision rather than leave me to assume all of this.
Tags: Composer, Continuous Integration
Ever since I got back from Momentum it’s been work, work, work. That’s what happens when you take 4 days off to look around at what’s going on. I recall that I was going to post some more thoughts on some of the other products that I saw.
I went to David Louie’s presentation on Composer. I have to say I was impressed with what I saw. This may be because I’ve been developing with Eclipse for a while now, so having something that integrates natively with this environment is a big plus. Whilst there are many interesting functional features of Composer, I was most interested in a single slide that compared Composer with Application Builder.
First Composer doesn’t require a connection to the docbase to get your work done. You can of course import objects from a docbase, but you can also import from a docapp archive.
Secondly, Composer can install your application (a DAR, similar to a DocApp in concept) into a docbase via a GUI installer, but you can also use something called Headless Composer, a GUI-less installer that runs from the command line. I’m not absolutely sure of the specifics at this point, but it possibly uses Ant. David said that the details are in the documentation – I will be sure to try it out and post my findings at a later date.
This last point was of great interest to me as I’m currently investigating how to run Documentum development using a continuous integration approach. Being able to deploy your artifacts from the command line, and therefore from some overall automated controlling process, is essential to making continuous integration a reality. On this note I also spoke to Erin Samuels (Sharepoint Product Manager) and Jenny Dornoy (Director, Customer Deployments). I hope that the Sharepoint web parts SDK that is likely to integrate into MS Visual Studio will also have support for a headless installer, and that Documentum/EMC products generally will support the continuous integration approach.
Tags: centerstage, D6, momentum 2008
Of course the star of the show was CenterStage. If you don’t know what CenterStage is (where have you been?), in a single sentence: it’s the next generation of Documentum client, providing Web 2.0 features, a significantly different customisation model (compared with WDK) and a no-cost/low-cost licensing model.
I won’t go into too much detail about the features except to say they include basic content services, personal spaces, team spaces, blogs, wikis, RSS, tagging and faceted search. The timeline was set as: 1.0 to be released April 2009 (the beta version is available on the download site), 1.5 to be released after that, and then a D7 version by the end of 2009.
The UI is composed from numerous separate components which, in concept at least, are like Sharepoint WebParts. Since each component needs to be rendered on the page separately, I wondered whether a page with, say, 20 components would need 20 separate network calls to display. In a high-latency network environment this could be a performance nightmare. Apparently the DWR library allows for batching of requests, which means that a page with numerous components could be displayed using a smaller number of network calls.
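The batching idea itself is easy to sketch (a generic illustration of the technique in Java; DWR’s real mechanism lives in its JavaScript engine and its API differs): group the per-component render requests into fixed-size batches, and the number of round trips drops accordingly.

```java
import java.util.ArrayList;
import java.util.List;

public class RequestBatcher {

    // Group per-component requests into fixed-size batches so that a page
    // with many components needs far fewer network round trips. This is a
    // generic sketch of the technique, not the DWR API itself.
    static List<List<String>> batch(List<String> requests, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < requests.size(); i += batchSize) {
            batches.add(requests.subList(i, Math.min(i + batchSize, requests.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> components = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            components.add("component-" + i);
        }
        // 20 components sent in batches of 5 means 4 network calls, not 20
        System.out.println(batch(components, 5).size()); // 4
    }
}
```

On a high-latency link the saving is dominated by the round-trip count, so even modest batch sizes help considerably.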
Tags: Advanced Site Caching Services, Momentum, XML Store
On Tuesday and Wednesday I attended a load more sessions covering XML Store, CenterStage, Composer, Sharepoint and Web Content Management. In the next few posts I’ll share some of my thoughts and impressions, starting with XML Store.
For those that don’t know, EMC purchased a company called X-hive a while back. X-hive have an XML database product and that has now been integrated into the full Content Server stack. The easiest way to picture this is to take the old picture of the repository as consisting of a relational database and a file system and add in a third element, the XML Store.
From 6.5 (possibly sp1, I don’t remember) all XML is stored in the XML store. The XML Store is built around the many XML standards that are in existence such as XQuery, XSL and the XML full-text query standard.
The XML is not stored in the usual textual XML format but in a DOM format. This presumably is to allow them to implement various types of index and to optimise the query access patterns. The performance claims for the database are impressive although they need to be taken with a pinch of salt. As with all benchmarking, vendors will target specific goals in the benchmark. However your real-life workloads could be very different. If you are expecting high-throughput for an application using the XML store I suggest you put some work into designing and executing your own benchmarks.
In addition to indexes there is also a caching facility. This was only talked about at a high level; however, just as relational database performance experts made a career in the 1990s out of sizing the buffer cache properly, we may see something similar with XML database installations: systems suffering poor performance as a result of under-sized hardware and mis-configuration. As always, don’t expect this to just work without a little effort and research.
One other point I should make is that the XML Store is not limited to the integrated Content Server implementation. You can also install instances of XML Store separately. For example the forthcoming Advanced Site Caching Services product provides for a WebXML target. This is essentially an XML Store database installed alongside the traditional file system target that you currently get with SCS. You can then use the published XML to drive all sorts of clever dynamic and interactive web sites.
Tags: documentum 6, momentum 2008, sharepoint
All this week I am at Momentum in Prague. It’s a great opportunity to catch up with Documentum employees, partners and users, and also to see what is going on in the Documentum world.
I arrived yesterday morning, and attended the Sharepoint Integration product advisory forum. The forum was run by Erin Samuels and Andrew Chapman. The session centred around a number of topics relating to Sharepoint-Documentum integration.
First of all there was a round-table on the kind of integration scenarios people were facing. Interestingly, and reassuringly, there seem to be far fewer ‘maverick’ implementations, as Andrew called them. Maverick implementations are where Sharepoint is installed as a generic application that can be configured and used by any department or team without any kind of guidance or direction from IT. This leads to silos of information and a lack of control of any kind over the information. Whilst departments like this quick and easy delivery of applications, it stores up problems for the organisation as it is no longer able to utilise or manage enterprise-wide data.
Andrew then talked about a new product that is due to come out called Journalling. Whilst I don’t think the naming is great (maybe that’s not how it is going to be sold, but it was certainly the name used for the technology), the principle is very powerful. It uses the Microsoft-provided Sharepoint EBS interface to allow you to redirect where Sharepoint stores its data. By default Sharepoint will store content and metadata in a SQL Server database. Each Sharepoint instance will require a SQL Server instance (apparently) and this can easily become a big data management problem. Furthermore, as SQL Server stores all content as BLOBs (Binary Large OBjects), there can be scalability issues.
With the Documentum EBS implementation, content is stored (transparently to the user) in a Documentum repository rather than SQL Server (there is just a ‘journal’ entry in Sharepoint representing the object). This provides all kinds of useful benefits, such as being able to leverage Documentum’s data storage scalability, EMC hierarchical storage management, de-duplication and so on.
At this point there was a big discussion around a point introduced by Andrew. Since the data is now stored in Documentum we can access it via Documentum clients; for example your average user might be creating content in Sharepoint across the organisation, but you have power users who need the full power of Documentum interfaces to work with the data. But what operations should Documentum clients be allowed on Sharepoint-originated data? Read and other operations that don’t modify the content/metadata are fine, but should we allow update or delete access? If yes then there is additional work required, as right now an update outside of Sharepoint would cause Sharepoint to throw an error the next time a user accesses the object. Predictably there was an almost equal 3-way split between those who wanted no Documentum access, read-only/no-modify access and total control.
Later on I got to meet up with some people that I only know from the Documentum forums and blogs: Johnny Gee, Erin Riley and Jorg Kraus. It was great to finally get to speak to these guys after years of interacting over the web.
Tags: documentum, performance, profiler, xense, xense profiler
An updated set of release notes has now been released to the website. Amongst a number of smaller cosmetic changes is some work I have been doing on making the profiler less sensitive to the specific tracing properties used to generate the trace.
One of the great virtues of the DMCL trace was the simplicity of invocation. There are only two properties, trace_level and trace_name, with the latter being optional and defaulting to api.log. These could be set in the dmcl.ini file or specified using an API call (e.g. trace,c,10,mytrace.log would start a level 10 trace to mytrace.log).
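In dmcl.ini form, that is just two lines (a sketch; the section name is the standard dmcl.ini layout – check your own file):

```ini
[DMAPI_CONFIGURATION]
; equivalent to the api call trace,c,10,mytrace.log
trace_level = 10
trace_name = mytrace.log
```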
However the DFC trace has a lot of different options, many of which have very significant implications for the formatting and content of the trace file. Suffice it to say this makes reliable parsing of the DFC trace a challenge. To manage this complexity we took the decision to mandate the setting of certain parameters when generating the trace. Whilst this makes the development task easier it is not very user-centric. There is a risk that users will generate traces ‘incorrectly’ and then find that the profiler either doesn’t produce any results or, even worse, produces erroneous results.
The latest version, Xense Profiler for DFC v1.1, removes the requirement to have the dfc.tracing.include_rpcs flag set correctly; the profiler will correctly process files whatever the setting of include_rpcs used to generate the trace. There are still some restrictions on the trace properties that can be used to generate the trace file – the details are in the release notes.
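As a rough guide, the relevant settings look like this in dfc.properties (a sketch; dfc.tracing.dir is my assumption for the output directory name – consult the release notes for the exact set of supported properties):

```properties
# switch DFC tracing on
dfc.tracing.enable = true
# with v1.1 this flag no longer has to match what the profiler expects
dfc.tracing.include_rpcs = true
# directory the trace files are written to (assumed property name)
dfc.tracing.dir = /tmp/dfc-traces
```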
Future versions will aim to remove these restrictions further. In addition we will probably include a catalogue of trace formats that we can test against, so that we can at least warn the user if an incorrect trace format is being used.
Tags: documentum, performance, profiler, xense, xense profiler
The first beta for Xense Profiler for DFC finished on 31st July but we have decided to extend the program for another 3 months. If you are an existing beta group member you should have received an email containing a new licence key and a link to the updated beta (if you didn’t get the email, let us know). If you haven’t already signed up for the beta then you can do so here.
When we sent out the renewal emails to the existing beta group members we also asked for some feedback. I’ll repeat the questions here as we would really like to hear from anyone who has used (or tried to use) the software; any feedback helps us improve the product:
- Was it easy to download and install the software?
- Was the license process easy to follow and complete? How could it be improved?
- Did the software work first time or was there configuration that you had to perform?
- Was the invocation of the software and the command line syntax intuitive and easy to understand?
- DFC tracing has a confusing array of options and the initial beta required certain tracing parameters to be set. Was this clear or did this cause you problems? Did you resolve any problems or did you leave it as ‘requiring too much of my time’?
- Did the HTML reports display adequately on your browser? If not what browser and os were you using?
- Was the meaning and interpretation of the various reports clear? Would it be useful if the documentation could cover the theory and approach to tuning? Would motivational examples help your tuning work and the use of the software?
- The Xense Profiler for Documentum 5 systems includes a CSV output facility that allows the raw trace data to be converted to CSV for import into (for example) Excel. Would such a feature be useful in the Xense Profiler for DFC?
- If we could change or improve one thing about Xense Profiler for DFC what would it be?
Tags: documentum, oracle, rac
It’s a tricky business keeping Release Notes up-to-date. I’ve just been browsing through the latest D6 SP1 Release Notes and my attention was caught by a small section about installing on Oracle RAC:
Installing with Oracle Real Application Clusters (121442)
If you are installing Content Server with Oracle Real Application Clusters (RAC), set
the value of the Oracle parameter MAX_COMMIT_PROPAGATION_DELAY to 0 (zero).
This value is required to ensure that the data that Content Server uses is consistent across
all Oracle nodes. Values other than zero are not supported.
I presume this has been in the various Content Server release notes for a while, and it would have been important: the default (or any other value here) uses a commit scheme that can delay other Oracle nodes from seeing changes. Since a single Content Server session often uses more than one database session (if you use DF_EXEC_QUERY in a DFC query call you are asking Content Server to start a new Oracle session), and those sessions could be attached to two different Oracle RAC nodes, the delay in seeing recently changed values could cause havoc.
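For reference, on older RAC systems the parameter would have been set along these lines (a sketch of standard Oracle syntax, not taken from the release notes):

```sql
-- Apply to every RAC instance; zero forces commit information to be
-- broadcast immediately so all nodes see changes straight away.
ALTER SYSTEM SET max_commit_propagation_delay = 0 SCOPE = SPFILE SID = '*';
-- An SPFILE-scoped change needs an instance restart to take effect.
```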
Now I know what you’re thinking: since we would obviously prefer to have data available immediately, why would Oracle not use 0 as the default, and why wouldn’t everyone just set it to 0 anyway? The answer of course is that there is a cost to be paid in performance; having to issue network calls to propagate information about a commit could be very costly (certain platforms seemed to have real problems with it), so in many cases the default setting was fine provided your Oracle application didn’t have functional problems.
However, since Oracle 10g Release 2 this parameter is deprecated – Oracle now uses the ‘broadcast on commit’ behaviour implied by MAX_COMMIT_PROPAGATION_DELAY=0 automatically. The best overview of this change is given in this paper by Dan Norris. Since the Content Server in D6 is only certified for Oracle on 10g Release 2, the entry shown above in the Release Notes is no longer needed. In fact you could argue it is positively harmful. As they say, forewarned is forearmed.
By the way I stumbled on this bit of information whilst perusing the ‘Support by Product’ section on Powerlink. It is currently under beta and contains amongst other things a section for Documentum Content Server. It’s basically a portal type view of Content Server support information, bringing together support notes, whitepapers, documentation (there’s a very nice Top manuals section) and so on. I think it’s a brilliant idea and I urge everyone to have a look and provide feedback to the support team.
Always interesting to look at the search terms people are using when they reach my blog. One I noticed this morning is ‘documentum dfc 6.0 install dmcl.ini’. Looks like someone is installing D6 and wants to know where the dmcl.ini is.
The DMCL and the dmcl.ini have been part of Documentum since I started working with it 9 years ago, but D6 breaks all that. For the record, there is no dmcl.ini in D6 – all the parameters that used to be in dmcl.ini now have equivalents in dfc.properties.
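For instance, the docbroker connection settings that used to live in dmcl.ini now look like this in dfc.properties (host and port values are placeholders):

```properties
# docbroker connection - formerly the [DOCBROKER_PRIMARY] section of dmcl.ini
dfc.docbroker.host[0] = docbroker.example.com
dfc.docbroker.port[0] = 1489
```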