Documentum and Greenplum

January 14, 2013 at 8:30 am | Posted in Big Data | 1 Comment

@Mikemasseydavis tweeted "will we see #documentum and #greenplum become a 'platform'". The idea obviously had some attraction since two others and I retweeted it. In a way this is not a completely new idea: Generali Hellas backed the notion of 'xCP as the action engine for Big Data', which was one of the big ideas that came out of Momentum 2011. In fact EMC seem to have big ideas in this area, as evidenced here.

I would ask the following questions:

  • How much effort are EMC going to put into this area? How fast will they be able to deliver?
  • Does a Greenplum connector for xCP and a feed into Greenplum constitute a platform? What else is needed to make it a platform?
  • What are the use cases? Gautam Desai mentions a document with 20 use cases.

Supporting Testing

November 30, 2012 at 10:30 am | Posted in Performance | Leave a comment

When the designers of WDK sat down to design the framework, one thing I don't think they decided was to make it easy to test. Anyone who has tried to write scripts for LoadRunner, JMeter or any other tool will have experienced the pain of trying to trap the right dmfRequestId, dmfSerialNum and so on. As for content transfer testing, it is really only possible with the unsupported Invoker tool that comes with the LoadRunner scripts.
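
To make the correlation pain concrete, here is a minimal, tool-agnostic sketch of the kind of extraction a load-test script has to perform. The field names come from the paragraph above, but the hidden-field markup is an assumed example for illustration; the exact HTML that WDK emits varies by version and component.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: the markup pattern below is an assumption, not real WDK output.
public class WdkCorrelationSketch {

    // Pull a hidden-field value such as dmfRequestId out of a response body
    static String extractHiddenField(String html, String fieldName) {
        Pattern p = Pattern.compile(
                "name=\"" + Pattern.quote(fieldName) + "\"\\s+value=\"([^\"]*)\"");
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String fakeResponse =
                "<input type=\"hidden\" name=\"dmfRequestId\" value=\"Request42\"/>";
        System.out.println(extractHiddenField(fakeResponse, "dmfRequestId"));
    }
}
```

Every request in the script then has to carry the freshly extracted values forward, which is exactly the chore the testing tools make painful.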

So my question to IIG is this: do xCP, D2 and any other new interfaces coming out of IIG make it easier to test?

Thoughts on EMC On Demand

November 29, 2012 at 7:22 am | Posted in Architecture | 2 Comments

I think EMC first started talking about On Demand at EMC World 2011. The idea is seductive and logical: rather than have to procure your own hardware, install and configure the software, and manage and administer the running system you get EMC to do it for you. The potential benefits are enormous.

First, there are economies of scale for running hardware, in the same way as for similar cloud-based offerings. By running on virtual machines and providing scale-out options you potentially only pay for what you use.

Secondly, experts who can specialise in various aspects of installation, administration and troubleshooting. Furthermore there is an obvious incentive for EMC to focus on initiatives to simplify and automate tasks. Presumably that was the idea behind xMS, the deployment technology recently released with D7.

As a consequence of that last point, running the systems gives EMC a great way to collect usage data, bug information and performance insights.

Finally I see great potential in distributed content, allowing content to be replicated across data centres closer to the user. On-premise installations currently rely on solutions like BOCS or content replication to deliver better performance to users in remote offices. These can be tricky to configure without expert help and rely to a greater or lesser extent on having servers in locations where the organisation doesn't want them.

So clearly I see big benefits, at least in theory. I have several thoughts around OnDemand, some of which I hope to explore in future posts; in this post I want to talk about some potential drawbacks and how EMC might address them.

The first question people seem to ask is: how will we be able to install and manage our customisations if EMC are managing everything? In fact I expect EMC to put significant limits on how much customisation you will be allowed in OnDemand environments, which means that the arrival of xCP 2.0 with its 'configure don't code' mentality (and D2's UI configurability) is serendipitous indeed. Indeed I doubt OnDemand would really be workable for WDK-based apps like Webtop, DCM and Web Publisher; no-one runs these apps without considerable coded customisations.

Secondly, for some organisations moving content to the cloud will remain problematic as they will have regulatory requirements, or internal security needs, that mean certain types of content can’t reside in particular jurisdictions. This is by no means insurmountable and EMC will need plenty of distributed locations to satisfy some clients. However it does make the Amazon AWS model of ‘click and go’ server resourcing much more difficult for EMC.

Finally, from a personnel perspective, how will EMC deliver the necessary staffing of data centres if OnDemand really takes off? Running data centre operations is not a core business for EMC (as far as I know). My assumption is that they won't be building or running the hardware operations themselves but are partnering with existing companies that have the know-how. However even setting up and staffing the software side is new to EMC. Does it have the capacity already or will it need to recruit? Or will much of OnDemand be farmed out to partners? Will they run 24×7 from the US or (more likely) use a follow-the-sun philosophy?

Time will obviously tell but I remain optimistic that OnDemand will be a success – it will depend heavily on the execution in what is a new area for EMC.

Troubleshooting weird DCM messages

July 24, 2012 at 5:34 pm | Posted in Performance | Leave a comment

This came up on the ECN forum today and the message is so obscure (but quite common) that I thought it worth writing up the troubleshooting notes.

The original post is here. The poster was trying to create a Change Notice or Change Request in Documentum Compliance Manager (DCM) and got the following error message in a dialog box:

The System can not complete your request. The action you have chosen is no longer valid because of a change in repository

This seems to be a generic message that DCM pops up whenever an ‘onexecutiononly’ pre-condition check fails.

What’s a pre-condition?

A pre-condition is a feature of Documentum WDK (the framework that DCM, Webtop, Taskspace, Web Publisher, etc. are built on) that allows menu options to be programmatically enabled, disabled, greyed out or hidden in the browser interface. To give an example, a developer may have created a component to display the contents of a folder, and for each entry there can be different menu options available such as View, Edit, Check Out, Check In, Create PDF Rendition and so on. Now if a document is not checked out it doesn't make sense for the Check In option to be available. In fact it would just be confusing if that selection were left available (WDK applications tend to be confusing enough as it is). So a pre-condition is a piece of code that is run for each item and returns either true or false to decide whether a menu option is available.

What’s an ‘onexecutiononly’ pre-condition?

With great power comes great responsibility! Imagine you have 100 objects in a folder and you have 40 or 50 menu options for each one (not untypical). That's 4,000 – 5,000 pre-condition checks. If the pre-condition code just does calculations and checks based on information available or cached on the application server then generally this is not a problem and your UI should remain pretty responsive. However if your pre-condition runs a query against the content server, however 'fast', or does an object fetch (e.g. using IDfSession.getObjectBy…()) then you are going to suffer some pretty sluggish UI performance.
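
To make the distinction concrete, here is a minimal sketch of the two styles of check. It is not real WDK code – the actual contract is an interface in the WDK action framework whose exact name and signature depend on your version – but it shows why a per-row server fetch hurts.

```java
// A sketch only - not the real WDK precondition interface.
public class CheckinPreconditionSketch {

    // Cheap check: uses data the list component already holds for the row,
    // so running it 4,000-5,000 times while rendering a folder view is fine.
    public boolean canCheckin(boolean rowSaysCheckedOut) {
        return rowSaysCheckedOut;
    }

    // Expensive anti-pattern: one content-server round trip per row per action.
    // In DFC terms this would be something like
    //     IDfSysObject obj = (IDfSysObject) session.getObject(new DfId(objectId));
    //     return obj.isCheckedOut();
    // which is exactly the kind of fetch that makes the UI feel sluggish.
}
```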

The WDK references do warn about this in the section on pre-conditions, however it seems that the warning was not heeded in DCM 5.3 (naughty EMC). Generally navigating around DCM 5.3 is pretty miserable for most production users and the best that can be suggested is to upgrade to DCM 6.x (by the way, if you absolutely have to stay on DCM 5.x but can bear some development and testing effort to alleviate the pain then there are some code-based possibilities). WDK 6 introduced a new pre-condition setting – onexecutiononly – which was taken up by the DCM developers to 'fix' the performance problems they had introduced.

'onexecutiononly' means that the pre-condition is not evaluated when the list of objects is rendered onto the screen but only when the user selects the menu option in the user interface. As a result you no longer have thousands of pre-conditions running when rendering the interface. Of course in a way this rather 'neuters' the power of the pre-condition because now we could have, for instance, Check In available for documents that aren't checked out. If we try to check in the document the pre-condition will return false and we will get a warning message on the screen, typically like the one the poster saw when trying to create a change notice or change request. In that particular case there are likely to be some checks in the pre-condition code for the newchangerequest or newchangenotice action and they have 'failed'. At the time of writing the problem hadn't been fully resolved so I'll update this entry if any new information comes to light.

How Documentum Print Control Services Works

December 15, 2011 at 1:18 pm | Posted in Architecture | 1 Comment

This is the second part of a mini series of articles on Documentum Print Control Services (PCS) and how to use it effectively. The first part provided an introduction and overview of PCS. In this article I will take a much more in-depth technical look at the product.

PCS consists of a number of components:

  • A DFS-based web service that is deployed on a JBoss application server
  • A set of DARs that contain services that can be used by user-facing applications
  • Optional WDK components for Webtop and Taskspace (as mentioned in the first article this PCS support is built into Documentum Compliance Manager)

As we will see later PCS also relies on PDF and Postscript rendition generation so DTS or ADTS is required.

So what happens when a controlled print request is issued from an application? The printing user interface will usually collect some information from the user relating to the object to be printed, including the name of the printer and a reason for the print. Once the request is received by the application server, control is passed to the PCS ControlPrintService.requestPrint() function.

The requestPrint function does 3 things. First PDF Stamping Services (PSS) is used to create a watermarked copy of the main PDF rendition. I may cover PSS in more depth in another article, however the key point here is PSS takes an existing PDF rendition and generates a watermarked PDF that can include metadata overlaid in headers, footers or other areas of the document. PCS and PSS have tight integration where PSS exposes a Controlled Print-specific configuration and PCS can pass in Controlled Print attributes such as copy number, recipient and printing reason to be watermarked on the document.

Next a dmc_pss_print_copy object is created in the repository. The watermarked PDF is the primary rendition for this object and the object is linked to the /Temp/PCSCopies folder. At this point the object’s print_status attribute is set to ‘Created’.


Finally, a request for a Postscript rendition of the dmc_pss_print_copy object is made. The rendition will have a page_modifier of 'PS4Print'. The server will wait for up to 2 minutes for the rendition to be generated and then return to the caller. Either way the print_status field is set to 'PsRequested'. Up to this point all the processing has been synchronous, but now control is returned to the user of the application.


At this point the user is probably expecting the printer to output the printed document, however no print request has yet been sent to a printer; there is simply a dmc_pss_print_copy object, possibly still waiting for its Postscript rendition to be created. There are 2 asynchronous tasks still to be completed. First the Postscript rendition needs to be created:

Of course it may have been created during the earlier synchronous processing but there is no guarantee. Continuous uninterrupted operation of controlled printing requires that your DTS or ADTS infrastructure is resilient, scalable and sized for all the rendition requests generated in a production environment. If your users have requested prints that don’t seem to be appearing your first port of call for troubleshooting is to confirm that DTS/ADTS is working and that Postscript renditions are being created for your dmc_pss_print_copy objects.
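
Here is a hedged sketch of the sort of checks I mean, using plain DFC and DQL. The type and attribute names (dmc_pss_print_copy, print_status, page_modifier 'PS4Print') are taken from this article rather than from the PCS DAR itself, so verify them in your own repository before relying on the queries.

```java
import com.documentum.fc.client.DfQuery;
import com.documentum.fc.client.IDfCollection;
import com.documentum.fc.client.IDfQuery;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.common.DfException;

// Hedged troubleshooting sketch; object type and attribute names follow the article.
public class PcsPrintTroubleshooting {

    // 1. List print copies that are still waiting to be sent to the printer
    static void listWaitingCopies(IDfSession session) throws DfException {
        IDfQuery q = new DfQuery();
        q.setDQL("select r_object_id, object_name, print_status "
               + "from dmc_pss_print_copy where print_status = 'PsRequested'");
        IDfCollection rows = q.execute(session, IDfQuery.DF_READ_QUERY);
        try {
            while (rows.next()) {
                System.out.println(rows.getString("r_object_id") + "  "
                        + rows.getString("object_name"));
            }
        } finally {
            rows.close();
        }
    }

    // 2. For one print copy, list its renditions; if no row shows a
    //    page_modifier of 'PS4Print' the Postscript rendition hasn't arrived yet.
    static void listRenditions(IDfSession session, String printCopyId) throws DfException {
        IDfQuery q = new DfQuery();
        q.setDQL("select full_format, page_modifier from dmr_content "
               + "where any parent_id = '" + printCopyId + "'");
        IDfCollection rows = q.execute(session, IDfQuery.DF_READ_QUERY);
        try {
            while (rows.next()) {
                System.out.println(rows.getString("full_format") + "  "
                        + rows.getString("page_modifier"));
            }
        } finally {
            rows.close();
        }
    }
}
```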

The Print Control Services server is ultimately responsible for sending your document to the required printer. Calling the Print Control Services server is the responsibility of the PcsAsyncPrintJob. For controlled print (and recall) requests to be completed in a reasonable amount of time this job needs to be set to run every couple of minutes and needs to be monitored for regular execution and successful job completion.

When PcsAsyncPrintJob runs it queries for all dmc_pss_print_copy objects that have print_status = ‘PsRequested’.

For each dmc_pss_print_copy object the PcsAsyncPrintJob does the following:

  1. Ensures that a Postscript rendition has been created. If not, no further processing is done on this execution of the job.
  2. Calls the 'print' method on the remote ControlPrintService DFS endpoint on the PCS server.

Once the print request is received by the ControlPrintService component the following happens:

  • The audit trail is checked to ensure that the same document has not already been printed with the requested copy number. If for some reason there is already an audit trail entry for this copy number an error is raised.
  • The Postscript file is sent to the printer using the Java Print Service API.
  • The service monitors the print job until completion (or failure) and then returns a response to the PcsAsyncPrintJob job.
  • An audit trail entry is created to record the controlled print.


The actual "printing" part of PCS is carried out using the Java Print Service (JPS) API. If you are going to be making use of PCS in your organisation it may be worth your while getting to know JPS a little better; I'll discuss it in more depth in a later article. Once PCS has sent the document for printing it sets the print_status attribute to 'PrintRequested' – this is the last status update for the document. Note that you only know PCS has requested a print from the printer – there is no way for PCS to 'know' whether that print was successful, so it cannot update the object further.
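
For readers who haven't met it before, here is a small standalone sketch of what sending a Postscript file through the Java Print Service API looks like. It is not PCS code – PCS additionally maps the configured printer name, audits the request and monitors the job – just the bare printing mechanism.

```java
import javax.print.Doc;
import javax.print.DocFlavor;
import javax.print.DocPrintJob;
import javax.print.PrintException;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;
import javax.print.SimpleDoc;
import java.io.FileInputStream;
import java.io.IOException;

// Minimal sketch: send a Postscript file to the first printer that accepts it.
public class PostscriptPrintSketch {
    public static void main(String[] args) throws IOException, PrintException {
        DocFlavor flavor = DocFlavor.INPUT_STREAM.POSTSCRIPT;

        // Find an installed print service that accepts raw Postscript input
        PrintService[] services = PrintServiceLookup.lookupPrintServices(flavor, null);
        if (services.length == 0) {
            System.err.println("No printer found that accepts Postscript");
            return;
        }

        try (FileInputStream ps = new FileInputStream(args[0])) {
            Doc doc = new SimpleDoc(ps, flavor, null);
            DocPrintJob job = services[0].createPrintJob();
            job.print(doc, null);   // hand the document to the print service
        }
    }
}
```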

The key points to take away from this article are as follows:

  1. First, when the WDK application server returns control back to the user after a print request has been made, there is no guarantee that the document has been sent to the printer. There are 2 layers of asynchronous processing required to print a document; depending on the speed, capacity and availability of the relevant servers it may take some time for the print to appear.
  2. Second, the print may not appear at all if there is a problem with one of the asynchronous components. This may not be obvious to the end user, who may just assume that printing is "slow".

An Introduction to Documentum Print Control Services

December 5, 2011 at 1:31 pm | Posted in Architecture | 3 Comments

This is the first part of a mini series of articles on Documentum Print Control Services (PCS) and how to use it effectively.

Documentum PCS originated in the compliance products; however, from the 6.6 release it is a standalone product. If you haven't worked in regulated environments before you may be a little unclear as to what its purpose is. PCS "controls" the printing of certain important documents, ensuring that a number of things happen when a "Controlled Print" takes place.

I'll discuss the what first and then explain the why. First, whenever a controlled print of a document is made, that fact is recorded in the audit trail. A copy number is associated with the document and recorded in the audit trail entry; if you print another copy of the document then the copy number is incremented. In effect every print of a document is uniquely identified by object id and copy number. In fact PCS works in close tandem with Documentum PDF Stamping Services (PSS) to allow a watermark including the copy number to be overlaid on the printed document.

Additionally every printer in the organisation has to be added to the PCS configuration so controlled prints can only be made to well-known printers. Again the printer to which the print is sent is recorded in the audit trail.

Finally, subsequent to executing the controlled print, it may be necessary to record a 'Recall' of the print. A 'Recall' is recorded in the audit trail against a unique document print (the object id and copy number). The reasons for needing a recall may be part of the operational lifecycle – one or more documents may have been superseded by an updated version, and so all prints of the old version must be physically removed and that removal recorded. Alternatively it may simply be that a print was stuck in a printer, damaged or lost. It's worth bearing in mind that when 'Recalling' a document with Documentum PCS the only thing that happens is that the recall is recorded in the audit trail as evidence and for reporting. PCS won't, for example, halt print requests already sent to the printer.

A recall results in a notification sent to the inbox of interested parties. The recipient has to confirm acknowledgement of the notification, at which point a further audit trail entry is created. Thus there are 3 types of audit entry that can be created (a sketch of how to inspect them follows the list):

  1. On print
  2. On recall
  3. Recall confirmed
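
Since all three land in the standard dm_audittrail type, here is a hedged sketch of how you could pull the history for one controlled document. I have deliberately not filtered on PCS's specific event names (which I haven't listed here), so it simply dumps every entry for the object and lets you pick out the print, recall and confirmation events.

```java
import com.documentum.fc.client.DfQuery;
import com.documentum.fc.client.IDfCollection;
import com.documentum.fc.client.IDfQuery;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.common.DfException;

// Sketch only: list audit trail entries recorded against one document.
public class ControlledPrintAudit {
    static void printHistory(IDfSession session, String documentId) throws DfException {
        IDfQuery query = new DfQuery();
        query.setDQL("select event_name, time_stamp, user_name from dm_audittrail "
                   + "where audited_obj_id = '" + documentId + "' order by time_stamp");
        IDfCollection rows = query.execute(session, IDfQuery.DF_READ_QUERY);
        try {
            while (rows.next()) {
                System.out.println(rows.getTime("time_stamp") + "  "
                        + rows.getString("event_name") + "  "
                        + rows.getString("user_name"));
            }
        } finally {
            rows.close();
        }
    }
}
```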

As pointed out in the comments both print and recall actions require the user to authenticate themselves before they are able to proceed.

So now we know what PCS does but it may not be clear why an organisation would need this functionality. As I alluded to earlier, printing control is often used in regulated environments. Typical examples would be pharmaceutical or medical manufacturing, or aircraft production. These activities often take place in a factory or lab and need to follow defined and documented processes. Often this process documentation is physically printed, as online reference to the documentation is inconvenient or difficult.

In these types of scenario it is clearly essential that correct and up-to-date documentation is used by production staff (how happy would you be if certain components on the plane you are flying on were manufactured using out-of-date processes?). Not only does it make sense for management in these organisations to know there is a process to record what documentation is in use and when it is updated, but in many cases regulatory authorities will require evidence that such systems are in place and demonstrated to work.

Given the above it is unsurprising that this functionality originated in the Documentum Compliance Manager (DCM) product. In earlier versions of DCM watermarking and print control were achieved using integrations with Liquent's PDF Aqua products. However PCS was introduced as part of DCM 6.5 SP1 and became a separate product in 6.6. Controlled Print and Recall functions are provided as part of the DCM user interface, but the latest release of the PCS product comes with components that can be installed in the Webtop and Taskspace interfaces. This is part of EMC's policy of moving compliance functionality into the core stack and making it available to all clients rather than retaining a dependency on specialised interfaces. No doubt these features will be available in the C6 products at a later date.

The next installment will dig into the guts of the PCS architecture to see how it works.

Update (14 Dec 2012): I wonder where the new Life Sciences products announced with D7 will fit in with PCS? Will they use PCS and PSS or is there some other technology to do this?

 

Momentum 2011 pt 3 Jeroen's Architecture Session

November 1, 2011 at 5:57 pm | Posted in Performance | 1 Comment

Jeroen has another session tomorrow evening, The Innovation Engine, so a lot of the really cool new stuff will be shown then. However this is what he did show this morning. As usual we overran – EMC, how about making this slot longer?

6.7 loads of perf improvements, some examples (too many on the slide to copy down):
64-bit Content Server: 4:1 reduction in Content Server instances
InputAccel: 500k pages/day, 2m tasks/day
Document Sciences improvements
BAM: 10k reports/day, consuming 15m events/day

Current stack has innovation everywhere – much of it arising from future work on NGIS

High availability:
BOCS becoming smarter with predictive caching

Deployment efficiency.
xMS:
Blueprint defines requirements
Blueprints are environment independent
Each VM has Hyperic agents. Centralised Hyperic DB (telemetry project). Dashboard Hyperic client.
Adjustment, i.e. reaction based on Hyperic data

Integrien?? Feed it with Hyperic data and it learns behaviour. Leads to auto-remedy when an application shows abnormal behaviour

Session pooling. Analysed context switches. Looking to have more linear session increments – better scalability (NB Documentum is already really good at this)

Showed metrics:
10k getSession calls: 6 min -> 8 sec
10k getContent calls: 7.5 min -> 34 sec

New services: Ajax -> REST services. Use cases:
1) content services, incl. CMIS
2) generated services, e.g. an invoice object type with a generated RESTful service

"xCP misunderstood": app modelling (semantic model) then generate runtime. Ultimately leads to auto-deployment. Don't just focus on UI.

D7 dormant state feature
Rolling upgrade + other stuff I didn't capture

NGIS:
Embed Greenplum
Implement XACML

"Big information": "bring processing to the data"
Real-time MapReduce (traditional MapReduce too slow)

Kazeon crawlers!

Momentum 2011 pt 2 Roadmap Session

November 1, 2011 at 12:50 pm | Posted in Performance | Leave a comment

This is John McCormick's regular roadmap session. There is the usual disclaimer that dates are not set in stone.

Overall headlines:
Work on making IIG products cloud deployable has had knock-on benefits for all deployments (Jeroen touches on this in his session immediately after).

Cornerstones of ECM:
Process - analytics - content mgmt - collaboration - ?
(nice to see content mgmt fairly and squarely in the middle of that list. There are 5 items on the list; I missed the last one)

D6.7sp1/xPlore 1.2: nov for most products, some clients a bit later
D7/xPlore 1.3: Q3 2012
D7 sp1/xPlore.next Q1 2013

TRUST and SECURITY

RM 6.7: mssp, DoD 5015.2 v3, working on MoReq2010
RM 7: high-performance ingestion & disposition (bringing large-scale dispositions from "days to hours")

CLOUD
vFabric/vcube – basis of OnDemand
new installers,
monitoring,
easier HA,
scalability

Partitioning, handle spikes -> lower cost, i.e. no need to overprovision

Cloud enablement progress:
2010: Atmos
Early 2011: VMware Ready
D7: vFabric, scriptable installer, dynamic provisioning

xMS (= xCelerated Management Service, I think)
xMS foundation for xCP 2 and D7
Templates/blueprints (more in Jeroen's session)
Deployment goes from weeks to hours

AMP (asset management product) tracks usage from logs

D7 improved monitoring: first platform & xPlore, then apps

Perf and scalability
Type caching (xCP)
Queue mgmt

Widening group/username size to 256 chars!!!

CHOICE
Web services: RESTful (still supporting SOAP-style but much of the new UI will work on top of REST)
Generated application services, custom web apps. Create object types -> automatically generate service code (expect more on this from Jeroen tomorrow)
Interoperable: CMIS 1.1 expected next year

Mobile standards support

My Documentum / SDK, custom objects

6.7.1: merge desktop and offline
Support for Mac

SEARCH & ANALYTICS
Integrated analytics/facets

Big idea: search with action (via FSS, CIS, xPlore)

CIS: Linux, Office 2010, taxonomy cold start (point CIS at a repository and it suggests an initial taxonomy)
xPlore 1.2: thesaurus, custom NLP, new language support, improved performance (wildcards)
Query-based subscription: scheduled, initiate process, notify etc. (idea: you specify a search term, then xPlore schedules the search and initiates some action if new data is found)

FAST support ends 2011!!! Hundreds of xPlore deployments already

1.3 integrated analytics?? Not sure exactly what this meant

Momentum 2011 part 1

October 31, 2011 at 12:25 pm | Posted in Performance | Leave a comment

It’s been quite a while since I last posted. Life on both a personal and business level has been super hectic for the last 6 months which meant that something had to give. This blog was one of those things and the DFCProf project the other. Things have more or less returned to normal and I hoped to be ready to start blogging by the time Momentum rolled around. And I made it … just!

I intend to get back to some regular posting as well as spend some more time digging into interesting Documentum internals and performance. I’ve already got some interesting topics in the works such as: hardware mistakes, why you should upgrade Documentum Compliance Manager (DCM) if your DCM 5.x installation performs like a dog and what children’s birthday parties can teach us about performance. I’ve also long wanted to write about Documentum folder performance myths.

First things first. My last post, sometime back in May, was from EMC World and my first interest this week at Momentum is how some of the exciting announcements from Las Vegas have panned out over the last 6 months. The really exciting stuff was around Documentum OnDemand, cloud-based Documentum, Captiva and Document Sciences. How close are we to a real product? EMC has been prone to make big announcements over the last few years and then be slow to follow through with actual production-quality product. Will this time be different?

As usual Jeroen van Rotterdam's future architecture talk, on the Next Generation Information Server, will be eagerly awaited. In addition I'll be interested to hear how people, especially other customers, have responded to the recent highly marketed Oracle attack on Documentum. Finally, since I've been working a lot with DCM in recent months I'm interested to hear what the roadmap is for a compliance product. A while back EMC seemed to be pulling back from explicitly supporting an in-house compliance product, however the more recent messages don't seem to back that up. DCM 6.7 is out with substantially improved performance, and eSigs, overlays and controlled printing are supported by new product releases.

Alternative documentation

July 22, 2011 at 11:38 pm | Posted in Performance | 1 Comment

I started the working day with a blank piece of paper and the goal of completing a technical article. As so often when I write I need to build up some structure before launching into writing words: headings, concepts and key ideas.

Sometimes I even find it difficult to conceive and organise the ideas I want to write about. On these occasions I start mind mapping; writing down random ideas on the subject and then connecting, extending and elaborating. The article structure usually flows directly from this activity.

The ideas are generated in a non-linear and inter-linked manner. In fact it struck me that the standard text-based article or document presented in a web browser or document viewer tends to present ideas in a highly linear manner with only limited linking functionality. In many ways the text document is a poor vehicle for presenting some technical material.

I also spent much of the day catching up on Documentum technical videos. I was struck by how effective they are at presenting both conceptual ideas and simple how-to demonstrations.

Both these ideas set me thinking about alternative methods of presenting project information. I tweeted the idea to see if anyone else had thoughts in this area. Lee Dallas responded with his experience of video knowledge capture; in fact he expanded his tweet into a blog post: http://bit.ly/phLkdd.

