We need a DQL Folder select attribute

June 3, 2010 at 1:44 pm | Posted in Performance | 13 Comments
Tags: , ,

Am I the only one who yearns for the ability to display the folder path when selecting on dm_sysobject or its descendants? I’m thinking of something like:

select r_object_id, object_name,folder_paths from dm_sysobject …

which should return a repeating list of all the folders the object is linked to. Another enhancement might be to have a primary_folder_path attribute so that it only returns a single valued attribute.

I find myself needing this feature so often either in admin scripts or in user code that I wish it was just there instead of me continually having to work round the lack of it.

13 Comments »

RSS feed for comments on this post. TrackBack URI

  1. We saw the same issue during CenterStage development, so we created a custom property called “_kw_location” which can be added as a value to select when issuing queries through the CenterStage query service.

    Our property just returns the primary location, since that’s generally the most important location and it simplifies the result data.

  2. fascinating – can’t wait to have a look. Would be nice if it could be moved into the Content Server so all applications could benefit.

  3. Maybe I am getting your intentions wrong, but why wouldn’t you get away with this query:

    SELECT r_folder_path, r_object_id
    FROM dm_folder
    WHERE r_object_id IN (
    SELECT i_folder_id
    FROM dm_sysobject
    WHERE object_name=’foobar’
    )

    Are you considering this kind of queries a hack?

    • I wouldn’t consider it a hack necessarily but having written DQL like this (and bjobo’s below) thousands of time both in admin scripts and code it suggests to me that there is a common DQL use case that is missing – EMC ought to be providing the facility for us rather than us having to reinvent the wheel a million times.

      The specific problem with the query you provide is it doesn’t allow me to directly retrieve all the attributes of the dm_document object – you have to do that in code or via another query.

  4. Since Documentum 5.3 you can use ENABLE(“ROW_BASED”) which leverages following useful query:

    select a.i_chronicle_id, a.r_object_id, a.r_version_label, a.i_latest_flag, a.object_name, a.a_status, f.r_object_id as folder_id, f.r_folder_path
    from dm_document(all) a, dm_folder f
    where
    f.r_object_id=a.i_folder_id
    and any a.r_version_label like '%.%'
    and f.r_folder_path != ' '
    and a.r_object_id=''
    order by 1,2 ENABLE("ROW_BASED");

    • This works as does similar queries such as joining on dm_folder_r registered table. However such a query is inherently ugly (f.r_folder_path != ‘ ‘).

      It also has a few dangers for the inexperienced or the hurried. Presumably you are aware that the r_version_label like ‘%.%’ predicate means that some rows are not returned (e.g. the row for ‘CURRENT’) which means that all folder paths are not returned. Perhpas in this case you are relying on the fact that the implicit version label (the one that meets the r_version_label like ‘%.%’ predicate) is returned first and so the primary folder path will be returned. It seems remarkably easy to get tripped up trying to implement such a simple requirement. No wonder Documentum is seen as a (unnecessarily) complicated product.

      EMC introduced ENABLE(ROW_BASED) in the version 5 platform because the default repeating attribute functionality makes it difficult to do even what you have suggested above. This leads me to think that the SQL heritage of DQL and other similar apis from other vendors was an unfortunate choice driven no doubt by the fact that when many of the sytems were first architected RDBMS was one of the few sensible and well understood database technologies. Can’t help thinking that SQL and rows and tables is not that well suited to modelling content objects, XML would be far better – I wonder if Documentum considered SGML all those years ago?

      • Would you mind expanding on your reasons for why one would consider a language primarily targeted towards linear tree-graphs like XML as a storage model for M:N relations (like folders, renditions, versions, and documents)?

        I, for instance, have difficulties to think of some exemplary XPath expressions to help me understand your reasoning.

      • Robert, I think there are 2 different things to consider here one is the storage model for document content (currently rdbms, filesystem and optionally an xml database) and the api that is used for access.

        In the case of storage there is probably no overwhelming technical reason why you would use rdbms, xml database or something else that did the job. However as Documentum have invested in a very good xml database (x-hive) with some excellent people involved it would certainly make sense for them and us if they were able to cut the rdbms out of the system. Then they control the development of a fundamental part of the system. We as users/partners/consultants/customers don’t have to pay for and deal with Oracle, SQL Server or whatever.

        In the case of apis, I don’t think a rowset is a particularly natural way of modelling an ECM object. We already know that repeating attributes are a ‘kludge’ – they could be represented in a much more obvious way in an xml format. More ambitiously a rich api that could, for instance, return a document, it’s attributes, a list of folders and the most recent discussions is something I can envisage. I’m sure I could go on but maybe that is something for another post.

        Furthermore as more content formats become represented as xml it makes sense that we stop treating metadata and content as 2 separate things.

  5. Hi Robin,

    I believe the origins of not having this “folder attribute” on the sysobject/document level is pure performance reasons. I do remember when I first started in 3.x, that there was a bug that caused r_folder_path value being incorrect when moving folders. Imagine if you had a million docs located in folder A under “/cabinet1/folder1/” and you decided to move the folder to “/cabinet1/folder2”. This means that the server needs to update 1 million objects based on simple move action. Having this on the dm_folder object type minimizes the number of objects that has to be updated (1 in the aforementioned example).

    -Johnny

    • Hi Johnny, I’m not thinking of the back end storage aspect but thinking of DQL as an API. If, as some of the comments show, you can put together a DQL that includes the folder path then it would make sense that Documentum could do it for us. A similar idea is the i_all_users_names attribute on dm_group; this doesn’t actually exist as an attribute on dm_group_r, it is computed by the sql that is submitted to the database. This is such an obvious and common use case that it really should be supported by DQL. I’m trying to drum up support for the idea.

      • Computed value makes absolute sense. Add me to the petition 🙂

  6. A computed attribute is my preference too. Readers wishing to list folder paths with objects today can look at this example.

  7. Imagine that your document is linked to two folders
    /cabinet_1/aaa/xxx/myDocument
    /cabinet_1/aaa/yyy/myDocument

    but folder xxx as well as folder yyy happen to be linked to /cabinet_1/aaa, but also to /cabinet_1/bbb
    In that case, myDocument has 4 locations.
    If aaa and bbb are themselves also linked to a cabinet_2, that would account for 8 locations.

    What would you like your query to return to you?
    – all 8 locations,
    – or just the primary and secondary folders of the document itself, but only the primary folders of the folders

    All 8 would probably be the most truthful response, but I don’t think it will be an easy task to explain to anyone why
    select i_folder_id from dm_sysobject where r_object_id=’08…’ results in 2 rows, and
    select folder_paths from dm_sysobject where r_object_id=’08…’ results in 8 rows.

    If we opt for the first possibility (8 results), people will want to know which of the 8 results correspond with the primary folders.
    If we opt for the first possibility (2 results), there will be use-cases where people require an extensive list of all paths.

    I don’t have the answer. Just trying to point out the complexity.

    Marcel


Leave a reply to Robin East Cancel reply

Blog at WordPress.com.
Entries and comments feeds.