What Oracle Optimizer Mode does Documentum need?

April 15, 2010 at 4:22 pm | Posted in Performance | Leave a comment
Tags: , , , ,

This post is inspired by a number of recent occasions where I have been asked for advice on what setting should be used for the Oracle optimizer_mode parameter. It is an expanded version of a posting I made to this thread in the EMC Documentum support forums.

In the beginning …
Once upon a time Oracle only had a RULES-based optimizer. Given a query it would work out what indexes to use and what order to join tables using a set of rules. The rules were based on features of the tables (such as what indexes they had) and the elements of the query (e.g. columns used in the WHERE clause). It didn’t make use of statistics about how many rows and data blocks each table and index consumed.

Cost-based Optimizer
With version 8 (or maybe somewhere during the version 7 product cycle can’t remember) Oracle introduced the Cost-Based Optimizer (CBO). CBO uses statistics on tables, indexes and other objects and processes this information to model the expected resource cost of different ways of executing the query. For even moderately complex queries there are literally thousands of combinations (called query execution plans) of table join methods, table access methods and join orders. The CBO simply chooses the one with the lowest cost according to its model.

The CBO is the reason why up-to-date statistics are absolutely essential. Irrespective of whether you use Documentum’s own dm_UpdateStats job, Oracle’s stats gathering job or a custom script, you need to have statistics that reflect the volume and distribution of the data. Where row volumes are volatile, statistics will need to be updated with sufficient frequency to ensure the CBO has the information necessary to make the correct decision. If you are using Oracle 10 or 11 for your Documentum implementation you should be using the CBO.

In principle this is a brilliant and sophisticated piece of software however like all models it doesn’t always represent the ‘truth’ and can make ‘mistakes’ when your data isn’t consistent with the assumptions the CBO makes.

The CBO has a number of different optimizer modes as of 10G:

  • ALL_ROWS
  • FIRST_ROWS_n (where n can be 1, 10, 100)
  • CHOOSE
  • FIRST_ROWS

ALL_ROWS is the default option and the CBO tries to find an execution path that minimizes the cost in terms of CPU and disk io to deliver all the rows.

FIRST_ROWS_1/FIRST_ROWS_10/FIRST_ROWS_100 try and find an execution path that returns the first 1/10/100 rows quickest even if the overall resource cost of returning all the rows is higher than ALL_ROWS cost. This sounds like it should be faster for things like WDK screens returning the first 50 objects in a large folder but the reality is more sobering.

The basic strategy for returning the first rows quickly involves avoiding blocking operations such as sorting or joining tables using hash joins. For example, suppose I have the following DQL:

select * from dm_sysobject where any i_folder_id = ‘0b00000180000111’ order by object_name

In Oracle dm_sysobject is implemented as a view joining the tables dm_sysobject_s and dm_sysobject_r. i_folder_id is on dm_sysobject_r and will have an index created during the repository setup. object_name is on dm_sysobject_s (also with an index). The quickest way to execute this query – assuming i_folder_id is a relatively selective predicate which would be the case for most Documentum datasets – is:

  1. Retrieve all the dm_sysobject_r rows that satisfy the predicate using the index on i_folder_id
  2. join to dm_sysobject_s
  3. sort the result

Clearly we would have to execute the whole query before we can sort it so in effect we have to wait almost the entire execution time to get the first row (ie sorting is a blocking operation). However all the other rows follow quickly afterwards.

What is the resource cost of this execution plan? Well there is a range scan on the i_folder_id index but assuming that i_folder_id is selective retrieving all the rows is relatively cheap. Each row is then matched to the corresponding row in dm_sysobject_s using r_object_id – each match is very quick as the table is ‘probed’ using a unique index on r_object_id. The end result is likely to be 4 or 5 blocks visited for each row returned. Returning 100s of rows in this way would typically take a fraction of a second.

The only way to avoiding the blocking sort operation is to drive the query through an index on the object_name (the ‘order by’ criteria). To do this we would:

  1. Scan the whole index on dm_sysobject_s.object_name to retrieve all the dm_sysobject_s rows in object_name order
  2. For every row retrieved a join is made to the corresponding row in dm_sysobject_r and checked to see if it meets the i_folder_id predicate
  3. If it does meet the predicate we can immediately return the row to the caller before moving on to find the next row

We do not have to execute the whole query before returning the first row(s); there is no blocking operation.

However intuitively this should seem much more work for anything other than the smallest system. dm_sysobject_s is usually 100,000s of rows for small systems and can grow to 100Ms in large systems. Whilst the size of the index on object_name is a fraction of the size of the table itself it will still be a significant fraction. Scanning the whole or a sizeable part of the index can be time consuming. Even worse we have to visit a block in dm_sysobject_r for every row returned by dm_sysobject_s!

This problem arises whenever the attribute in the order by clause is on a different table to the attribute in the selective predicate. For even moderately-sized repositories such an execution plan would seem sluggish to the end user. This is typical of the sorts of queries produced by WDK folder navigation components.

In such cases FIRST_ROWS_n typically comes up with identical execution plans compared to ALL_ROWS.

CHOOSE is an old optimizer mode that used rule-based optimizer if there were no statistics on any of the tables in the query, and ALL_ROWS if at least one table had statistics. Quite useful during the transition era between RBO and CBO but now deprecated.

FIRST_ROWS – this was an early attempt to provide the functionality now accessible via FIRST_ROWS_n. CBO would run it’s model as usual but only consider certain combinations in an attempt to provide execution paths with quick response time.

One of the heuristics it seems to use is the avoidance of blocking operations if possible. For example if you process the query discussed above using the FIRST_ROWS optimizer setting it will probably take the bad non-blocking path.

FIRST_ROWS is deprecated by Oracle in 10 and 11 and I would advise people to never, ever use this mode (despite the fact there is at least one posting on the contology Documentum forum that suggests a performance improvement by using FIRST_ROWS!).

Take-away

  • Use ALL_ROWS or FIRST_ROWS_10
  • Make sure you are updating statistics on database objects frequently
  • Never use FIRST_ROWS
  • Don’t confuse FIRST_ROWS_10 with FIRST_ROWS, there’s a big difference
  • Avoid using RULE unless you happen to be on a really old Documentum platform such as EDMS98 or 4i (and therefore using an old version of Oracle)

Blog at WordPress.com.
Entries and comments feeds.