Wednesday, September 15, 2010

OO Design Dilemma: Auditing Changes Across Hierarchical Objects

Here is a sample UML class diagram of a situation that posed an OO design dilemma for me a few days ago.
Let me explain with an example:
The Electronic Items catalog has many Televisions. The Sony Televisions come with different specifications, such as refresh rates of 240 Hz, 120 Hz and 60 Hz. However, the 240 Hz ones also come in different colors: Black and Grey.
Now, a store manager needs to see the recent changes across all catalogs, such as electronic items, musical instruments and so on. So, if someone added a new Sony 240 Hz color, Dark Black, or removed the Grey one, the store manager needs to see this. However, a department manager may need to see changes at all levels, for example whenever one of her Catalogs, Products, Specifications or their Properties is changed. So, if she needs to see all changes under Sony Television, she needs to see changes in the name and value of the underlying specifications and their properties as well as changes at the higher level, say, the price.

Now, the above design looks good from an OO perspective. However, since we need to persist the data into a relational database, we will end up with one table per class and foreign keys to link them. If you are using Ruby on Rails, you will possibly use a polymorphic association between Auditable and the subclasses. For example, let's consider the following E-R design (not showing all relations):

Now, given this schema, if you need to find out the latest 50 Log Messages with Author and Associated objects for the Electronics Catalog, how would you write a simple query?

Look, here is an expected outcome of the query:
Sep 15, 2010

  1. Jane added new Color : Dark Black for Sony 240 Hz TV
  2. Shon changed the Sony TV Price from $700 to $600
  3. Jakie added a new Specification for Panasonic TV: Aspect Ratio -> 16:9
Sep 14, 2010

  1. ...
  2. ...
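
The date-wise grouping in this outcome is the easy part once the rows are in hand. Here is a minimal Python sketch of it, assuming the audit rows arrive newest first as (timestamp, message) pairs. The sample rows, their times of day and the name `group_by_day` are made up for illustration.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical audit rows, already sorted newest first: (timestamp, message).
audits = [
    (datetime(2010, 9, 15, 14, 5), "Jane added new Color : Dark Black for Sony 240 Hz TV"),
    (datetime(2010, 9, 15, 11, 30), "Shon changed the Sony TV Price from $700 to $600"),
    (datetime(2010, 9, 14, 17, 45), "An older change"),
]


def group_by_day(rows):
    """Return [(date, [message, ...]), ...], preserving the incoming order."""
    return [
        (day, [message for _, message in items])
        for day, items in groupby(rows, key=lambda row: row[0].date())
    ]


for day, messages in group_by_day(audits):
    print(day.strftime("%b %d, %Y"))
    for number, message in enumerate(messages, start=1):
        print(f"  {number}. {message}")
```

Note that `groupby` only merges adjacent rows, so it relies on the rows being sorted by timestamp already.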

Now, you see, the date-wise grouping can be done once the latest audit logs from all across the Catalog hierarchy are found. But how do you find them given a CatalogID to start with? The following query won't work:

SELECT TOP 50 * FROM Auditables WHERE AuditableTypeName='Catalog' and AuditableID=my_catalog_id ORDER BY Timestamp DESC

Because it will only produce the audit logs for the Catalog Object, whereas we are expecting to see all the recent audits for this Catalog and underlying products down to the Specification Property level.

However, one workaround is to query for the audit logs of the Catalog itself, then find all the Products of this Catalog and query for the audits of all these products. But if we are to select the top 50 audits for the catalog and its whole hierarchy, how many should we select at each step? The problem gets even worse when you have to repeat the above steps for Specifications and Specification Properties. If it takes, say, 10 queries to cover the hierarchy, you have to take 50 audit logs from each of them and then keep only the latest 50 of the combined 500, because the latest 50 results could all come from a single query :-(
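
To make the cost of this workaround concrete, here is a rough Python sketch of the merge step. It assumes each per-level query has already returned its rows newest first as (timestamp, label) tuples. The function name `latest_across` and the sample data are hypothetical.

```python
import heapq
from itertools import islice


def latest_across(result_sets, limit=50):
    """Merge several newest-first result sets and keep the newest `limit` rows.

    Each result set must already be sorted by timestamp, descending, and must
    itself hold up to `limit` rows, because all the winners may come from one set.
    """
    merged = heapq.merge(*result_sets, key=lambda row: row[0], reverse=True)
    return list(islice(merged, limit))


# Integers stand in for timestamps; larger means more recent.
catalog_audits = [(9, "catalog"), (3, "catalog")]
product_audits = [(8, "product"), (7, "product"), (6, "product")]
property_audits = [(10, "property")]

latest = latest_across([catalog_audits, product_audits, property_audits], limit=3)
print(latest)
```

Even with the merge written this compactly, every query still has to fetch a full page of rows, most of which are thrown away.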

Looks like the Auditable class and corresponding design doesn't work well for this situation. So, what is a possible solution?

It's hard. I don't readily see a great solution. However, there are workarounds. For example, we can change the Auditable class to hold references to all levels of the Catalog object hierarchy. So, in that case the Auditables table will look like the following:

With such a schema, we need to ensure that if an audit log corresponds to a SpecificationProperty, we also store references to all of its higher-level objects (its Specification, Product and Catalog). Under this assumption the query becomes simple. The following query will fetch all audit logs for a Catalog and its descendants:

SELECT TOP 50 * FROM Auditables WHERE CatalogID=my_catalog_id ORDER BY Timestamp DESC

Similarly, the following query will produce the audit logs for a single product and its descendants:

SELECT TOP 50 * FROM Auditables WHERE ProductID=my_product_id ORDER BY Timestamp DESC
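
As a rough illustration of this denormalized approach, here is a small Python sketch using an in-memory SQLite database. Everything beyond the Auditables table name and the CatalogID, ProductID and Timestamp columns is an assumption for the example: the other column names, the sample rows and the helper `latest_audits` are hypothetical. SQLite spells TOP 50 as LIMIT 50, and the sketch sorts by Timestamp so that the rows returned are the newest ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Auditables (
        ID INTEGER PRIMARY KEY,
        Message TEXT NOT NULL,
        Timestamp TEXT NOT NULL,
        CatalogID INTEGER,
        ProductID INTEGER,
        SpecificationID INTEGER,
        SpecificationPropertyID INTEGER
    )
""")

# Each row references every ancestor of the audited object (None below its own level).
rows = [
    ("Catalog renamed", "2010-09-13 09:00", 1, None, None, None),
    ("Sony TV price changed", "2010-09-15 11:30", 1, 10, None, None),
    ("Color Dark Black added", "2010-09-15 14:05", 1, 10, 100, 1000),
    ("Guitar added", "2010-09-15 15:00", 2, 20, None, None),
]
conn.executemany(
    "INSERT INTO Auditables (Message, Timestamp, CatalogID, ProductID,"
    " SpecificationID, SpecificationPropertyID) VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

ALLOWED_COLUMNS = {"CatalogID", "ProductID", "SpecificationID", "SpecificationPropertyID"}


def latest_audits(column, object_id, limit=50):
    """Return the newest messages for one object and everything beneath it."""
    # The column name is checked against a fixed set before being interpolated.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    query = (
        f"SELECT Message FROM Auditables WHERE {column} = ? "
        "ORDER BY Timestamp DESC LIMIT ?"
    )
    return [message for (message,) in conn.execute(query, (object_id, limit))]


catalog_messages = latest_audits("CatalogID", 1)
product_messages = latest_audits("ProductID", 10)
print(catalog_messages)
print(product_messages)
```

One query per level is enough here, at the price of every audit row carrying a column for each level of the hierarchy.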

However, it has its downsides as well. The Auditables table is no longer a general-purpose Auditable. If we add a new type of Auditable, we cannot use it unless we alter the table to add a reference for it! High coupling! Less reuse. Also, a lot of if-else logic will be required to display the audit logs, as each one can belong to multiple parents!

How would you solve this design dilemma? Comments welcome!