Sohan - on software artisanship: Architecture

Showing posts with label Architecture. Show all posts

Saturday, July 28, 2012

Deploying on TV screens

Off late, I am working on a project to render real time business data with interesting visualizations, so people can feel the pulse of the business. For the last couple of months, I have been planning to write a detailed post about it. But after a few false starts, I am finally settling on smaller posts, telling a small part of the story each time.

So, have you ever worked on a web application that is primarily viewed through 55"+ 1080p TV screens?

Photo credits to 5mal5

We are showing real time business data, aggregated from multiple data sources as they are happening. The screens are gonna be mounted on the wall, like as you'd see in the airports. And it needs to be running 24x74

This introduces an interesting deployment challenge:

How would you reload the screen every time you re-deploy the app?

A regular web app is interactive. So, when we re-deploy the app, the users typically get the latest version as soon as they reload the page or navigate through the app. However, in an airport like setting, where information is displayed across many screens, and typically no-one is clicking it, the app needs to be aware of updates and reload itself to achieve the same. This is essential, for example, if there's an API change on the server side, the HTML/Javascript/CSS must be in sync to be able render it.

The app itself uses JSON API calls to render the live data. Each screen is somewhat like a single page app, using multiple AJAX calls to render different parts of the screen, showing different data. The API calls are all funneled through a single Javascript module. The module looks like the following (showing a simplified version for brevity): If you have noticed here, there's an extra check inside the success callback. To begin with, the page remembers the server token on reload. So, whenever there's a new token, it refreshes the page. Since all API calls are funneled through this module, this become a no-brainer to support new screens/API calls.

Our API's respond with a server token, which is guaranteed to:

remain same for each server deployment, and
change whenever there's a new deployment.

However, we still need to make sure the server token indeed ensures these two essential properties. With a little trick, this becomes trivial. For our app, we are using Capistrano to deploy our Ruby on Rails project. For those new to Capistrano, it uses a timestamped directory for each deploy, symlinking it to the current. So, it looks somewhat like the following: Every Ruby on Rails app also comes with a little method, Rails.root that returns the full path to the directory of the current deployment. So, in this example scenario, we get the following:

Rails.root #==> /app/realtime/releases/20120729083021

Since every deployment will be a new timestamp, this method ensures a unique token for each deployment. That's all we need for the api module to be aware of new deployments and auto refreshes. Here's an example controller/action (again, simplified for brevity): I liked the organic nature of this technique. It is harvesting on the available tools. Although the examples in this post show Ruby/Rails as an example, I am sure the same techniques can be applied to other technologies with the same simplicity.

Before I conclude, I would share one limitation of this technique here. Since the page reload happens on a shared api module, the reload needs to be generic, without requiring any special knowledge about the pages to reload. This pretty much means, a page needs to be able to reconstruct itself entirely from it's URL. Requiring any Javascript state beyond the URL, would probably require API specific handling to reload, killing the advantage of this technique. But the good news is, its always a good practice to rely solely on the URL to construct a page.

Thanks a ton if you've followed all the way. Stay tuned for the upcoming posts, where I will be telling the story of handling multiple API calls on a page, highlighting data changes and some other interesting bits about a real time dashboard.

Tuesday, January 17, 2012

Object Versioning is an Open Design Problem

This unsolvable maze is a local food from Bangladesh, known as Jilapi
Photo credits to udvranto pothik

Object Versioning is often required by a business rule, for example, to maintain an audit trail or to be able to revert to a previous version, etc. This is the 3rd time in my career where this Object Versioning requirement made me think like -

There's gotta be an easier solution!

But, I am yet to find one. So, I am thinking it's one of those open design problems, may be.

To clarify the requirement with an example, let's consider the following scenario:

A lawyer is preparing a document for one of her clients using a software. On January 17th, she needs to take a look at the version of the same document from May last year so that she can backtrace some changes that took place during these months.

Lets assume the lawyer is using a software that stores the documents in a relational database with the following schema.

A Document has many Evidences, each provided by an EvidenceProvider

Document (id, client_id, case_id, name)
Evidence (id, document_id, evidence_provider_id, details)
EvidenceProvider(id, name)

Now, given the versioning requirement how would you design your data model?

Here's a couple of points that your design should address at a minimum:

Going back to a version means a complete snapshot of the old version of the document. So, the version of May 1st should only bring the evidences that were there on that very day.
As a new version is created, it should inherit all previous evidences.

As I have mentioned earlier, I am yet to find a good data model that can take care of these concerns without over-complicating everything. Let me know if you got a beautiful solution to this problem.

However, in my latest project, the requirement is even harder. It's somewhat like this:

The lawyer may have some documents in the "work in progress version". This means, if she needs to print a document for the court, she only wants to print the last "good version", skipping the "work in progress version".

Also, when there is such a "work in progress version", she needs to attach any new Evidence to both the last "good version" as well as to the "work in progress version".

Well, now you see the design of a data model for Object Versioning becomes really messy and unintuitive.

So, here's my question to you - how would you design a solution for this?

Wednesday, September 28, 2011

The Perils of Soft Delete

Often times, applications cannot get rid of the database records for business rules. For example, if you are a cable TV provider, you might have a customer calling you to stop the National Geographic Channel subscription from next month. Ideally you would like to delete this record, but you can't do it until the effective date:( If you delete it, the next invoice will not be able to charge for this channel, although it's still being used.

In such scenarios, its common to fall back to a soft delete model. As an example, here's a little code:

This code looks simple and harmless at a first sight. But, as you develop your app, you'll run into a lot of issues from this. Here's a short list:

You need to take care of cascading soft delete, for example, cancelling a customer's subscription needs to cascade down to all channels and other objects under it.
Whenever, you are listing the current subscriptions of a customer, you need to filter out the ended ones.
You need to figure out a strategy to periodically clean the ended subscriptions so that your database is not filled with outdated data.
If you have a business key used as a primary key as well, you will be in trouble if an ended subscription is restarted.
Your code will eventually have a lot of if-else blocks to apply changes to only on active objects.

So, what is the solution? Its best to avoid them as much as possible. Richard Dingwall has a detailed blog post on some alternative techniques to avoid soft-delete. But if you have to have soft delete, as shown in the example, its worth remembering the aforementioned points.

Wednesday, June 22, 2011

Service API and Exceptions

Too often I see I am using a REST/SOAP API to talk to a third party system that frustrates me because of poor error messaging. Here's an example to illustrate a typical frustration:

Call: city_service.update_residence_address(city_dueler, new_address)
Response: <status>Failed</status><error>Invalid address</error>

But the address just seems right to me. So, I need to know specific reason about why the address is wrong. The error message leaves this critical detail. What happens next is, I look for the log files created by the city_service, conceptually supposed to be a third party hosted service. To do this, I SSH to the server, then cd to the logs folder and get lost in many of the cryptic log files. Then eventually I find a log and if I am lucky, I find something like the following:

InvalidAddressException at:...
...
Street number is not listed..
...

Now, I know I had an issue with the streer number. But I can see this only after I looked into the log of the thirdparty server. In an ideal case, I don't have an access to their server. Even if I have it, I cannot easily understand their logs and stack traces! In most API calls to third party applications I have had this frustration to varying degrees.

Here's a lesson learned:
Next time you expose an API for another app, expose error messages as specific as possible - so that, no details is left buried into your log files.

Friday, January 28, 2011

ActionMailer 3 - why do you call instance methods as class/self methods?

I didn't even notice this little trick! As long as I didn't have to call deliver_welcome_message (or deliver_*) methods that would magically call welcome_message, I was happy that now the magic is gone. Things are transparent!
Here's an example showing the change: Say you have the following mailer:

class Notifier < ActionMailer::Base
  def welcome_message(new_user)
    #a warm welcome message
  end
end

Now, prior to Rails 3, or ActionMailer 3, you would write the following to actually call this method to get the benefits of ActionMailer magics, such as finding the view based on method name and so on:

Notifier.deliver_welcome_message(new_user_instance)

I am sure this deliver_* was a clever design decision to solve a hard problem, that is, finding the view name based on the method name. However, in ActionMailer 3, this is gone. Now the question is, if this trick is gone, how come it still finds the view name from the method name? Who sets the view name? To know the answer, first, let's take a look at how we call the welcome_message now.

Notifier.welcome_message(new_user_instance).send

Instead of

Notifier.new.welcome_message(new_user_instance).send

So, the magic deliver_ prefix is gone. But, did you see the new trick? Well, its a clever design again. The trick this time is, you call your instance method, welcome_message as if it was a class method. But there is no class method called welcome_message, so it instead goes to method_missing and thats how it sets up the view name from this call. Here's the code that does this little trick!

def method_missing(method, *args) #:nodoc:
    return super unless respond_to?(method)
    new(method, *args).message
  end

All it does is, instantiates the mailer with the method name!

However, this design decision has interesting side effects as well. Or may be not side effects, but rather core effects. For example, since you are calling your mailer methods as class methods, you cannot use a single mailer instance to send out multiple emails at the same time. In fact, every mailer has only one instance of message. So, it cannot store two messages at the same time. This is as if, you can have multiple methods in a class, but you cannot call more than one class or you will mess up the class's state!

Wonder why? Well, this is rooted in another key design choice: ActionMailer::Base is a subclass of AbstractController::Base. Now, if you look at controllers, you will notice that at any given point of time, a controller instance is only responsible for responding to a single action. This is logical for controllers. But how about mailers? I see a mismatch in my mental model and the actual implementation model. I don't see a reason why a mailer is a controller! For the sake of code reuse? But that could be done via delegation anyway.

I will end this post with one question:
Do you think mailer is a controller? hints: think about LSP.

Wednesday, September 15, 2010

OO Design Dilemma: Auditing Changes Across Hierarchical Objects

Here is a sample UML class diagram of the situation that posed me the OO design dilemma a few days ago.

Let me explain with an example,

The Electronic Items catalog has many Televisions. The Sony Televisions come with different specifications, such as refresh rate of 240 Hz, 120 Hz and 60 Hz. However, the 240 Hz ones also comes in different colors - Black and Grey.

Now, a store manager needs to see the recent changes across all catalogs, such as electronic items, musical instruments and so on. So, if someone added a new Sony 240 Hz color of Dark Black or removed the Grey one, the store manager needs to see this. However, a department manager may need to see changes at all levels, for example whenever one of her Catalog, Product, Specification or its Property is changed. So, if she needs to see all changes under Sony Television, you know, she needs to see changes in the name and value of underlying specifications and their properties as well as changes at the higher level, say, the price.

Now, the above design looks good from OO perspective. However, since we need to persist the data into a relational database, we will end up with one table per class and foreign keys to interlink. If you are doing Ruby on Rails, you will possibly use polymorphic association between Auditable and the subclasses. For an example, lets consider the following E-R design (not showing all relations):

Now, given this schema, if you need to find out the latest 50 Log Messages with Author and Associated objects for the Electronics Catalog, how would you write a simple query?

Look, here is an expected outcome of the query:
Sep 15, 2010

Jane added new Color : Dark Black for Sony 240 Hz TV
Shon changed the Sony TV Price from $700 to $600
Jakie added a new Specification for Panasonic TV: Aspect Ratio -> 16:9

Sep 14, 2010

Now, you see, the date wise grouping can be done once the latest audit logs from all across the Catalog hierarchy is found. But how do you find it given a CatalogID to start with? The following query wont work:

SELECT TOP 50 * FROM Auditables WHERE AuditableTypeName='Catalog' and AuditableID=my_catalog_id ORDER BY Timestamp DESC

Because it will only produce the audit logs for the Catalog Object, whereas we are expecting to see all the recent audits for this Catalog and underlying products down to the Specification Property level.

However, one work around is to query for the Audit Logs for the Catalog. Next, find all the Products of this Catalog and query for the audits for all these products. However, if we are to select top 50 audits for the catalog and its whole hierarchy, how many should we select here? And the problem gets even worse, when you have to repeat the above steps for Specifications and Specification Properties. When you have to take the latest 50 audit logs after you run a few queries, say 10, you have to take 50 audit logs for each of them. Because, the latest 50 results can be from a single query :-(

Looks like the Auditable class and corresponding design doesn't work well for this situation. So, what is a possible solution?

It's hard. I don't readily see a great solution. However, workarounds are there. For example, we can change the Audiable class to hold references to all levels of the Catalog object hierarchy. So, in that case the Auditables table will look like the following:

With such a schema, we need to ensure if the Audit Log corresponds to a SpecificationProperty, we put references to all its higher level objects. So, the query will be simple. With this assumption, the following query will be able to fetch all audit logs for Catalog and its descendants.

SELECT TOP 50 * FROM Auditables WHERE CatalogID=my_catalog_id

Similarly, the following query will produce the audit logs for a single project and its descendents:

SELECT TOP 50 * FROM Auditables WHERE ProductID=my_product_id

However, it has its downsides as well. The Auditables is no longer a general purpose Auditable. If we add a new type of Auditable, we cannot use it unless we alter its properties! High Coupling! Less Reuse. Also, a lot of if-else will be required to show the Audit logs, as it can belong to multiple parents!

How would you solve this design dilemma? Comments welcome!

Thursday, July 16, 2009

Rails modeling guide#2: naming convention for ruby on rails model methods

Naming conventions play an important role to the software's overall architecture. It is not a rocket science, still, it may lead to unhappy consequences if not taken care of at the early stage of a project. This small best practices can make a code base significantly improved.

Rails does a good job by using the dynamic power of ruby and providing with a handful of dynamic methods with the models. ActiveRecord::Base and its included modules follow a consistent naming, which clearly represent the intended purpose of the methods. At Code71, we are working on ScrumPad, a 2nd generation agile scrum tool using ruby on rails and our model methods are named according to the following rules-

1. All boolean returning methods end with '?'

company.billable?, sprint.current?, story.in_progress?

2. Boolean methods do not start with is_ or has_ or did_ (as you might see in other popular languages)

company.is_billable? -> company.billable?
sprint.is_current? -> sprint.current?

3. find_ and find_all are used only for class (self.find or self.find_all) methods and should return a single/array of object of the class respectively.

find_* methods may return a single object of the class/nil

find_all_* methods return an array of objects of the class or [], but never a nil

4. No methods start with a get_ as other languages.

5. A method ends with ! if it alters the object itself.

sprint.close!()
story.progress!()

6. Methods that persists an object/may throw exception, should always end with ! (implied from rule 5)

invoice.update_status!(:paid)

7. Always use parentheses '()' in method names. Future versions of ruby is deprecating the support for method names without parentheses.

Following these 7 simple rules we have consistent and intuitive model method names across the whole ScrumPad. Let me know if you have any suggestion to these names to make it even better.

Rails modeling guide#1: right structure of a ruby on rails model

Rails models are no exception compared to the super models! You are in the business if and only if you got a good physical structure and can stick to it for years...

At Code71, we are keeping our rails models attractive following a few guidelines. I will be posting these guidelines in a series and here goes the first one - about the physical structure of the ruby on rails models.

We keep the following order consistent in our models:-

CONSTANTS
has_one, has_many, belongs_to, has_and_belongs_to relations in dependency order
plug-ins initialization (acts_as_tree, acts_as_state_machine etc.)
validates_presence_of
validates_uniqueness_of
validates_numericality_of
validates_format_of
custom_validations
named_scopes grouped by related purposes
active record hooks (after_initialize, before_create, after_create, ...) in execution order in the format (after_initialize [:assign_default_state, :sanitize_content] )
protected
hook method implementations according to execution order
public
constructor
class methods in alphabetic order
other methods alphabetically or grouped if related
protected
methods alphabetically or grouped if related
private
self methods in alphabetic order or similar methods in a group
other methods in alphabetic order or similar methods in a group

Rule

No method gets code real estate over 20 lines. If needed, one/more private methods are used.

How is this helping us?

We are absolutely sure where to look for a method or where to write a new method.
The code base is really consistent.
The unit test methods also follow the same order, which makes managing the test suite easy.

More to come later this week. Stay tuned!

Sohan - on software artisanship