Russian doll caching for collections in Rails 4

A lot has been written about Russian Doll Caching in Rails 4 but surprisingly little about caching of collections.   One of the basic tenets is that you should use “touch: true” on your “belongs_to” relationships in order to update the “updated_at” time on the parent record and thus invalidate its cached copies.

An issue then arises when you have a relationship that isn’t a strict parent/child relationship.  As an example, let’s imagine this particular relationship structure:


class Item < ActiveRecord::Base
  has_many :project_items, inverse_of: :item
end
class Project < ActiveRecord::Base
  has_many :project_items, -> { order("project_items.position") }, inverse_of: :project
  has_many :items, through: :project_items
end
class ProjectItem < ActiveRecord::Base
  belongs_to :project, inverse_of: :project_items, touch: true
  belongs_to :item, inverse_of: :project_items
end

Items and Projects are then in a many-to-many relationship with a join table providing support for ordering of items within a project. Now, let’s imagine a simple view:

 <% cache @project do -%>
   <h1><%= @project.title %></h1>
   <% @project.project_items.each do |project_item| -%>
     <% cache project_item do -%>
       <p><%= project_item.item.description %></p>
     <% end -%>
   <% end -%>
 <% end -%>

The problem here is that updating an item doesn’t change the updated_at timestamp on the project_items associated with it, nor do we want it to. But we still need to invalidate both the outer Project cache and the inner per-item cache. Simply changing the inner cache key to “project_item.item” doesn’t fix this, as the Project still won’t be touched and the outer cache will keep serving the stale fragment.

This is then a two-fold problem:

  1. What is the proper cache key for the outer @project cache?
  2. What is the proper cache key for the inner item cache?
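
To make the stale-cache problem concrete, here is a minimal plain-Ruby sketch (no Rails involved) of how a touch propagates. Node and its touch_target option are hypothetical stand-ins I made up for a record and for “belongs_to ..., touch: true”; they are not part of any Rails API.

```ruby
# Plain-Ruby sketch (no Rails). Node is a hypothetical stand-in for a
# record with an updated_at column.
class Node
  attr_reader :updated_at

  def initialize(now, touch_target: nil)
    @updated_at = now
    @touch_target = touch_target # plays the role of belongs_to ..., touch: true
  end

  # Saving bumps this record's timestamp and, if configured, its parent's.
  def save(now)
    @updated_at = now
    @touch_target.touch(now) if @touch_target
  end

  def touch(now)
    @updated_at = now
  end
end

t0 = Time.utc(2014, 1, 1)
project      = Node.new(t0)
item         = Node.new(t0)                        # nothing points upward from an item
project_item = Node.new(t0, touch_target: project) # touches its project on save

item.save(t0 + 60)
puts project.updated_at == t0       # the project did not notice the item change

project_item.save(t0 + 120)
puts project.updated_at == t0 + 120 # the join record did touch the project
```

A key built from the project alone therefore never sees item updates, which is why both questions above need an answer.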

For the outer @project cache, it’s tempting to do something like this:

<% cache [@project,@project.items] do %>

And that mostly works.  The problem is that the cache key will simply grow as items are added.  What might not be a problem with two or three items gets out of whack with 50 or 100.  I’ve seen cache keys that are a block of 10+ lines at 80 characters wide.  That’s inefficient.
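
As a rough illustration of that growth, here is a plain-Ruby sketch (no Rails) that builds one key fragment per record. KeyedItem and expanded_key are hypothetical names, and the fragment format only approximates what a record’s cache_key looks like.

```ruby
# Plain-Ruby sketch (no Rails). KeyedItem is a hypothetical stand-in for a model.
KeyedItem = Struct.new(:id, :updated_at) do
  # Roughly the shape of a per-record key: "items/<id>-<timestamp>".
  def cache_key
    "items/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S')}"
  end
end

# One fragment per record, so the key grows linearly with the collection.
def expanded_key(items)
  items.map(&:cache_key).join("/")
end

base = Time.utc(2014, 1, 1)
few  = (1..3).map   { |i| KeyedItem.new(i, base + i) }
many = (1..100).map { |i| KeyedItem.new(i, base + i) }

puts "3 items:   #{expanded_key(few).length} characters"
puts "100 items: #{expanded_key(many).length} characters"
```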

We can come up with something simpler:

<% cache [@project,@project.items.maximum(:updated_at)] do %>

That works. If someone removes a ProjectItem, the project is touched (because of “touch: true”), and if an item is updated then the maximum updated_at changes. I like this even better:

<% cache [@project,@project.items.count,@project.items.maximum(:updated_at)] do %>

That’s a little ugly to repeat in a view, though, so the question is how best to package it.

I can actually change this in the relationship:

class Project < ActiveRecord::Base
  has_many :project_items, -> { order("project_items.position") }, inverse_of: :project
  has_many :items, through: :project_items do
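    # Summarize the whole collection as "<count>-<newest updated_at in epoch seconds>".
    # With no items, maximum returns nil and nil.to_i is 0, giving "0-0".
    def cache_key
      [count(:updated_at),maximum(:updated_at)].map(&:to_i).join('-')
    end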
  end
end

Now, our cache line is a little simpler:

<% cache [@project,@project.items] do %>

With that, it’ll get @project.items.cache_key and the cache will be invalidated if any item is updated.  The bonus is that the cache key is made up of only two values, however many items there are, and is much more manageable.  It’s also much more readable to humans, both in code and in the cache itself.
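
To show how such a key behaves without needing a database, here is a minimal plain-Ruby sketch of the same count-plus-maximum idea. Item and ItemCollection are hypothetical stand-ins for the model and the association; the real version runs count and maximum as SQL rather than over an in-memory array.

```ruby
# Plain-Ruby sketch (no Rails) of a collection-level cache key.
Item = Struct.new(:id, :updated_at)

class ItemCollection
  def initialize(items)
    @items = items
  end

  # Same shape as the association extension:
  # "<count>-<newest updated_at in epoch seconds>".
  # An empty collection yields "0-0" because nil.to_i is 0.
  def cache_key
    [@items.size, @items.map(&:updated_at).max].map(&:to_i).join("-")
  end
end

t     = Time.utc(2014, 1, 1)
first = Item.new(1, t)
items = ItemCollection.new([first, Item.new(2, t + 60)])

before = items.cache_key
first.updated_at = t + 120      # simulate updating one item
puts before != items.cache_key  # the collection key changed
```

Note that converting to whole seconds means two updates within the same second produce the same key, which is usually acceptable for view caching but worth knowing about.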

The inner cache is then simply:

<% cache [project_item,project_item.item] do %>

That way any update to either the project_item or the item will invalidate the cache.  I found a gem that should add the cache_key for associations but it seems to not work with Rails 4.  It would be useful for someone to update it as this functionality is even better when the code doesn’t have to be specified each time.

The argument for staying non-DRY is that some cache key schemes are lighter-weight and good enough in particular places.  One example is another table that contains viewing logs for projects.  That table is effectively append-only: rows are inserted but never updated.  So I can just look at the record count or the maximum id on the joined table to determine the cache key.
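
A minimal plain-Ruby sketch of that lighter-weight scheme might look like the following. ViewLog and view_log_cache_key are hypothetical names; in a Rails app the two values would presumably come from count and maximum(:id) on the association instead of an in-memory array.

```ruby
# Plain-Ruby sketch (no Rails). ViewLog is a hypothetical stand-in for a
# row in an append-only viewing log.
ViewLog = Struct.new(:id, :project_id)

# Rows are only ever appended, so the count and the highest id change
# whenever a new row arrives; no updated_at lookup is needed.
def view_log_cache_key(logs)
  "view_logs/#{logs.size}-#{logs.map(&:id).max.to_i}"
end

logs   = [ViewLog.new(1, 10), ViewLog.new(2, 10)]
before = view_log_cache_key(logs)

logs << ViewLog.new(3, 10)              # a new view is recorded
puts before != view_log_cache_key(logs) # the key changed
```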