Tithe.ly Engineering

Caching and where it can go wrong

by Thomas Reeves on February 6, 2024

Rails caching is a powerful tool for speeding up your application. It works by storing the result of expensive work so the next request can skip regenerating it. This can be a huge performance boost, but it can also cause problems if not used correctly. For Sites, we use fragment caching, SQL caching, and low-level caching, along with a few custom caching solutions. In this post, we will go over how caching works, how to test it, and some of the problems we have encountered.

Types of caching we use

Fragment Caching

    <% @products.each do |product| %>
      <% cache product do %>
        <%= render product %>
      <% end %>
    <% end %>

The above example shows fragment caching. On the first request, Rails writes a new cache entry; subsequent requests are served the cached result. If a product is updated, its cache key is invalidated and the next request generates a fresh entry. This is the most common type of caching we use, for things like events, sermon RSS feeds, and news posts. The cache key contains a template tree digest and a cache version: the digest invalidates on HTML (template) changes, and the cache version invalidates on record changes.
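
The digest-plus-version behavior can be sketched in plain Ruby. This is an illustration of the idea, not Rails' actual key format: either a template edit or a record update produces a different key.

```ruby
require "digest"

# Illustrative sketch (not Rails' exact internals): a fragment cache key
# pairs a digest of the template source with the record's cache version.
def fragment_cache_key(template_source, record_id, updated_at)
  template_digest = Digest::MD5.hexdigest(template_source) # changes when the HTML changes
  cache_version   = updated_at.to_i                        # changes when the record changes
  "views/products/#{record_id}-#{cache_version}/#{template_digest}"
end

original = fragment_cache_key("<h1><%= product.name %></h1>", 1, Time.at(1_700_000_000))
edited   = fragment_cache_key("<h2><%= product.name %></h2>", 1, Time.at(1_700_000_000))
updated  = fragment_cache_key("<h1><%= product.name %></h1>", 1, Time.at(1_700_000_100))
```

Editing the template or touching the record each yields a new key, which is why a stale entry is simply never read again rather than being deleted.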

Russian Doll Caching

In some situations, you may want to cache within a cache, caching both a list of items and the individual items. This is called Russian Doll Caching.

    <% cache @products do %>
      <% @products.each do |product| %>
        <% cache product.item do %>
          <%= render product.item %>
        <% end %>
      <% end %>
    <% end %>

Russian Doll Caching can serve stale data, however, especially if you update an item but not its product. Luckily Rails provides a solution: touch: true. Adding this to the item model updates the parent model's updated_at whenever the item is updated.

    class Item < ApplicationRecord
      belongs_to :product, touch: true
    end
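
What touch: true accomplishes can be sketched without ActiveRecord. The class names mirror the example above; the point is that updating a child bumps the parent's updated_at, so the parent's cache key (which includes updated_at) changes and the outer cache entry is invalidated.

```ruby
# Plain-Ruby sketch of touch: true -- not ActiveRecord itself.
class Product
  attr_reader :updated_at

  def initialize
    @updated_at = Time.at(0)
  end

  def touch
    @updated_at = Time.now
  end
end

class Item
  def initialize(product)
    @product = product
  end

  # Mimics belongs_to :product, touch: true
  def update!
    @product.touch
  end
end

product = Product.new
before = product.updated_at
Item.new(product).update!
```

After the item update, product.updated_at has moved forward, so any cache key built from it is new.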

SQL Caching

SQL caching is a simple way to cache the result of a query. Rails does this automatically: within a single request, repeated identical queries are served from the query cache instead of hitting the database again.

    def index
      @products = Product.all
    end
The second time this query runs within the same request, the cached result is returned instead of a fresh database call. This is useful for pages, like the home page or a product listing, that run the same query more than once while rendering.
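
The per-request memoization can be sketched with a hash keyed by the SQL string. This is an illustration of the concept; the class and names are hypothetical, not Rails internals.

```ruby
# Illustrative sketch of SQL caching: within a single request, identical
# queries are memoized so the database is only hit once.
class RequestQueryCache
  def initialize(db)
    @db = db
    @cache = {}
  end

  def execute(sql)
    @cache[sql] ||= @db.call(sql)
  end
end

db_calls = 0
db = ->(sql) { db_calls += 1; "rows for: #{sql}" }
cache = RequestQueryCache.new(db)
cache.execute("SELECT * FROM products")
cache.execute("SELECT * FROM products") # second call served from the cache
```

Because the cache lives only for the request, there is no invalidation problem: it is simply thrown away when the response is sent.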

Low Level Caching

Low level caching works with the cache store directly. Rails.cache.fetch returns the stored value for a key if present; otherwise it runs the block and writes the result, like so.

  Rails.cache.fetch('key') do
    # expensive operation
  end

This can be useful in situations where you want to cache an expensive API operation whose result you know never changes. Sites uses this to cache Google Fonts.

Problems we encountered

Problem 1. Memcached was endlessly writing and invalidating on deploys.

How we found the problem.

When deploying to production we noticed that Memcached was writing and invalidating entries endlessly, making the site slow and unresponsive. We spotted this in the memcached stats: cache warmup would take two hours to complete. The issue first presented when we attempted to upgrade from Rails 4.2 to Rails 6.1; it was not present beforehand on Rails 4.2, yet on rollback it persisted. This led us to believe the issue was not with Rails but with our caching implementation. Each re-deploy invalidated and rewrote the cache, causing the slowness and the same two-hour warmup window.

When testing locally, the issue was not present; on deploys to QA, however, load times were slow. Using Skylight, our APM tracker, we determined that block layouts were extremely slow.

From that information I determined the issue lay in code similar to this:

  def self.template_cache_key(template)
    @vc ||= ApplicationController.new.view_context
    MUTEX_FOR_BLOCK_CACHE.synchronize do
      path = "template/types/#{template.template_type}/layouts/_#{template.layout}"
      @vc.instance_variable_set(:@virtual_path, path)
      @vc.send(:fragment_name_with_digest, nil, path).join("/")
    end
  end

This code uses a Mutex to serialize key generation across threads, preventing multiple concurrent requests from all trying to write the same cache key at once. The synchronize block locks the call; inside it, the code generates a path and a digest to build the cache key.
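
The Mutex pattern above can be demonstrated in isolation. This minimal sketch shows what synchronize guarantees: only one thread at a time runs the block, so a read-then-write sequence cannot interleave across threads.

```ruby
# Minimal sketch of the Mutex pattern: synchronize ensures the read of
# results.size and the append happen atomically per thread.
MUTEX = Mutex.new
results = []

threads = 10.times.map do
  Thread.new do
    MUTEX.synchronize do
      # Without the lock, two threads could both read results.size before
      # either appends, producing duplicate values.
      results << results.size
    end
  end
end
threads.each(&:join)
```

With the lock held, each thread sees the list exactly as the previous thread left it, so the appended values are 0 through 9 with no duplicates.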

Using byebug, a debugging tool, I checked what the method was returning: "templates/types/events/layouts/_events_1/". The path was correct, but no digest was being appended. This was causing the cache to be invalidated and rewritten on each deploy.

Investigation showed that we were calling fragment_name_with_digest, an internal Rails method. In Rails 4.2 the method was:

  def fragment_name_with_digest(name) #:nodoc:
    if @virtual_path
      names  = Array(name.is_a?(Hash) ? controller.url_for(name).split("://").last : name)
      digest = Digestor.digest name: @virtual_path, finder: lookup_context, dependencies: view_cache_dependencies

      [ *names, digest ]
    else
      name
    end
  end

However, in Rails 6.1 the method became

# File actionview/lib/action_view/helpers/cache_helper.rb, line 227
  def fragment_name_with_digest(name, digest_path)
    name = controller.url_for(name).split("://").last if name.is_a?(Hash)

    if @current_template&.virtual_path || digest_path
      digest_path ||= digest_path_from_template(@current_template)
      [ digest_path, name ]
    else
      name
    end
  end
As you can see, the 4.2 method contained a Digestor.digest call, which was removed by 6.1. This was causing the digest to be nil.
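
A nil digest explains the bare trailing slash we saw in the debugger: joining the path with nil silently drops the digest portion, because nil.to_s is an empty string.

```ruby
# Why the broken key ended in a bare slash: a nil digest disappears in join.
path = "templates/types/events/layouts/_events_1"
digest = nil
key = [path, digest].join("/")
# key is "templates/types/events/layouts/_events_1/" -- path, slash, no digest
```

Since the digest is what ties the key to the deployed templates, a key without one can never match across deploys, which is exactly the endless invalidation we observed.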

How we fixed the problem.

The fix involved removing the call to fragment_name_with_digest, replacing it with a direct call to Digestor.digest, and implementing the join that fragment_name_with_digest used to perform within our own code. This fixed the issue, and the cache was no longer being invalidated on deploys.

  def self.template_cache_key(template)
    @vc ||= ApplicationController.new.view_context
    MUTEX_FOR_BLOCK_CACHE.synchronize do
      name = "template/types/#{template.template_type}/layouts/_#{template.layout}"
      digest = ActionView::Digestor.digest(name: name, finder: @vc.lookup_context, dependencies: @vc.view_cache_dependencies)
      [ name, digest ].join("/")
    end
  end

Problem 2. Images were showing up as broken after AWS migration.

After we performed a migration from Digital Ocean to AWS, files on NewsPosts and Blocks were showing up as not found. This caused worry with our customers, because it appeared that files were being deleted from their sites. After investigating locally and failing to replicate it, I determined it was likely a caching issue: after making an update to a NewsPost, the files would show up again.

How we found the problem.

First I checked whether the error presented with caching disabled; it did not. Second, I turned caching on locally and was able to replicate the issue. After checking the cache keys for Blocks and NewsPosts, I determined that the keys were not being invalidated when their associated files were created, updated, or deleted, so the cache went stale and the files could not be found. My quick solution was to check the File model and add touch: true, much like the Russian doll caching discussed earlier. However, touch was already present; I checked the code and confirmed that updating a file updates its updated_at. Stumped, I realized the only thing that had changed was the Rails version: the migration included a Rails upgrade from 4.2 to 6.1. Some sleuthing led me to a tracked issue, https://github.com/rails/rails/issues/26726, in which nested_attributes updates did not touch the parent model. The simple fix was to add inverse_of: :news_post to the association on the NewsPost model. The fix works by being explicit: it makes the association bidirectional, so Rails knows exactly which model should be updated.

However, once we fixed the bug we still had stale caches live on a bunch of sites, and ideally we did not want to ask our customers to update their NewsPosts and Blocks. So I added an expires_in to the fragment cache of Blocks, so that the existing stale entries would expire and future entries would auto-expire every 12 hours.

How to Avoid this problem

Test any custom caching.

When writing any custom caching solution, it is important to write test cases for it. For the block layouts, we had no test cases for the generated cache_key, so we did not catch the regression when it happened.

Only cache what needs to be, and be explicit.

When caching, it is important to only cache what needs to be cached. If you cache too much you can run into the issue of stale data.

Do not re-invent the wheel

Rails provides a lot of caching solutions out of the box. It is important to use these solutions when possible. They are well tested and are used by a lot of people. So when an issue arises it is likely that someone else has run into the issue and has a solution.

Utilize extra caching features

Rails provides expires_in and race_condition_ttl to help with caching. expires_in will expire the cache after a certain amount of time, and race_condition_ttl will allow stale data to be read while the cache is being written.

  Rails.cache.fetch("key", expires_in: 5.minutes, race_condition_ttl: 10.seconds) do
    # Expensive or time-consuming operation
  end

race_condition_ttl is useful when a cache key is hit extremely hard, causing a dogpile effect where 100 requests all try to write the cache at the same time. There are some performance considerations, so it should be used as a targeted tool, not a default.
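
The semantics can be sketched in a simplified, single-threaded form (not Rails' implementation): when an entry expires, the first caller extends its expiry by the ttl and recomputes, while callers arriving in the meantime serve the stale value instead of dogpiling the block.

```ruby
# Simplified sketch of race_condition_ttl: the class and time handling are
# illustrative, not Rails.cache internals. Times are plain integers (seconds).
class TinyCache
  Entry = Struct.new(:value, :expires_at)

  def initialize
    @store = {}
  end

  def fetch(key, expires_in:, race_condition_ttl:, now:)
    entry = @store[key]
    return entry.value if entry && now < entry.expires_at

    if entry && now < entry.expires_at + race_condition_ttl
      # First caller to see the expired entry extends its life, then
      # recomputes below; callers arriving meanwhile hit the branch above
      # and serve the stale value.
      entry.expires_at = now + race_condition_ttl
    end

    value = yield
    @store[key] = Entry.new(value, now + expires_in)
    value
  end
end

computes = 0
cache = TinyCache.new

v1 = cache.fetch("fonts", expires_in: 60, race_condition_ttl: 10, now: 0) do
  computes += 1
  "fresh-1"
end

# At t=65 the entry (expired at t=60) is inside the 10s grace window.
stale_seen = nil
v2 = cache.fetch("fonts", expires_in: 60, race_condition_ttl: 10, now: 65) do
  # Simulate a concurrent request arriving while this one recomputes:
  stale_seen = cache.fetch("fonts", expires_in: 60, race_condition_ttl: 10, now: 66) { "never runs" }
  computes += 1
  "fresh-2"
end
```

The "concurrent" caller gets the stale "fresh-1" without running its block, while the recomputing caller produces "fresh-2"; only two computations happen in total, which is the whole point of the grace window.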


Caching can be a powerful tool, but there are tradeoffs. It is important to test your caching and be explicit. If care is not taken, you can run into the issue of stale data, and slow load times.

As the old saying goes:

There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.