Chores - What are they good for?

Written by Jerrod Blavos

Data transformations performed on production data are inherently dangerous. Don't be tempted to use migrations to make them.

A disturbing pattern I’ve seen over the years is changing production data as part of a Rails migration. For example, a migration that adds a column to an existing table also includes code that generates values to fill the new column.

As part of a migration in Rails, these transformations are problematic for several reasons:

In addition to the basic issues with transformations as migrations, UserTesting faces other common challenges. Our infrastructure is built on Amazon OpsWorks, and our logs are not stored on the servers, which makes it hard to confirm that a migration succeeded or to investigate why it failed.

What are the common alternatives?

The most common alternative we’ve encountered is to pull transformations out of a migration and put them into a rake task. This is a natural next step once you realize that migrations are not the best place to change data. It is a commendable move, as rake tasks do alleviate much of the pain because:

Even with these resolved, the largest issue, testability, remains. So does the lack of insight into the outcome of the rake task when dealing with cloud-based infrastructure.

Our solution: Chores!

Chores are essentially Plain Old Ruby Objects (POROs) with one key behavior added: targeted logging that is uploaded to an Amazon S3 bucket.

Our chores are most commonly triggered from a rake task, but they can also be used within the app via the UI or an API endpoint. We sometimes run them in Sidekiq, but they could just as well run in Delayed Job or a similar background job library.

class Chores::ClientAccounts
  include Chores::Logging

  def set_owners_as_client_or_tester!
    with_logging do
      # Anything logged here is appended to a logfile dedicated to this execution.
      # (non_client_or_tester_users is not shown in this excerpt.)
      non_client_or_tester_users.each do |user|
        if is_a_client?(user)
          log_info "Toggling #{user.id} as a client\n"
          user.update_attribute(:client, true)
        else
          log_info "Toggling #{user.id} as a tester\n"
          user.update_attribute(:tester, true)
        end
      end
    end
  end

  private

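  # A user counts as a client if they belong to more than one account, own an
  # account with at least one subscription, or have created at least one session.
  def is_a_client?(user)
    user.accounts.count > 1 ||
      user.accounts_owned.any? { |ac| ac.subscriptions.count > 0 } ||
      user.created_sessions.count > 0
  end
end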
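
A chore like the one above could be triggered from a thin rake task. What follows is only a minimal sketch, not our actual task file: the task name is hypothetical, and the class here is a stand-in with the same name as the chore above so the snippet runs on its own. In a Rails app the task would normally also depend on :environment so the models are loaded.

```ruby
require "rake"

# Stand-in for the chore class shown above (assumption: the real one does the
# actual work). This version only counts how many times it was run.
module Chores
  class ClientAccounts
    class << self
      attr_accessor :runs
    end
    self.runs = 0

    def set_owners_as_client_or_tester!
      self.class.runs += 1
    end
  end
end

# Hypothetical task definition, e.g. in lib/tasks/chores.rake.
# The task body stays tiny: all logic lives in the chore object, where it can
# be unit tested like any other Ruby class.
namespace :chores do
  desc "Flag account owners as clients or testers"
  task :set_owners_as_client_or_tester do
    Chores::ClientAccounts.new.set_owners_as_client_or_tester!
  end
end
```

With a task like this, the chore would be run with `rake chores:set_owners_as_client_or_tester`, and the same object could be called from a background job or a controller without going through rake at all.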

» See more of our Chores::Logging Implementation.

A chore can log anything you want, and when it completes the log is sent to an S3 bucket for inspection. We name each log after the chore’s class and the method called, along with the date and time of execution.
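
To make the idea concrete, here is a minimal sketch of what a logging mixin along these lines could look like. It is not our Chores::Logging implementation: the `uploader` hook, the key format, and the `Chores::Example` class are all assumptions made for illustration. The uploader defaults to a no-op so the snippet runs without AWS credentials; a real one might wrap `Aws::S3::Client#put_object` from the aws-sdk-s3 gem.

```ruby
require "stringio"

module Chores
  # Illustrative sketch only, not the original implementation.
  module Logging
    class << self
      # Hypothetical hook: anything that responds to call(key, body).
      # A real uploader might call Aws::S3::Client#put_object here.
      attr_accessor :uploader
    end
    self.uploader = ->(_key, _body) {}

    # Collects log lines in memory while the block runs, then hands the
    # finished log to the uploader, even if the block raises.
    def with_logging(now: Time.now)
      @chore_log = StringIO.new
      # Name of the chore method that called with_logging.
      method_name = caller_locations(1, 1).first.base_label
      # Assumed key format: class, method, and UTC timestamp.
      stamp = now.getutc.strftime("%Y%m%dT%H%M%SZ")
      key = "#{self.class.name.gsub('::', '_')}-#{method_name}-#{stamp}.log"
      yield
    rescue StandardError => e
      log_info "#{e.class}: #{e.message}"
      raise
    ensure
      Chores::Logging.uploader.call(key, @chore_log.string)
    end

    # Appends one line to the current log. StringIO#puts does not add a
    # second newline when the message already ends with one.
    def log_info(message)
      @chore_log.puts "[INFO] #{message}"
    end
  end

  # Hypothetical chore used only to exercise the mixin.
  class Example
    include Chores::Logging

    def tidy!(now: Time.now)
      with_logging(now: now) do
        log_info "Tidied record 1\n"
      end
    end
  end
end

Chores::Example.new.tidy!
```

Keeping the uploader behind a small hook is a design choice for this sketch: tests can swap in a lambda that captures the log instead of talking to S3.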

How do we use chores?

Our most common use case is deleting or altering data/records. When we do this, we write out what is altered as YAML into the log in the event we need to undo (it happens, trust me). YAML files are easy to chop up and restore if needed, which has a nice side effect of giving you data to use when developing and testing your chore. Win!
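
As a rough illustration of the YAML approach, the sketch below writes one YAML document per record before a change and reads them back afterwards. The `ChoreSnapshot` module and the sample attributes are hypothetical; in a real chore the dumped text would go into the chore’s log rather than a local variable.

```ruby
require "yaml"

# Hypothetical helpers for illustration; not part of our chores.
module ChoreSnapshot
  # Serializes each record's attributes as its own YAML document, so the
  # log can later be cut at the "---" separators.
  def self.dump(records)
    records.map(&:to_yaml).join
  end

  # Parses every YAML document in a log excerpt back into a hash.
  def self.load(yaml_text)
    YAML.load_stream(yaml_text)
  end
end

# Attributes as they were before the chore altered them (made-up sample data).
before = [
  { "id" => 1, "client" => false, "tester" => false },
  { "id" => 2, "client" => false, "tester" => false }
]

log_excerpt = ChoreSnapshot.dump(before)
restored = ChoreSnapshot.load(log_excerpt)
```

Because each record is a separate document, restoring only some of the records is a matter of copying the relevant documents out of the log, and the same excerpt can serve as fixture data when testing the chore.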

We do our best to treat our production environment as a black box: we discourage doing anything with the environment manually, and this includes using SSH to go into a server to run rake tasks or chores. Instead, our chores are triggered either from Amazon OpsWorks or via our internal ChatOps tool, which we call “deploder”.

How do you manage your data transformations? What challenges do you encounter and how do you overcome them? Let me know @jerrodblavos.