A disturbing pattern I’ve seen over the years is making changes to production data as part of a Rails migration. For example, when adding a column to an existing table, code is added to the migration to generate values that fill the new column.
Running these data transformations inside a Rails migration is problematic for several reasons:
- A migration can fail due to unforeseen situations such as data validation, connectivity, or file system issues.
- Changes that occur as part of a migration aren’t re-runnable if needed.
- When the work needs to be undone for any reason, our only course is to add a new migration.
- Referencing ActiveRecord models in migrations can cause problems when developers set up a fresh database.
- Code run within migrations is commonly assumed to need to run only once, which leads many developers to skip writing tests.
- Migration transformations become coupled to a specific state of the schema.
Beyond these general problems with transformations in migrations, we at UserTesting face another common challenge. Our infrastructure is built on Amazon OpsWorks, and our logs are not stored on the servers, which makes it hard to confirm that a migration succeeded or to investigate why it failed.
What are the common alternatives?
The most common alternative we’ve encountered is to pull transformations out of the migration and put them into a rake task. This is the natural next step once you realize that migrations are not the best place to change data. It is a commendable move, and rake tasks do alleviate much of the pain because:
- they do not affect the success or failure of a migration.
- they can run many times, assuming they were written to allow this.
- they don’t cause issues with setting up new databases.
- they are not tied to the schema.
- they don’t need a new migration to change or undo their outcome.
Even with these issues resolved, the largest one, lack of testability, remains. On cloud-based infrastructure we also still lack insight into the outcome of the rake task.
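For reference, a rake task written to be re-runnable might look roughly like the sketch below: it backfills a value only where one is still missing, so a second run finds nothing to do. The `data:backfill_display_names` task, the `display_name` column, and the in-memory `USERS` array standing in for a table are all hypothetical, chosen only to keep the example self-contained.

```ruby
require "rake"

# Hypothetical in-memory stand-in for a users table. In a Rails app this
# would be an ActiveRecord scope such as User.where(display_name: nil).
USERS = [
  { id: 1, first_name: "Ada", last_name: "Lovelace", display_name: nil },
  { id: 2, first_name: "Alan", last_name: "Turing", display_name: "alan" }
]

# Fills only the rows that are still missing a value, so the task can be
# re-run safely: a second run finds nothing left to do and returns 0.
def backfill_display_names(users)
  pending = users.select { |u| u[:display_name].nil? }
  pending.each { |u| u[:display_name] = "#{u[:first_name]} #{u[:last_name]}" }
  pending.size
end

namespace :data do
  desc "Backfill users.display_name (hypothetical example)"
  task :backfill_display_names do
    puts "Backfilled #{backfill_display_names(USERS)} user(s)"
  end
end
```

Keeping the work in a plain method rather than inline in the task body makes it callable from a test, though nothing here captures a log of what changed.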
Our solution: Chores!
Chores are essentially Plain Old Ruby Objects (POROs) with one key behavior added: targeted logging that is uploaded to an Amazon S3 bucket.
Our chores are most commonly triggered from a rake task, but they can also be invoked within the app via the UI or an API endpoint. We sometimes run them in Sidekiq, but they could just as easily run in Delayed Job or a similar background-job library.
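To make the idea concrete, here is a minimal sketch of a chore as a PORO that keeps its own log. The class names (`Chore`, `NormalizeAccountNames`), the in-memory `StringIO` log, and the hash-based records are illustrative assumptions, not our actual Chores::Logging implementation.

```ruby
require "logger"
require "stringio"

# Minimal sketch of a chore base class (hypothetical names): a plain Ruby
# object whose log is captured in memory so it can be shipped elsewhere,
# such as an S3 bucket, once the work is done.
class Chore
  def initialize
    @log_io = StringIO.new
    @logger = Logger.new(@log_io)
    # Keep log lines terse: "INFO: message"
    @logger.formatter = proc { |severity, _time, _progname, msg| "#{severity}: #{msg}\n" }
  end

  # Everything written here ends up in this chore's log, and only this chore's log.
  def log(message)
    @logger.info(message)
  end

  def log_contents
    @log_io.string
  end
end

# Example chore (hypothetical): strip stray whitespace from account names,
# logging each record it changes and skipping records that are already clean.
class NormalizeAccountNames < Chore
  def run(accounts)
    accounts.each do |account|
      cleaned = account[:name].strip
      next if cleaned == account[:name]

      log("account #{account[:id]}: #{account[:name].inspect} -> #{cleaned.inspect}")
      account[:name] = cleaned
    end
  end
end
```

Because a chore is just an object, the same `run` call could sit inside a rake task, a controller action, or a Sidekiq worker's `perform` method, and it can be unit tested by passing in plain data.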
See more of our Chores::Logging implementation.
Chores can log anything you want, and the log is uploaded to an S3 bucket for inspection when the chore completes. We name each log after the chore’s class and the method called, along with the date and time of execution.
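Uploading on completion and naming the log after the class, method, and time could look roughly like this. `UploadsLog`, `with_uploaded_log`, the `call(key, body)` uploader interface, and the `Class/method/timestamp.log` key layout are hypothetical; in production the uploader might be a lambda wrapping aws-sdk-s3's `Aws::S3::Client#put_object(bucket:, key:, body:)`.

```ruby
require "stringio"

# Hypothetical mixin: runs a chore's work, then uploads the captured log
# whether the work succeeded or raised.
module UploadsLog
  # uploader: any object responding to call(key, body). In production this
  # might be ->(key, body) { s3.put_object(bucket: "chore-logs", key: key, body: body) }
  # (bucket name hypothetical).
  def with_uploaded_log(method_name, uploader, started_at: Time.now)
    @log = StringIO.new
    yield @log
  rescue StandardError => e
    # Record the failure in the log, then let the error propagate.
    @log.puts("FAILED: #{e.class}: #{e.message}")
    raise
  ensure
    uploader.call(log_key(method_name, started_at), @log.string)
  end

  # Name the log after the chore's class, the method called, and the start
  # time (in UTC). The exact key layout is an assumption.
  def log_key(method_name, time)
    stamp = time.getutc.strftime("%Y-%m-%d_%H-%M-%S")
    "#{self.class.name}/#{method_name}/#{stamp}.log"
  end
end

# Example chore (hypothetical): drop expired sessions, logging each one.
class PurgeExpiredSessions
  include UploadsLog

  def run(sessions, uploader, now: Time.now)
    with_uploaded_log(:run, uploader, started_at: now) do |log|
      expired = sessions.select { |s| s[:expired] }
      expired.each { |s| log.puts("deleting session #{s[:id]}") }
      sessions - expired
    end
  end
end
```

Doing the upload in `ensure` means a chore that fails partway still leaves a log behind, which is exactly when you most want one.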
How do we use chores?
Our most common use case is deleting or altering data/records. When we do this, we write the affected records to the log as YAML in case we need to undo the change (it happens, trust me). YAML is easy to chop up and restore if needed, and as a nice side effect it gives you data to use when developing and testing your chore. Win!
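A minimal sketch of that backup-then-delete pattern, assuming records are plain hashes with string keys (in a Rails app, something like each record's `attributes` hash). The helper names and the in-memory "table" are hypothetical; real rows containing timestamps would also need `permitted_classes:` when loaded with `YAML.safe_load`.

```ruby
require "yaml"
require "stringio"

# Deletes matching rows from an in-memory "table", first writing them to the
# log as a YAML document so the change can be undone later.
def delete_with_yaml_backup(table, log, &predicate)
  deleted = table.select(&predicate)
  log.write(YAML.dump(deleted))
  table.reject!(&predicate)
  deleted.size
end

# Undo: parse the logged YAML back into plain hashes and re-insert them.
# safe_load refuses arbitrary Ruby objects, which suits restoring from a log.
def restore_from_yaml(table, yaml)
  rows = YAML.safe_load(yaml)
  table.concat(rows)
  rows.size
end
```

The same YAML that makes an undo possible doubles as a fixture: a trimmed copy of a production log can seed a test for the chore.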
We do our best to treat our production environment as a black box: we discourage doing anything in the environment manually, including using SSH to log in to a server to run rake tasks or chores. Instead, our chores are triggered either from Amazon OpsWorks or via our internal ChatOps tool, which we call “deploder”.
How do you manage your data transformations? What challenges do you encounter and how do you overcome them? Let me know @jerrodblavos.