Anonymizing Records in Ruby on Rails
Introduction
In the development of modern web applications, especially those handling sensitive user data, the decision to soft delete records rather than permanently remove them from the database is a common practice. This approach, often dictated by application design or business requirements, allows data to remain accessible for administrative purposes while hidden from the user interface. However, with the rising importance of data privacy laws and regulations globally, simply soft deleting records may not suffice to comply with legal standards. It becomes necessary to anonymize certain records that are soft deleted to protect user privacy effectively.
To address this, we will use Rails model concerns to add anonymization functionality to our models. This tutorial walks through building a customizable, reusable anonymization strategy as a concern, so that any model can opt in and declare which of its columns should be anonymized. Integrating this functionality directly into your models helps your application meet privacy requirements while keeping the anonymization logic in one place.
Setting Up: Creating a New Rails Application and Its Core Models
We'll begin by setting up a new project. Our application, named "Slacker", will exemplify how to integrate anonymizing functionality within a Rails application. To streamline our setup and focus on the essentials, we'll omit certain components that aren't directly relevant to our objectives.
We create our new Rails application in a terminal window:
rails new slacker --skip-jbuilder --skip-test
cd slacker
rails db:create
Next, we'll create the core models for our application: Account, User, and Message. The User model will belong to an Account, so each user is associated with a single account. We'll also set up a one-to-many relationship between User and Message, allowing each user to have multiple messages.
Here are the commands to generate each model along with their respective attributes:
rails generate model Account email:string password:string
rails generate model User name:string account:references
rails generate model Message content:text user:references
rails db:migrate
After creating the models and migrating our database, the next step is to set up the associations between these models to reflect their relationships accurately.
class Account < ApplicationRecord
  has_one :user, dependent: :destroy
end
class User < ApplicationRecord
  belongs_to :account
  has_many :messages, dependent: :destroy
end
class Message < ApplicationRecord
  belongs_to :user
end
Implementing a Basic Rails Model Concern
We aim to anonymize sensitive information in the User and Account models by setting their relevant fields (name for User and email for Account) to nil when necessary.
To achieve this in a clean and reusable manner, we will employ a Rails concern. A concern allows us to encapsulate this shared behavior in a module that can be easily included in any model requiring anonymization capabilities. This approach not only keeps our code DRY (Don't Repeat Yourself) but also enhances the modularity and maintainability of our application.
One key aspect of this implementation is allowing each model to specify which of its columns are anonymizable. This flexibility is crucial for tailoring the anonymization process to the specific needs of each model, ensuring that we only anonymize the data that truly requires it.
To set the stage for this functionality, we will begin by creating a concern named Anonymizable:
# app/models/concerns/anonymizable.rb
module Anonymizable
  extend ActiveSupport::Concern
  included do
    class_attribute :anonymizable_columns
    def anonymize!
      anonymizable_columns.each do |column|
        public_send("#{column}=".to_sym, nil)
      end
      save!
    end
    private
    def anonymizable_columns
      self.class.anonymizable_columns
    end
  end
  class_methods do
    def anonymizable(*columns)
      self.anonymizable_columns ||= columns
    end
  end
end
In the initial version of our Anonymizable concern, we've laid down a foundation for a flexible yet straightforward approach to data anonymization within our Rails application.
- Module Definition: The Anonymizable module extends ActiveSupport::Concern, a Rails module that provides a structured way to enhance models with additional capabilities. This choice facilitates the inclusion of shared methods and logic across different models in a clean and maintainable manner.
- Class Attribute: Within the included block, we define a class_attribute named :anonymizable_columns. This attribute will store an array of symbols representing the columns each model wishes to anonymize. Using a class attribute allows each model to maintain its own list of anonymizable columns, providing the necessary customization for the anonymization process.
- Anonymize! Method: The anonymize! instance method is the heart of this concern. When invoked on a model instance, it iterates over the anonymizable_columns, setting each specified column's value to nil. This effectively anonymizes the data by removing any identifiable information. The method concludes by saving the changes to the database with save!.
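To see the core of anonymize! without a database, here is a minimal plain-Ruby sketch. PlainUser, its nickname attribute, and the ANONYMIZABLE constant are hypothetical stand-ins for an Active Record model and the class attribute; the save! call is left out because nothing is persisted.

```ruby
# Hypothetical stand-in for a model: plain attributes, no database.
class PlainUser
  attr_accessor :name, :nickname

  # Stand-in for the anonymizable_columns class attribute.
  ANONYMIZABLE = %i[name].freeze

  def initialize(name:, nickname:)
    @name = name
    @nickname = nickname
  end

  # Calls the public setter of each listed column with nil,
  # the same dynamic dispatch the concern performs via public_send.
  def anonymize!
    ANONYMIZABLE.each do |column|
      public_send("#{column}=", nil)
    end
    self
  end
end

user = PlainUser.new(name: "Jay Dilla", nickname: "jd")
user.anonymize!
p user.name     # => nil
p user.nickname # => "jd"
```

Only the listed attribute is cleared; anything not declared as anonymizable is left untouched.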
Understanding the included Block
The included block within an ActiveSupport::Concern module is a special hook that Rails calls when the module is included in another class. This block is where we place code that we want to be executed in the context of the class that includes the module. In the case of our Anonymizable concern, the included block defines the :anonymizable_columns class attribute and the anonymize! instance method. This setup ensures that any model including the Anonymizable module will automatically have these attributes and methods injected into its class context, enabling the anonymization functionality without additional boilerplate code.
The Role of the class_methods Block
The class_methods block provided by ActiveSupport::Concern offers a clean, organized way to add class methods to the including class. When you define methods within the class_methods block of the concern, these methods become available on the class itself, not just on instances of the class. In our Anonymizable concern, the class_methods block is used to define the anonymizable method. This method allows any model that includes the concern to specify which columns should be anonymizable by setting the anonymizable_columns class attribute. This design pattern simplifies the process of extending class functionality across multiple models, maintaining a DRY approach while providing the necessary hooks for customization.
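For intuition, the two blocks correspond roughly to a long-standing plain-Ruby idiom: the included hook combined with extend. The sketch below is an approximation for illustration only, not how ActiveSupport::Concern is implemented internally; PlainAnonymizable, ClassMethods, and Profile are hypothetical names.

```ruby
module PlainAnonymizable
  # Ruby calls this hook whenever the module is included in a class.
  def self.included(base)
    # Roughly what the class_methods block provides: class-level methods.
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Class-level macro that records the columns to anonymize.
    def anonymizable(*columns)
      @anonymizable_columns = columns
    end

    def anonymizable_columns
      @anonymizable_columns || []
    end
  end

  # Instance method available to every including class.
  def anonymize!
    self.class.anonymizable_columns.each do |column|
      public_send("#{column}=", nil)
    end
    self
  end
end

class Profile
  include PlainAnonymizable
  attr_accessor :email

  anonymizable :email
end

profile = Profile.new
profile.email = "jd@beatz.com"
profile.anonymize!
p Profile.anonymizable_columns # => [:email]
p profile.email                # => nil
```

One difference worth noting: the class-level instance variable used in this sketch is not inherited by subclasses, whereas class_attribute values are, which is one reason the concern relies on class_attribute.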
Now that we understand how our Rails concern works, let's include it in our models to enable the anonymization functionality. By incorporating the Anonymizable concern into our User and Account models, we can specify which fields should be anonymized and easily anonymize records with a simple method call. Here’s how we can do it:
class User < ApplicationRecord
  include Anonymizable
  anonymizable :name
  belongs_to :account
  has_many :messages, dependent: :destroy
end
class Account < ApplicationRecord
  include Anonymizable
  anonymizable :email
  has_one :user, dependent: :destroy
end
Let's test it in a sandbox console (rails console --sandbox):
# Create an example user and account
account = Account.create!(email: "jd@beatz.com")
user = User.create!(name: 'Jay Dilla', account:)
# Anonymize the user and account
user.anonymize!
user.account.anonymize!
# Check the anonymized fields
user.reload
user.name
# => nil
user.account.reload
user.account.email
# => nil
Enhancing Anonymization: Custom Placeholder Values
A specific business requirement has emerged: when displaying messages from users who have been anonymized, we want to replace their name attribute, which would typically become nil upon anonymization, with a placeholder text '(deactivated)'. Conversely, for accounts, the email attribute should still be anonymized to nil. This requirement introduces a new layer of complexity: our Anonymizable concern must now support not only the specification of which columns to anonymize but also allow for the customization of the anonymization value for each column.
To accommodate this, we've updated our Anonymizable concern with enhanced functionality:
module Anonymizable
  extend ActiveSupport::Concern
  included do
    class_attribute :anonymizable_columns
    def anonymize!
      anonymizable_columns.to_h.each do |column, value|
        public_send("#{column}=".to_sym, value)
      end
      save!
    end
    private
    def anonymizable_columns
      self.class.anonymizable_columns
    end
  end
  class_methods do
    def anonymizable(*columns)
      self.anonymizable_columns ||= []
      columns.flatten.each do |column|
        case column
        when Symbol, String
          self.anonymizable_columns << [column.to_sym, nil]
        when Hash
          column.each do |key, value|
            self.anonymizable_columns << [key.to_sym, value]
          end
        end
      end
      self.anonymizable_columns
    end
  end
end
The core logic of the Anonymizable concern has been expanded to allow for a more versatile anonymization process:
- Anonymizable Columns as a Hash: Previously, anonymizable_columns was conceived as an array of symbols representing the columns to be anonymized. We've evolved this approach by transforming anonymizable_columns into an array of arrays (effectively a hash when converted using to_h), where each element consists of a column name and a corresponding anonymization value. This change provides the granularity needed to specify distinct anonymization values for different columns.
- Enhanced Anonymize! Method: The anonymize! method has been modified to iterate over anonymizable_columns, now expecting a hash map of columns to their respective anonymization values. This iteration allows us to set each specified column to its corresponding anonymization value, fulfilling the requirement to replace certain data with custom placeholders.
- Flexible Class Method for Specifying Anonymization: The class_methods block within the concern now includes a more sophisticated anonymizable method. This method accepts arguments in various formats (symbols, strings, or hashes), allowing us to specify columns with default anonymization values (i.e., nil) or custom values as needed. This flexibility is key to accommodating our business case, providing the ability to specify that the name column in the User model should be anonymized to '(deactivated)'.
class User < ApplicationRecord
  include Anonymizable
  anonymizable name: '(deactivated)'
  belongs_to :account
  has_many :messages, dependent: :destroy
end
class Account < ApplicationRecord
  include Anonymizable
  anonymizable :email
  has_one :user, dependent: :destroy
end
Let's test it in a sandbox console (rails console --sandbox):
# Create an example user and account
account = Account.create!(email: "jd@beatz.com")
user = User.create!(name: 'Jay Dilla', account:)
# Anonymize the user and account
user.anonymize!
user.account.anonymize!
# Check the anonymized fields
user.reload
user.name
# => "(deactivated)"
user.account.reload
user.account.email
# => nil
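The argument handling inside anonymizable can also be tried outside Rails. The helper below, normalize_columns, is a hypothetical extraction of that case statement and is not part of the concern; the "phone" column is likewise made up for illustration. It shows how symbols, strings, and hashes all become [column, value] pairs that to_h can turn into a hash.

```ruby
# Hypothetical helper mirroring the case statement in `anonymizable`.
def normalize_columns(*columns)
  pairs = []
  columns.flatten.each do |column|
    case column
    when Symbol, String
      # A bare column name is anonymized to nil.
      pairs << [column.to_sym, nil]
    when Hash
      # A hash maps column names to custom placeholder values.
      column.each { |key, value| pairs << [key.to_sym, value] }
    end
  end
  pairs
end

pairs = normalize_columns(:email, "phone", name: "(deactivated)")
p pairs
# => [[:email, nil], [:phone, nil], [:name, "(deactivated)"]]
p pairs.to_h[:name]
# => "(deactivated)"
```

Because every entry ends up as a two-element array, a single loop over the resulting hash can assign either nil or a placeholder without special cases.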
Advanced Anonymization: Tracking and Extending Anonymization Behavior
Building upon our foundational anonymization functionality, we will now introduce advanced requirements to further enhance how our application handles data anonymization. These new features include tracking the anonymization time, as well as providing a flexible mechanism to execute additional actions during the anonymization process. Specifically, we want to update a new status column in the User model to deleted every time a User record gets anonymized.
To align with these enhancements, our models will require the following updates in a Rails migration:
add_column :users, :anonymized_at, :datetime
add_column :accounts, :anonymized_at, :datetime
add_column :users, :status, :integer, null: false, default: 0
And we add the status enum to the User model:
class User < ApplicationRecord
  include Anonymizable
  anonymizable name: '(deactivated)'
  enum status: %i[active deleted]
  belongs_to :account
  has_many :messages, dependent: :destroy
end
We'll make the following changes to our Anonymizable concern:
module Anonymizable
  extend ActiveSupport::Concern
  included do
    class_attribute :anonymizable_columns
    def anonymize!
      anonymizable_columns.to_h.merge(anonymized_at: Time.zone.now).each do |column, value|
        public_send("#{column}=".to_sym, value)
      end
      yield self if block_given?
      save!
    end
    def anonymized?
      anonymized_at.present?
    end
    private
    def anonymizable_columns
      self.class.anonymizable_columns
    end
  end
  class_methods do
    def anonymizable(*columns)
      self.anonymizable_columns ||= []
      columns.flatten.each do |column|
        case column
        when Symbol, String
          self.anonymizable_columns << [column.to_sym, nil]
        when Hash
          column.each do |key, value|
            self.anonymizable_columns << [key.to_sym, value]
          end
        end
      end
      self.anonymizable_columns
    end
  end
end
The updated version of the Anonymizable concern includes the following key enhancements:
- Tracking Anonymization Time: We introduce an anonymized_at attribute to our models, which records the exact time a record was anonymized. This attribute is updated within the anonymize! method by merging anonymized_at: Time.zone.now into the anonymizable_columns hash before the anonymization process begins.
- Yielding Self for Additional Actions: The anonymize! method now accepts a block, yielding self to it. This allows the caller to perform further actions on the record being anonymized, adding a layer of flexibility to the anonymization process. For instance, a model can update additional attributes or perform checks during anonymization.
- Determining Anonymization Status: A new instance method, anonymized?, checks for the presence of the anonymized_at attribute to determine if a record has been anonymized. This method provides a simple way to query the anonymization status of any record.
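The block hook and the timestamp can be illustrated without Rails. In this sketch PlainRecord is a hypothetical class, Time.now stands in for Time.zone.now (which requires ActiveSupport), and persistence is left out entirely.

```ruby
# Hypothetical plain-Ruby record; no database or ActiveSupport involved.
class PlainRecord
  attr_accessor :name, :status, :anonymized_at

  def initialize(name:)
    @name = name
    @status = :active
  end

  def anonymize!
    self.name = "(deactivated)"
    self.anonymized_at = Time.now # the concern uses Time.zone.now
    # Hand the record to the caller's block, if one was given,
    # so extra attributes can be changed before the record is saved.
    yield self if block_given?
    self # the concern calls save! here instead of returning self
  end

  def anonymized?
    !anonymized_at.nil?
  end
end

record = PlainRecord.new(name: "Jay Dilla")
p record.anonymized? # => false

record.anonymize! { |r| r.status = :deleted }
p record.status      # => :deleted
p record.anonymized? # => true

# The block is optional; without it only the declared changes apply.
other = PlainRecord.new(name: "Another User").anonymize!
p other.status       # => :active
```

Yielding before the save step means the caller's changes and the anonymized values end up in the same write, which is why the block in the concern does not need its own save call.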
Let's test it in a sandbox console (rails console --sandbox):
# Create an example user and account
account = Account.create!(email: "jd@beatz.com")
user = User.create!(name: 'Jay Dilla', account:)
user.status
# => "active"
# Anonymize the user and account
user.anonymize! do |record|
  record.status = :deleted # no need to save as our anonymize! calls save! at the end
end
user.account.anonymize!
# Check the anonymized fields
user.reload
user.name
# => "(deactivated)"
user.status
# => "deleted"
user.anonymized?
# => true
user.account.reload
user.account.email
# => nil
user.account.anonymized?
# => true
In conclusion, this tutorial shows how flexible and useful Rails concerns are in real-life applications, particularly for requirements such as data anonymization. By methodically building and refining the Anonymizable concern, we've illustrated how modular, reusable code lets Rails developers implement such features cleanly across multiple models.