0032: Redshift Permissions Management Tooling
STATUS
Accepted
CONTEXT
The new data warehouse (aa-warehouse) is built using an
Infrastructure-as-Code design. All changes to the database and its
metadata can be performed via a PR to the data-warehouse
repository. This is
achieved via custom lambda-backed CDK resources. Database resources
that are created this way include databases, schemas, users, roles,
and grants.
We have found the permission granting process to be too coarse and difficult to use. The lambda is only designed to grant a role all privileges on a given subject object (schema, database). In practice, we find ourselves needing to grant more refined permissions (SELECT only for example), and permissions at the table level. Neither of which is supported in the current implementation. Additionally, we have run into intermittent race conditions in code pipeline deployments of the current grant implementation. All of this has resulted in several permissions granted manually and tracked in this github issue.
The objective of this ADR is to establish an Infrastructure-as-Code like solution for managing fine grained permission grants. As an added bonus some solutions may provide for schema migration tracking that is not currently available in our warehouse.
Considered Options
- Continue to manually provision grants.
- Enhance the CDK code to handle fine grained granting.
- DECISION Adopt Liquibase for managing grants.
DECISION
We will adopt option 3, implementing Liquibase as our database migration management tool. This will allow us to not only manage grants, but also handle future table management tasks.
Tools also considered:
CONSEQUENCES
Permissions and Users
There will be a new directory migrations in the root of the
data-warehouse repository that will house all liquibase resources.
Within this directory are the changelog files for each database.
data-warehouse/
├── README.md
├── ...
├── migrations
│ ├── changelog-adaction_analysis.yaml
│ ├── changelog-adaction_raw.yaml
│ ├── changelog-adgem_analysis.yaml
│ ├── changelog-adgem_raw.yaml
│ ├── changelog-cosmic_analysis.yaml
│ ├── changelog-cosmic_raw.yaml
│ ├── changelog-servicehub_analysis.yaml
│ └── changelog-servicehub_raw.yaml
├── ...
All permissions grants will be managed via the changelog for the applicable database.
Example grant changelog entries:
databaseChangeLog:
- changeSet:
id: 1
author: ronco
labels: raw-sql
context: example-context
comment:
changes:
- sql:
sql: "GRANT SELECT ON TABLE svv_table_info TO ROLE foobar"
- changeSet:
id: 2
author: other-dev
labels: raw-sql
context: example-context
comment: example-comment
changes:
- sql:
sql: "GRANT SELECT ON TABLE foo TO 'foobar'"
Example user creation changelog entries:
databaseChangeLog:
- changeSet:
id: 1
author: ronco
labels: raw-sql
context: example-context
comment:
changes:
- sql:
sql: "CREATE USER 'foo-user' WITH PASSWORD DISABLE;"
Tables and Views
One big benefit of using Liquibase is migration management for tables
and views. Currently, many of the non-dbt created tables in the data
warehouse are created via CREATE IF NOT EXIST statements in the
scripts that use them. Additionally, there is no mechanism for
modifying table structures, so columns are added manually in the
database, and the CREATE IF NOT EXIST statement in the script is
often left behind. Worse, the materialized views we have wrapped
around federated queries were generated by hand in a sql editor.
There will be a new directory migrations in the root of the
airflow-dags repository that will house all liquibase resources.
airflow-dags is the primary repo for all new data warehouse logic.
airflow-dags/
├── README.md
├── ...
├── migrations
│ ├── changelog-adaction_analysis.yaml
│ ├── changelog-adaction_raw.yaml
│ ├── changelog-adgem_analysis.yaml
│ ├── changelog-adgem_raw.yaml
│ ├── changelog-cosmic_analysis.yaml
│ ├── changelog-cosmic_raw.yaml
│ ├── changelog-servicehub_analysis.yaml
│ └── changelog-servicehub_raw.yaml
├── ...
Example of creating and modifying a table with liquibase changelog:
- changeSet:
id: 1
author: your.name
labels: example-label
context: example-context
comment: example-comment
changes:
- createTable:
tableName: person
columns:
- column:
name: id
type: int
autoIncrement: true
constraints:
primaryKey: true
nullable: false
- column:
name: name
type: varchar(50)
constraints:
nullable: false
- column:
name: address1
type: varchar(50)
- column:
name: address2
type: varchar(50)
- column:
name: city
type: varchar(30)
- changeSet:
id: 1
author: your.name
labels: example-label
context: example-context
comment: example-comment
changes:
- addColumn:
tableName: person
columns:
- column:
name: country
type: varchar(2)
Example of creating a materialized view via changelog sql statements:
- changeSet:
id: 3
author: ronco
labels: materialized_view
changes:
- sql:
sql: >
CREATE MATERIALIZED VIEW prod_mirror.app_campaign_blacklist
AS
select id, blacklisted_app_id, campaign_id, created_at, updated_at
from adgem_prod.app_campaign_blacklist
- sql:
sql: >
CREATE MATERIALIZED VIEW prod_mirror.app_campaign_whitelist
AS
select id, whitelisted_app_id, campaign_id, created_at, updated_at
from adgem_prod.app_campaign_whitelist
- sql:
sql: >
CREATE MATERIALIZED VIEW prod_mirror.target_devices
AS
select id, targeting_profile_id, logic, type, regex, created_at, updated_at, value
from adgem_prod.target_devices
- sql:
sql: >
CREATE MATERIALIZED VIEW prod_mirror.target_isps
AS select id, isp_id, targeting_profile_id, created_at, updated_at
from adgem_prod.target_isps
- sql:
sql: >
CREATE MATERIALIZED VIEW prod_mirror.isps
AS select id, isp_name, country_id
from adgem_prod.isps
rollback:
- sql:
sql: DROP MATERIALIZED VIEW IF EXISTS prod_mirror.app_campaign_blacklist
- sql:
sql: DROP MATERIALIZED VIEW IF EXISTS prod_mirror.app_campaign_whitelist
- sql:
sql: DROP MATERIALIZED VIEW IF EXISTS prod_mirror.target_devices
- sql:
sql: DROP MATERIALIZED VIEW IF EXISTS prod_mirror.target_isps
- sql:
sql: DROP MATERIALIZED VIEW IF EXISTS prod_mirror.isps
Changeset updates
Liquibase tracks applied changes in the databasechangelog table, and has a lock
in databasechangeloglock to ensure no more than one update is running at a given time.
To run un-applied changes you simply execute liquibase update in the
directory with your changelog.
Rollbacks
Liquibaser performs updates in a transaction, so if there is any issue with executing an update (syntax error perhaps), the entire update will be rolled back automatically.
If you've made a change successfully, but need to revert it, this is
possible as liquibase tracks all events in the databasechangelog
table. Generatlly it is prefered to "roll forward" and maintain the
record of the bad change. However, there may be scenarios (including
in CI/CD detailed below) where you need to manually rollback.
Changeset automation
We will set up a new Github action in the data-warehouse and
airflow-dags repositories to automatically apply liquibase changes
on merge to the main branch as part of our deploy process. When
making database updates, merely add your changeset to the changelog
for the appropriate database and then merge your PR. Github will take
care of running the migration.
Process
If there is a modification to the changeset file for a given database as part of the push, this action will initially attempt the update on the staging environment database. On failure the deploy will halt. If the deploy is successful in staging the action will perform the update in the production database. If this fails, the deploy will halt, and the changes that were already applied to the staging environment will be rolled back to keep the environments in sync.
The Action
Since we have distinct changelogs for each database, we need to detect and deploy changes independently for each db, and halt the deploy if any Liquibase error occurs on any db. Here is a visualization of the new deploy database action with a subset of the databases:
Using in dev
There are a few steps necessary to test/build your migrations in your dev schema.
Liquibase setup
You'll need to install Liquibase to start.
Additionally, we need two jar files to enable Liquibase to connect to Redshift. Run the following command to download the dependencies:
curl -L -o redshift-jdbc42-2.1.0.18.jar https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/2.1.0.18/redshift-jdbc42-2.1.0.18.jar
curl -L -o liquibase-redshift-4.30.0.jar https://github.com/liquibase/liquibase-redshift/releases/download/v4.30.0/liquibase-redshift-4.30.0.jar
Once you have the jars you'll need to set some environment variables:
* LIQUIBASE_COMMAND_URL -> The jdbc connect string for the database.
In our case this is
jdbc:redshift://aa-warehouse.ct5xde6dfdjh.us-east-2.redshift.amazonaws.com:5439/de_sandbox
* LIQUIBASE_COMMAND_USERNAME -> Username for connecting to the db.
Should be something along the lines of IAMA:rwhite@adaction.com
* LIQUIBASE_COMMAND_PASSWORD -> Temporary password for your user.
The username and password can be obtained by running get-cluster-credentials.
Personal changelog
Create a new changelog for your user schema in the migrations
directory. For example migrations/changelog-rwhite.yaml. It's a
wise idea to start with a simple migration to test functionality.
Here is a simple starter changelog:
databaseChangeLog:
- changeSet:
id: 1
author: ronco
labels: test
context: test
comment: test comment
changes:
- createTable:
tableName: person
schemaName: rwhite
columns:
- column:
name: name
type: varchar(50)
constraints:
nullable: false
- column:
name: address1
type: varchar(50)
- column:
name: address2
type: varchar(50)
- column:
name: city
type: varchar(30)
This will create a dummy person table in your schema. Be sure to
update the schema name from rwhite to your own schema.
Run the dev command
Once you have your dependencies, your environment, and your personal changelog you're ready to run liquibase:
liquibase update --changelog-file migrations/changelog-rwhite.yaml --classpath redshift-jdbc42-2.1.0.18.jar:liquibase-redshift-4.30.0.jar --liquibase-schema-name rwhite
Make sure to update to the correct changelog, schema, and path to jars.
Back populating
We will be managing all of these database aspects with liquibase going forward, but what will we do about database manipulation that has already occured, either through cdk, or manual application?
The easiest path forward is to recreate all idempotent operations (grants) as liquibase changesets. We will leave all pre-existing non-idempotent changes (user creations) alone.
Risks
All three of the considered tools are intended for managing DDL, not permissions. They all have support for arbitrary sql instructions, so managing grants and ussers is possible. Nevertheless, we are bending these tools outside of their standard use.
Some database migrations are time and CPU intensive. Firing migrations that add a column to a very large table within a CI/CD event could cause database performance issues.
NOTES
References
- PR #47: docs: Add Redshift permissions management tooling ADR
- PR #127: docs: backfill PR reference links for existing ADRs
Original Author
Ron White
Approval date
September 8, 2025
Approved by
Ben Giese Victor Gonzalez