Skip to content

0054: Iteration ID Attribution for Conversion Tracking

STATUS

Accepted

CONTEXT

As part of the Automated Conversion Ingestion initiative, we need to associate conversions (goal complete events) with specific AdSuite iterations. This enables tracking conversion performance per iteration.

The current planned workflow for Solutions when creating a new iteration involves:

  1. Solutions creates iteration in AdSuite
  2. Solutions updates Tune with goal configuration (including copying goal_reference_id from AdSuite)
  3. Solutions copies iteration_id from AdSuite to the AdGem campaign
  4. When a click occurs, iteration_id is written to the click event
  5. Conversions inherit iteration_id from their associated click

This ADR evaluates whether step 3 (copying iteration_id to AdGem) is necessary, or whether the iteration can be derived at analytics time.

Decision Drivers

  • AdGem is legacy: We are actively moving away from the AdGem dashboard. Medium-term plans include AdSuite-AdGem Integration; long-term plans move to the Offer API entirely.
  • Manual step overhead: Each new iteration requires Solutions to update multiple systems. Reducing manual steps reduces human error.
  • Existing architecture: Application data from other systems is already synced to Redshift via the federated query pattern (used by offer-api and others) -- this established pattern can be extended to AdSuite.
  • Future-proofing: The solution should work regardless of where offer/campaign configuration lives.

Considered Options

  • Option 1: Explicit Iteration ID -- Store iteration_id on the AdGem campaign, copy it to click events at click time
  • Option 2: Time-Based Iteration Derivation -- Derive iteration_id at DBT transformation time by matching click timestamps to iteration time ranges

DECISION

We will use Option 2: Time-Based Iteration Derivation.

Instead of storing iteration_id on the AdGem campaign and copying it to click events, we will derive the iteration at DBT transformation time by matching click timestamps to iteration time ranges from AdSuite.

How It Works

  1. Solutions creates iteration in AdSuite (timestamp recorded)
  2. Solutions updates Tune with goal configuration
  3. When a click occurs, the click event is created (no iteration_id stored)
  4. AdSuite iteration data is available in Redshift via the federated query pattern (see Implementation Requirements)
  5. The Goal Complete Events DBT model derives iteration_id by:
  6. Looking up the associated click event (via click identifier)
  7. Getting the click's timestamp
  8. Finding which iteration was active at that click time

DBT Join Logic

For each click, find the iteration where:

iteration.created_at <= click.timestamp < next_iteration.created_at

The join path connects AdGem clicks to AdSuite iterations via: - Click events have a campaign_id - AdGem campaigns have an adaction_id - AdSuite strategies have a tune_offer_id - Join where adaction_id = tune_offer_id

Example SQL (gold table goal_complete_events_attributed):

WITH iterations_with_end_time AS (
  SELECT
    iteration_id,
    strategy_id,
    created_at,
    LEAD(created_at) OVER (
      PARTITION BY strategy_id
      ORDER BY created_at
    ) AS next_iteration_created_at
  FROM iterations
)

SELECT
  gce.*,
  c.timestamp as click_timestamp,
  i.iteration_id
FROM goal_complete_events gce  -- reads from silver table
JOIN clicks c ON gce.click_id = c.click_id
JOIN campaigns camp ON c.campaign_id = camp.id
JOIN strategies s ON camp.adaction_id = s.tune_offer_id
JOIN iterations_with_end_time i ON
  i.strategy_id = s.id
  AND c.timestamp >= i.created_at
  AND c.timestamp < COALESCE(i.next_iteration_created_at, CURRENT_TIMESTAMP)

Rationale

  1. Avoids investment in legacy system: Adding fields and workflows to AdGem contradicts our direction of moving away from it.

  2. Reduces manual overhead: Eliminating the "copy iteration_id to AdGem" step reduces opportunities for human error. The failure mode for Option 1 (forgetting a routine step) is more likely than the failure mode for Option 2 (updating systems out of order).

  3. Leverages existing architecture pattern: The federated query pattern for syncing application data to Redshift is already established (used by offer-api and others). While this requires new infrastructure for AdSuite specifically, the pattern itself is well-understood.

  4. Future-proof: When we move to Offer API, the iteration derivation logic remains unchanged. The source of truth for iteration timing stays in AdSuite.

CONSEQUENCES

Positive Outcomes

  • Solutions workflow simplified (one fewer manual step per iteration)
  • No changes needed to AdGem
  • Aligns with long-term architecture direction (moving away from AdGem dashboard)
  • Source of truth for iteration timing lives in AdSuite (where iterations are created)

Negative Outcomes

  • Debugging iteration attribution requires checking multiple tables (click timestamp, iteration time ranges, materialized view refresh status)

Risks

Risk Mitigation
If the materialized view hasn't refreshed with a new iteration before the DBT model runs, conversions could be attributed to the previous iteration instead of the new one Run the DBT model in the same DAG as the materialized view refresh, sequenced after it; this ensures the refresh always completes before attribution runs
Retroactive edits to iteration created_at could re-attribute historical conversions The iteration created_at field should never be edited after creation; this is a sealed timestamp that represents when the iteration began
A/B testing of iterations would not work with time-based derivation (concurrent iterations would have overlapping time ranges) Acceptable for now; A/B testing is not currently supported. When needed, we expect to be on the Offer API where we can store iteration_id explicitly
Clicks that occur while Solutions is updating systems (AdSuite, Tune, AdGem) could be attributed to the wrong iteration Accepted risk; this failure mode is less likely than Option 1's failure mode of forgetting a routine step

NOTES

Implementation Requirements

  1. Federated query setup: Add AdSuite iteration data to Redshift via external schema and materialized view
  2. Required fields: iteration_id, strategy_id, created_at
  3. Liquibase creates the external schema (adsuite_prod) and materialized view (adsuite_mirror)
  4. Airflow DAG refreshes the materialized view on a schedule

  5. DBT models: Use a silver/gold table approach

  6. Silver table goal_complete_events (incremental): Pulls goal complete events from the large events table. This gets the performance benefits of incremental processing on the expensive read.
  7. Gold table goal_complete_events_attributed (non-incremental): Reads from the silver table and joins in iteration_id via the time-based logic:
    • Join to click event (via click identifier)
    • Join click timestamp to iteration time range
    • The full-table-scan only hits the smaller silver table, not the events table.
  8. This approach ensures that if tune_offer_id is populated late, subsequent runs will correctly attribute all conversions, while still getting incremental benefits on the expensive events table read.

Comparison of Approaches

Aspect Option 1 (Explicit) Option 2 (Time-Based)
Manual steps per iteration 2 (Tune + AdGem) 1 (Tune only)
Failure mode Forget to update AdGem Update systems out of order
Data accuracy Stored at event time Derived from timestamps
Implementation complexity Store ID on click event DBT join on timestamps
Touches legacy AdGem Yes No
Debugging Look at click record Check multiple tables

References

Original Author

Dylan Kreth

Approval date

Approved by

Appendix

Data Flow Diagram

flowchart TD A[events table] --> B[goal_complete_events<br/>silver, incremental] B --> C[goal_complete_events_attributed<br/>gold, non-incremental] D[Click Events Table] --> |join| C E[AdSuite Iterations<br/>via federated query] --> |join| C

Note: Aggregated conversion metrics are sent back to AdSuite via API. This is not covered in this ADR because the return flow is the same regardless of how iteration_id is determined (explicit storage vs time-based derivation).