
0008: Sort Data Delivery to Adgem

STATUS

Accepted

CONTEXT

Earnings Per Click (EPC) figures are both reported to publishers and used for sorting offers in the offerwall and the offer-api. There are other metrics we would like to explore sorting by, RPM for example. This drives the current interest in campaign performance metrics.

Historically, the values for EPC were calculated via a cron running in the new_dashboard against the legacy Adgem Redshift events. One shortcoming of this solution was the lack of visibility into goal-level stats in the legacy events. With the advent of the new AdAction Events, we have an opportunity to calculate these stats down to the goal level. We would also like to make use of the new data infrastructure, using dbt and Airflow to run these calculations rather than a cron.

This data will be owned by the data system, and we don't want production code loading directly from our Redshift data warehouse. This presents the problem addressed in this ADR: how do we deliver the data calculated in Airflow to the Adgem system(s)?

Considered Options

  1. Deliver the data via S3 file dumps
  2. Deliver the data via a new API
  3. Provide the data via a new microservice

Option 1. S3 delivery

In this scenario the data backend airflow job would drop the data into an S3 bucket and the Adgem service would pick up the data from the S3 bucket via a cron job.

  • Pros
      • Simple to implement on the data side; this is a simple UNLOAD operation in Redshift.
  • Cons
      • Parsing the multitude of UNLOAD part files in Adgem would not be trivial.
      • The timing of each operation would be independent, and updating the cadence would require work in both systems.
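To illustrate the parsing burden noted in the Cons, here is a minimal Python sketch of stitching together multiple UNLOAD part files. The pipe-delimited layout and column names are hypothetical; real UNLOAD output depends on the options chosen, and Adgem would additionally need to handle compression, manifests, and partial loads.

```python
import csv
import io

def parse_unload_parts(parts):
    """Merge records from multiple Redshift UNLOAD part files.

    `parts` is an iterable of file-like objects, each holding
    pipe-delimited rows (campaign_id, epc) with no header row --
    a hypothetical layout for illustration only.
    """
    records = []
    for part in parts:
        reader = csv.reader(part, delimiter="|")
        for campaign_id, epc in reader:
            records.append({"campaign_id": int(campaign_id), "epc": float(epc)})
    return records

# UNLOAD typically produces many part files (0000_part_00, 0001_part_00, ...)
parts = [io.StringIO("123|0.5\n124|0.7\n"), io.StringIO("125|0.2\n")]
print(parse_unload_parts(parts))
```

Even this toy version ignores error handling and file discovery in S3, which is part of why this option was considered non-trivial on the Adgem side.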

Option 2. Deliver via API

In this scenario, the data system running via Airflow would have a regularly scheduled task (cadence to be determined as an operational concern) to call a new API endpoint with the new sort metrics, which would be used by Adgem. Both the legacy system (new_dashboard) and the offer-api would need this data.

Option 2a. Dashboard makes API call to update Offer-API

In this scenario, the new metrics are delivered to Offer-API via the dashboard. The data system would have no direct interaction with Offer-API; stats would flow to Offer-API over the existing dashboard-to-Offer-API APIs.

  • Pros
      • Timely delivery of data.
      • Only one new API to build, one new API to call.
      • Dashboard remains the source of truth for production campaign metadata.
  • Cons
      • None identified.

Option 2b. Data system calls API on both systems

In this scenario, both legacy Adgem and offer-api create new API endpoints that the data system calls to notify them of metric updates.

  • Pros
      • Timely delivery of data.
      • Separation of concerns between the dashboard and Offer-API.
  • Cons
      • Two new APIs to build.
      • Added complexity on the data side.

Option 3. Create new microservice to house this data

In this scenario, Adgem (legacy and Offer-API) would pull the metrics on demand from a new microservice. This could be a service with its own datastore that is updated by the data process, or a front on top of the Redshift/dbt models with heavy caching.

  • Pros
      • One source of truth for this data.
      • Adgem systems can pull on demand.
  • Cons
      • We are not proficient as an organization with microservices.
      • Both legacy Adgem and Offer-API would need updates to call this service.
      • Creates new infrastructure to maintain.

DECISION

We will adopt Option 2a, creating a new API endpoint in the dashboard and calling it from an Airflow task in the data account.
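As a sketch of the data-side half of this decision, the following shows how an Airflow task might construct the POST to the new dashboard endpoint once the dbt EPC model succeeds. The host name and token are hypothetical, and the request is only built, not sent; a real task would send it (and handle retries) after the dbt run completes.

```python
import json
import urllib.request

# Hypothetical endpoint; the real host lives behind the IP whitelist.
DASHBOARD_URL = "https://dashboard.example.com/v1/campaigns/metrics"

def build_metrics_request(payload: dict, token: str) -> urllib.request.Request:
    """Build the POST request the Airflow task would send to the
    dashboard API. The Sanctum API token travels in the
    Authorization header as a bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        DASHBOARD_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

req = build_metrics_request({"campaigns": []}, token="secret-token")
print(req.get_method(), req.get_header("Authorization"))
```

Sending would then be a single `urllib.request.urlopen(req)` call, which keeps the Airflow side dependency-free.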

CONSEQUENCES

SEQUENCE

The following illustrates the sequence flow for EPC and stat updates:

```mermaid
sequenceDiagram
    Airflow->>dbt: triggers epc model
    dbt->>Airflow: success!
    Airflow->>Adgem Dashboard API: POST batch
    Adgem Dashboard API-->>Offer API: PUT updates
```

Network

The following diagram illustrates the network architecture for this system:

```mermaid
flowchart LR
    subgraph dataaccount[Data AWS Account]
        airflow[Airflow]
    end
    subgraph adgemaccount[Legacy Adgem Account]
        ipdb[IP Whitelist]
        ndb[New Dashboard API]
    end
    subgraph offerapiaccount[Offer API Account]
        offerapi[Offer-API]
    end
    airflow--POST-->ipdb
    ipdb-->ndb
    ndb--PUT-->offerapi
```

Security Measures for API Implementation

1. IP Whitelisting

Description: IP whitelisting restricts access to the API to specified IP addresses or IP ranges, reducing the risk of unauthorized access and protecting sensitive data.

Implementation Steps:

  • Whitelist IP Addresses: Configure IP whitelisting to allow access only from trusted IP addresses, such as those from the Data AWS Account (Airflow).
  • Update Whitelists Regularly: Keep IP addresses current to reflect changes in infrastructure.

Pros:

  • Limits access to known entities, reducing the attack surface.
  • Simple and effective at preventing unauthorized access.

Cons:

  • Requires ongoing maintenance to keep IP lists current. In our case we will only whitelist the data engineering AWS account, which should have a very stable IP, so this is a minor drawback.

2. Sanctum Authentication

Description: Sanctum provides API token-based authentication for securing API access.

Implementation Steps:

  • API Token Generation: Use Sanctum to generate and manage API tokens. Each API request must include a valid token.
  • Token Validation: Validate tokens on every request to ensure the requestor is authorized.
  • Token Expiry and Rotation: Implement policies for token expiry and regular regeneration to enhance security (optional).

Pros:

  • Adds an additional layer of security by verifying requestor identity.
  • Provides control over access permissions and scopes.

Cons:

  • Requires management of authentication tokens.
  • Adds complexity to the API interaction process.

Dashboard API

For the new API we will use the following URL and payload to update many campaigns and goals simultaneously:

POST /v1/campaigns/metrics

```json
{
  "campaigns": [
    {
      "id": 123,
      "epc": 0.5,
      "rpm": 0.7,
      "goals": [
        {
          "id": 12323,
          "epc": 12321.123
        }
      ]
    },
    ...
  ]
}
```
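On the receiving side, the dashboard would validate this shape before applying updates. The real implementation would use the dashboard's own request validation; the following is only an illustrative Python check of the payload structure shown above.

```python
def validate_metrics_payload(payload: dict) -> bool:
    """Lightweight shape check for the /v1/campaigns/metrics body.

    Illustrative only: each campaign needs an integer id and numeric
    epc/rpm; each goal needs an integer id and a numeric epc.
    """
    campaigns = payload.get("campaigns")
    if not isinstance(campaigns, list):
        return False
    for campaign in campaigns:
        if not isinstance(campaign, dict) or not isinstance(campaign.get("id"), int):
            return False
        for metric in ("epc", "rpm"):
            if not isinstance(campaign.get(metric), (int, float)):
                return False
        for goal in campaign.get("goals", []):
            if not isinstance(goal.get("id"), int) or not isinstance(goal.get("epc"), (int, float)):
                return False
    return True

example = {
    "campaigns": [
        {"id": 123, "epc": 0.5, "rpm": 0.7, "goals": [{"id": 12323, "epc": 12321.123}]}
    ]
}
print(validate_metrics_payload(example))  # True
```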

Risks

  • If there are any breakdowns in the connection between data and new_dashboard, or new_dashboard and Offer-API, the Offer-API could be using stale metrics.
  • In this simple implementation, traffic goes across the open internet using a token and IP whitelisting for security. This is consistent with previous internal API solutions. Private VPC peering with Security Group restrictions would be a more robust solution.

NOTES

References

Original Author

Ron White, Daniel Ballesteros, Maria Cornejo

Approval date

Approved by

Appendix