0040: Enable Cognito authentication in the Targeted API
STATUS
Accepted
CONTEXT
Each Adgem service currently handles authentication on its own: every project manages its own users, access tokens, and client credentials. While this works for smaller architectures, it becomes significantly harder to maintain as Adgem grows and new interconnected projects emerge.
The Targeted API (TA) currently retrieves offers via the Offer API (OA), so the TA must store and manage the OA token in addition to its own access token. Each new service adds more tokens to manage. For example, the Targeted API now also uses the Player API to suppress in-progress and completed offers. As the ecosystem grows, managing authentication between services becomes increasingly difficult; in other words, maintenance overhead grows in proportion to the number of services.
This ADR focuses on enabling Cognito authentication in the Targeted API as a first step towards centralizing authentication across all Adgem services. To address this, we decided to use the OAuth 2.0 client credentials flow, which is suitable for machine-to-machine (M2M) authentication. The service will no longer manage its own tokens; it will instead rely on Cognito for authentication and authorization.
The discussion started in the Cognito ADR (see References).
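As a minimal sketch of the client credentials flow mentioned above, the snippet below builds (but does not send) the token request an M2M client would make. The token URL, client ID, secret, and scope name are hypothetical placeholders, not values from this ADR; the request shape (HTTP Basic client authentication plus a form-encoded `grant_type=client_credentials` body) follows the standard OAuth 2.0 token endpoint contract.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url: str, client_id: str, client_secret: str,
                        scope: str) -> urllib.request.Request:
    """Build (but do not send) a client credentials token request."""
    # The app client authenticates with HTTP Basic: base64("client_id:client_secret").
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": scope,
    }).encode()
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Hypothetical values, for illustration only.
request = build_token_request(
    "https://example-domain.auth.us-east-1.amazoncognito.com/oauth2/token",
    client_id="targeted-api-client",
    client_secret="not-a-real-secret",
    scope="offer-api/offers.read",
)
# Sending it would be urllib.request.urlopen(request); the JSON response
# carries access_token, token_type and expires_in.
```

The caller then sends the returned `access_token` as a Bearer token in the `Authorization` header of each request to the protected API.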
Considered Options
The following options rely on Amazon Cognito as the centralized authentication service. Once a user (in this case, a service) has been issued a token, there are two main mechanisms for validating it and authorizing access to protected resources. They can be used separately (Options 1 and 2) or together (Option 3):
- Using an API Gateway Cognito Authorizer
- Using the aws-jwt-verify library. This package can be integrated in several ways; one option is a Lambda function that authorizes requests.
1. Perimeter Validation (API Gateway Offloading)
The API Gateway acts as a "Smart Firewall." It intercepts the request, extracts the token, and verifies it against Cognito. If the token is invalid or expired, the request is rejected immediately, and the backend never sees the call attempt.
Pros:
- Resource Protection: Helps protect the backend from application-layer Denial of Service (DoS/DDoS) attacks, because the backend does not spend CPU cycles processing invalid credentials.
- Decoupling: If you switch from Cognito to Auth0 or another OIDC provider in the future, you only reconfigure the Gateway; the backend code remains untouched.
- Standardization: Guarantees that all services behind the Gateway share the same security level without relying on every developer correctly implementing the auth library.
- Simplicity: The backend service code remains clean and focused on business logic without the overhead of authentication code.
Cons:
- Vendor Lock-in: Creates a strong dependency on the AWS API Gateway infrastructure.
- Blind Trust: The backend assumes that if the request arrived, it is secure. If an attacker were to bypass the Gateway and reach the backend server directly (via the internal network), it would be unprotected (unless strict VPC policies are used, like security groups).
- Limited Context: The backend has no access to token claims (scopes, roles, IDs) unless the Gateway is configured to forward them.
Architecture: M2M Client -> API Gateway (Validates) -> Backend (Trusts)
The diagram above illustrates the flow of a request from a machine-to-machine (M2M) client to the backend service through the API Gateway, which validates the token before forwarding the request. The goal is to treat the API Gateway as a "Smart Pipe" that offloads security responsibilities from the backend service, which acts as a "Dumb Endpoint".
2. Backend Validation (Lambda Verifier)
The backend service accepts the raw request and assumes full responsibility for security. It extracts the Authorization header, downloads the public keys (JWKS) from Cognito, and cryptographically verifies the token signature using a code library (aws-jwt-verify).
With this approach, the backend service is self-sufficient and does not rely on external components for security. It can also inspect claims and scopes to make fine-grained authorization decisions. For example, a custom claim in the token could carry the requester's app_id or pub_id, giving the backend useful context for more informed access control decisions.
Pros:
- Total Portability: The code remains secure regardless of where it runs (AWS, Azure, On-premise, or a developer's laptop).
- Rich Context: By decoding the token within the code, there is immediate access to "claims" (scopes, roles, IDs) for making complex business decisions (e.g., "This token has a read scope, but is attempting to access an admin route... error 403").
- Security in Depth: The backend protects itself regardless of the network perimeter.
- No Vendor Lock-in: The backend service is not tied to an API Gateway which allows for greater flexibility in deployment.
Cons:
- Latency and Cost: The backend must initialize, load libraries, and perform cryptographic processing for every request, including malicious ones or junk traffic.
- Development Complexity: Requires maintaining JWT libraries and validation logic in every language/framework used.
- Inconsistent Security: Relies on every developer to correctly implement the auth library, leading to potential vulnerabilities if done incorrectly.
Architecture: M2M Client -> Backend (Validates & Executes)
The diagram above illustrates the flow of a request from a machine-to-machine (M2M) client directly to the backend service. The backend service is responsible for validating the token and executing the business logic. The philosophy behind this approach is to not rely on any external components for security, ensuring that the backend service is self-sufficient in handling authentication and authorization (Zero Trust).
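The claim-based decision described for this option can be sketched as follows. This is an illustration only: it assumes the token signature has already been verified (the job of aws-jwt-verify in this ADR) and works on the decoded claims. The function name, scope names, and client ID are hypothetical; `token_use`, `client_id`, `scope`, and `exp` are claims carried by Cognito access tokens.

```python
import time
from typing import Optional


def authorize(claims: dict, required_scope: str, expected_client_ids: set,
              now: Optional[float] = None) -> int:
    """Return an HTTP status for claims taken from an already-verified token."""
    now = time.time() if now is None else now
    # Reject anything that is not an unexpired access token.
    if claims.get("token_use") != "access" or claims.get("exp", 0) <= now:
        return 401
    # Only known app clients may call this service.
    if claims.get("client_id") not in expected_client_ids:
        return 401
    # Scopes arrive as one space-separated string.
    granted = set(claims.get("scope", "").split())
    return 200 if required_scope in granted else 403


# Hypothetical claims for a token that only carries a read scope.
claims = {
    "token_use": "access",
    "client_id": "offerwall",
    "scope": "targeted-api/offers.read",
    "exp": 2_000,
}
read_status = authorize(claims, "targeted-api/offers.read", {"offerwall"}, now=1_000)
admin_status = authorize(claims, "targeted-api/admin", {"offerwall"}, now=1_000)
expired_status = authorize(claims, "targeted-api/offers.read", {"offerwall"}, now=3_000)
```

Here the read request is allowed, the admin request is refused with 403 because the scope is missing, and the same token is refused with 401 once it has expired.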
3. Hybrid Architecture (Defense in Depth)
This approach combines the strengths of both previous options. The API Gateway acts as the first line of defense (Perimeter Security), performing coarse-grained validation to reject invalid signatures or expired tokens immediately. Valid requests are forwarded to the backend, which uses the aws-jwt-verify library to re-verify the token, in line with Zero Trust principles, and to perform fine-grained authorization based on specific claims (scopes). The backend can also read custom claims to enrich the authorization process.
Pros:
- Optimal Security Posture: Achieves the "Resource Protection" of Option 1 (blocking DDoS/junk traffic at the edge) and the "Zero Trust" security of Option 2 (backend protects itself if the perimeter is breached).
- Rich Business Logic: The backend receives a pre-validated token but still decodes it to extract claims, enabling complex authorization logic.
- Fail-Safe: If the API Gateway is accidentally misconfigured (e.g., the Authorizer is turned off), the backend will still reject unauthenticated requests.
Cons:
- Operational Overhead: Requires managing configuration in two places (Gateway infrastructure and Backend code).
- Latency: Introduces a slight performance penalty as cryptographic signature verification happens twice (once at the Gateway, once at the Backend).
- Redundancy: Some may view the double validation of the signature as computationally inefficient, although it is architecturally sound for high-security environments.
Architecture: M2M Client -> API Gateway (Pre-validates) -> Backend (Re-validates & Authorizes)
This diagram illustrates the hybrid architecture, where the API Gateway performs initial token validation and the backend service re-validates the token and checks granular permissions. This dual-layer approach combines perimeter defense with application-level authorization: even if an attacker manages to bypass the API Gateway, the backend service remains protected by its own validation. This approach embodies the Defense in Depth security strategy (multiple layers of security controls).
DECISION
After evaluating the options, we decided to implement Option 1: Perimeter Validation (API Gateway Offloading) to enable Cognito authentication in the Targeted API. This is the simplest path to integrating Cognito authentication quickly while minimizing changes to the existing backend codebase. By leveraging the API Gateway's built-in Cognito Authorizer, the Targeted API offloads authentication to the API Gateway. This option also aligns with standardizing authentication mechanisms across Adgem services: the development team applies consistent security practices everywhere, which reduces the risk of individual implementation errors.
This approach allows us to rapidly implement secure authentication with minimal development effort, while providing a clear path for future enhancements. Once the initial integration is successful, we can explore the possibility of adopting the hybrid architecture (Option 3) to further enhance security by adding fine-grained authorization checks at the backend level.
Implementation Considerations
- Infrastructure Configuration: The API Gateway Cognito Authorizer will be configured to validate tokens against the centralized Cognito User Pool dedicated to M2M authentication. This requires setting up the correct User Pool ID, App Client ID, and token validation settings in the API Gateway.
- Backward Compatibility Strategy: Since the Targeted API is not currently in use by publishers, we can implement this change without a gradual migration path. The existing Sanctum authentication mechanism will be removed entirely, simplifying the codebase and eliminating dual authentication maintenance.
- Network Security Posture: The Targeted API will be reconfigured to only accept traffic routed through the API Gateway, eliminating direct public access. This will be enforced through security groups and network ACLs to prevent perimeter bypass attempts.
- Token Validation Configuration: The API Gateway will validate token signatures and expiration time claims before forwarding requests to the backend.
- Future Migration Path: This initial implementation establishes the foundation for a potential migration to Option 3 (Hybrid Architecture) if business requirements evolve to require fine-grained, claim-based authorization at the application level. The transition would involve adding the aws-jwt-verify library to the backend without requiring changes to the API Gateway configuration.
CONSEQUENCES
Cost Implications
All of these options incur Cognito token issuance costs; routing traffic through API Gateway adds request costs on top. The following tables summarize Cognito token issuance pricing and API Gateway pricing:
Token Issuance Costs
| Tier | Number of token requests per month | Price |
|---|---|---|
| Tier 1 | 1 - 250,000 | $2.250 per 1000 token requests |
| Tier 2 | 250,001 - 5,000,000 | $1.500 per 1000 token requests |
| Tier 3 | 5,000,001 and above | $1.125 per 1000 token requests |
API Gateway Costs
| Number of Requests (per month) | Price (per million) |
|---|---|
| First 333 million | $3.50 |
| Next 667 million | $2.80 |
| Next 19 billion | $2.38 |
| Over 20 billion | $1.51 |
*Streaming enabled REST APIs are metered in 10MB response payload increments.
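To make the two price tables concrete, here is a small worked calculation. It assumes graduated pricing (each rate applies only to the units that fall inside its tier), which the tables suggest but do not state. The monthly volumes (300,000 token requests and 500 million API calls) are hypothetical figures chosen for illustration.

```python
INF = float("inf")

# (upper bound of the tier, price per block) taken from the tables above.
TOKEN_TIERS = [(250_000, 2.25), (5_000_000, 1.50), (INF, 1.125)]  # per 1,000 token requests
GATEWAY_TIERS = [(333_000_000, 3.50), (1_000_000_000, 2.80),
                 (20_000_000_000, 2.38), (INF, 1.51)]             # per 1,000,000 requests


def tiered_cost(units: float, tiers: list, per: int) -> float:
    """Graduated pricing: each tier's price applies only to the units inside it."""
    cost, lower = 0.0, 0
    for upper, price in tiers:
        if units <= lower:
            break
        in_tier = min(units, upper) - lower
        cost += in_tier / per * price
        lower = upper
    return cost


# Hypothetical monthly volumes.
token_cost = tiered_cost(300_000, TOKEN_TIERS, per=1_000)
gateway_cost = tiered_cost(500_000_000, GATEWAY_TIERS, per=1_000_000)
print(round(token_cost, 2), round(gateway_cost, 2))
```

Under these assumptions, 300,000 token requests cost 250 x $2.25 + 50 x $1.50 = $637.50, and 500 million API calls cost 333 x $3.50 + 167 x $2.80, or about $1,633.10.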
Cost-Benefit Analysis: While there are additional infrastructure costs, the trade-off includes reduced development and maintenance costs from eliminating custom authentication code, decreased security incident risk, and improved operational efficiency through centralized authentication management. Token caching strategies (discussed in Risks section) can significantly reduce token issuance costs.
Performance and Latency
Network Latency
Offloading authentication to the API Gateway introduces an additional network hop and token validation processing time at the Gateway layer. This latency is generally negligible, and its impact can be tracked through the following API Gateway metrics:
- API Gateway error rates (4xx, 5xx responses)
- Latency metrics (p50, p95, p99)
- Throttled request counts
- Integration timeouts
Security Posture Changes
Perimeter Security Enhancement
Relying on perimeter security (Option 1) requires ensuring that the Targeted API can no longer be reached directly. This must be achieved by:
- Removing public internet exposure from the Targeted API
- Configuring VPC security groups to only accept traffic from the API Gateway
- Implementing network ACLs as an additional layer of defense
- Regularly auditing network configurations to prevent accidental exposure
Important: This creates a hard dependency on proper network configuration. Misconfiguration could either expose the service or prevent legitimate access.
Reduced Attack Surface
By removing custom authentication code from the application layer and relying on AWS-managed services, the attack surface is reduced. AWS continuously patches and updates Cognito and API Gateway security features, reducing the burden on the Adgem team to maintain security patches for authentication libraries.
Migration and Implementation Impact
Zero Impact Migration Window
The TA is not yet used by any publisher, which gives us the flexibility to implement this change without impacting existing clients. There is no need for:
- Gradual rollout strategies
- Dual authentication support during transition
- Client communication and coordination
- Backward compatibility testing with legacy tokens
Authentication Mechanism Replacement
This change requires removing the existing authentication mechanism (Sanctum) in the Targeted API and replacing it with Cognito-based authentication. This simplifies the codebase by:
- Eliminating Sanctum dependencies and related database tables
- Removing token generation and validation logic from the application
- Reducing the overall codebase complexity and maintenance burden
Development and Operational Impact
- Development Team Learning Curve: The development team will need to understand the API Gateway Cognito Authorizer configuration and troubleshooting procedures. However, once configured, developers working on the Targeted API backend will have less authentication code to maintain, allowing them to focus on business logic rather than security infrastructure.
- Standardization and Reusability: This implementation creates a reusable pattern that can be applied to other Adgem services. Future services can leverage the same Cognito User Pool and API Gateway configuration approach, accelerating their authentication implementation and ensuring consistent security practices across the ecosystem. This aligns with the broader Adgem goal of centralizing authentication across all services.
- Simplified Token Management: By removing the need for the Targeted API to manage and store tokens for other services (like the Offer API), the codebase becomes simpler and more maintainable. Token lifecycle management (generation, rotation, revocation, expiration) is delegated to Cognito, which provides built-in features for these operations.
Client Integration Requirements
Client Implementation Changes
Although the Targeted API has no current publishers, future clients will need to implement the OAuth 2.0 Client Credentials flow. This requires:
- Providing clear documentation and SDK examples for token acquisition
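- Implementing token renewal logic (tokens expire after at most 24 hours, and the client credentials flow issues no refresh token, so clients must request a new access token)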
- Adopting token caching strategies to minimize token requests
- Handling token expiration and automatic retry logic
Developer Experience Consideration: This is more complex than the previous long-lived token approach but aligns with industry standards and modern security best practices. Providing client libraries or SDK wrappers can significantly improve the developer experience for publishers.
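A client-side helper for the caching and renewal requirements above might look like the sketch below. It is a hypothetical wrapper, not an existing Adgem SDK: `TokenCache`, its `fetch` callback, and the 60-second renewal margin are all assumptions for illustration. The token fetcher and the clock are injected so the behaviour can be exercised without calling Cognito.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Reuse one access token until shortly before it expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 refresh_margin: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch            # returns (access_token, expires_in_seconds)
        self._margin = refresh_margin  # renew this many seconds before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token

    def invalidate(self) -> None:
        """Drop the cached token, e.g. after the API answers 401."""
        self._token = None


# Demo with a fake fetcher and a fake clock instead of a real Cognito call.
calls = []


def fake_fetch() -> Tuple[str, float]:
    calls.append(1)
    return f"token-{len(calls)}", 3600.0


now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: now[0])
first = cache.get()    # fetches token-1
now[0] = 1800.0
second = cache.get()   # still cached
now[0] = 3550.0
third = cache.get()    # inside the 60 s margin, so a new token is fetched
```

Renewing slightly before expiry avoids sending a token that expires in flight, and `invalidate()` gives the retry path a way to force a fresh token after a 401 response.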
Risks
1. Short-Lived Token Management Complexity
The Targeted API currently uses long-lived tokens, so publishers have not needed to request new tokens frequently. Cognito, however, issues access tokens with a maximum TTL of 24 hours, so clients need logic to obtain a new token before the current one expires.
Mitigation Strategies:
- Token Caching Implementation: Tokens can be cached instead of being requested on every call, for example in the API Gateway as discussed under Cache Cost Implications. Caching reduces the number of requests made to Cognito, which improves performance and latency for clients and also reduces costs, since Cognito charges based on the number of issued tokens.
- Client SDK Development: Providing wrapper libraries or SDK helpers that abstract token management complexity from publishers, automatically handling token acquisition, caching, and refresh logic.
- Clear Documentation: Providing comprehensive documentation with code examples showing best practices for token lifecycle management across the programming languages commonly used by publishers.
Cache Cost Implications: Although caching tokens in the API Gateway reduces latency and costs by minimizing requests to Cognito, there is still some cost associated with token caching based on the size and duration of the cache. However, this cost is generally lower compared to the cost of issuing new tokens from Cognito for every request.
| Cache Memory Size (GB) | Price per Hour |
|---|---|
| 0.5 | $0.02 |
| 1.6 | $0.038 |
| 6.1 | $0.20 |
| 13.5 | $0.25 |
| 28.4 | $0.50 |
| 58.2 | $1.00 |
| 118.0 | $1.90 |
| 237.0 | $3.80 |
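As a rough worked example of the trade-off, the sketch below converts the hourly cache price into a monthly figure and estimates how many token requests the cache must avoid to pay for itself. It assumes a 730-hour month and the Tier 1 token price; both the smallest cache size and the break-even framing are illustrative choices, not figures from this ADR.

```python
HOURS_PER_MONTH = 730  # assumption: average month length used for estimates


def monthly_cache_cost(price_per_hour: float) -> float:
    """Fixed monthly cost of keeping a cache of the given hourly price running."""
    return price_per_hour * HOURS_PER_MONTH


def break_even_token_requests(price_per_hour: float,
                              price_per_1000_tokens: float) -> float:
    """Token requests per month whose issuance cost equals the cache cost."""
    return monthly_cache_cost(price_per_hour) / price_per_1000_tokens * 1000


# Smallest cache (0.5 GB at $0.02/hour) against Tier 1 token pricing ($2.25 per 1,000).
cache_cost = monthly_cache_cost(0.02)
break_even = break_even_token_requests(0.02, 2.25)
print(round(cache_cost, 2), round(break_even))
```

Under these assumptions the 0.5 GB cache costs about $14.60 per month, so it pays for itself once it avoids roughly 6,500 token requests per month.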
2. Token Leakage and Unauthorized Access
If tokens are not properly secured in transit and at rest, they could be intercepted by malicious actors. Since tokens can grant access to multiple protected resources, their leakage can lead to unauthorized access and potential data breaches. Unlike long-lived API keys, OAuth tokens have broader scope and can be replayed until expiration.
Mitigation Strategies:
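- Token Revocation: Cognito exposes a RevokeToken API, but it takes a refresh token as input and revokes the access tokens issued from it. Because the client credentials flow issues no refresh token, whether revocation covers M2M access tokens needs to be confirmed. Until then, treat the short TTL as the primary containment measure, combined with rotating the affected app client's credentials so that no new tokens can be issued with them.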
- Scope Limitation: Configure Cognito App Clients with minimum required scopes. Each service should only have access to the resources it needs to function (principle of least privilege).
Impact: High. Token compromise could grant attackers access to multiple Adgem services, but short TTL and revocation capabilities limit the window of exposure.
3. API Gateway Single Point of Failure
Risk Description: By centralizing authentication through API Gateway, it becomes a critical dependency. If the API Gateway experiences outages, becomes misconfigured, or faces performance degradation, all services relying on it for authentication will be affected, potentially causing cascading failures across the Adgem ecosystem.
Mitigation Strategies:
- High Availability Configuration: API Gateway is a managed regional service that AWS runs across multiple Availability Zones by default, so zone redundancy requires no additional deployment work.
- Rate Limiting and Throttling: Configure appropriate throttling limits to prevent overwhelming the Gateway during traffic spikes or DDoS attacks.
- Health Monitoring and Alerting: Establish comprehensive monitoring for:
  - API Gateway error rates (4xx, 5xx responses)
  - Latency metrics (p50, p95, p99)
  - Throttled request counts
  - Integration timeouts
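To illustrate how a rate limit and a burst limit interact when choosing throttling values, here is a minimal token bucket sketch, the algorithm AWS documents for API Gateway throttling. The class and the numbers (2 requests per second, burst of 3) are hypothetical and only model the behaviour; they are not API Gateway configuration.

```python
class TokenBucket:
    """Simplified model of rate plus burst throttling."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate            # tokens added per second (steady-state limit)
        self.burst = burst          # bucket capacity (burst limit)
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a throttled request would receive 429 Too Many Requests


bucket = TokenBucket(rate=2, burst=3)
at_start = [bucket.allow(now=0.0) for _ in range(4)]      # burst of 3, then throttled
one_second_on = [bucket.allow(now=1.0) for _ in range(3)]  # 2 tokens refilled
```

The first three simultaneous requests pass and the fourth is throttled; one second later two more pass, matching the configured rate. Throttled request counts from the monitoring list above indicate when these limits are set too low for legitimate traffic.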
4. Perimeter Bypass Risk
Risk Description: Option 1 relies entirely on perimeter security (API Gateway validation). If the Targeted API remains accessible through alternative network paths (misconfigured security groups, VPC peering rules, or internal network access), attackers who bypass the API Gateway would reach an unprotected backend that performs no authentication validation.
Mitigation Strategies:
- Network Isolation: Completely remove public internet access to the Targeted API. Configure it to run in private subnets with no internet gateway access.
- Security Group Hardening: Configure security groups to only accept traffic from the API Gateway VPC Link or specific API Gateway IP ranges. Regularly audit security group rules using AWS Config or similar tools.
Impact: High if exploited, but low probability with proper network configuration and regular audits. Migration to the hybrid architecture (Option 3) would largely remove this risk, because the backend would validate tokens itself.
5. Vendor Lock-in and Migration Complexity
Risk Description: Deep integration with AWS Cognito and API Gateway creates strong coupling to the AWS ecosystem. Future migration to alternative authentication providers (Auth0, Okta, on-premise solutions) or different cloud providers would require significant refactoring of authentication infrastructure and client integration code.
Mitigation Strategies:
- Standards Compliance: Cognito follows OAuth 2.0 and OpenID Connect standards. Ensure all client implementations use standard OAuth flows rather than AWS-specific features, making future migrations easier.
Impact: Medium. Lock-in is a long-term concern but is partially mitigated by standards compliance. The operational benefits and AWS ecosystem alignment currently outweigh portability concerns.
NOTES
This change is part of a broader initiative to centralize authentication across all Adgem services using Amazon Cognito. Future ADRs will address similar integrations for other services, building upon the patterns and lessons learned from this implementation in the Targeted API. One important consideration is that this change will affect how the Offerwall interacts with the Targeted API. The Player Experience team will need to work with us to ensure a smooth transition from the current authentication mechanism to the new one.
References
- aws-jwt-verify library
- Client credentials flow
- Cognito ADR
- API Gateway Cognito Authorizer
- Defense in Depth
- Token caching
- Issued tokens pricing
- Zero Trust
- API Gateway Pricing
- PR #89: docs: Enable Cognito authentication between the Targeted and Offer APIs
- PR #127: docs: backfill PR reference links for existing ADRs
- PR #137: feat: add cross-links and update all site URLs to custom domains
Original Author
Daniel Ballesteros