0041: Using BigDataCloud to backfill missing state values in postbacks
STATUS
Accepted
CONTEXT
We use MaxMind to determine player geolocation data. MaxMind is a service that returns geolocation data for an IP address, such as:
- Country
- State
- City
- Latitude and longitude coordinates
For roughly 10% of calls to MaxMind, the "state" and "city" fields are returned as null. The Capital One publisher requires "state" values in outgoing postbacks, which fail without them.
We need a fallback service that determines "state" values when MaxMind does not return them.
We have used BigDataCloud's free client-side reverse geocoding to city API as a fallback for a while (behind a feature flag for Capital One). However, we have been using it incorrectly: this API is not meant to be called from a server (see this article), and doing so caused BigDataCloud to ban our AWS IP address. We contacted their support team to have the IP unbanned. Now we need an alternative.
Considered Options
Continue to send null state values when MaxMind fails to retrieve them by IP (status quo)
This will result in failed outgoing postbacks for Capital One.
Use BigDataCloud's free client-side reverse geocoding to city API as a fallback using latitude/longitude (status quo)
This resulted in our AWS IP address being banned since we were calling this API from the server side.
Use BigDataCloud's server-side reverse geocoding to city API as a fallback using latitude/longitude
- This is free as long as we keep the calls under 50K / month.
- BigDataCloud's website and Terms and Conditions page mention a 99.999% uptime SLA. However, this applies only to paid subscriptions.
- BigDataCloud's website claims sub-millisecond API response times.
Use alternative(s) to BigDataCloud with the Geocoder PHP library
This library provides an abstraction over many geocoding providers, which would make it easy to switch providers and to implement redundancy logic, so that when one provider fails, another can be used in its place. It supports many providers; here are a few popular options:
- HERE
  - 99.9% SLA
  - 30K free requests / month
  - Pay as you go: after 30K requests, pricing starts at $0.86 per 1,000 requests
- Google Maps
  - 99.9% SLA
  - 10K free requests / month
  - Pricing varies: both pay-as-you-go and monthly subscription (starting at $100 / month) options are available
- Mapbox
  - 99.9% SLA
  - 100K free requests / month
  - Pay as you go: after 100K requests, pricing starts at $0.75 per 1,000 requests
- Azure Maps
  - 99.9% SLA
  - 5K free requests / month
  - Pay as you go: after 5K requests, pricing is $4.50 per 1,000 requests
More Comparisons
| Provider | Uptime SLA | Latency (P50/P95) | Cost @ 7K/mo | Free Tier | Rate Limits | Integration Effort | Global Coverage |
|---|---|---|---|---|---|---|---|
| BigDataCloud | 99.999% (paid-only) | Sub-millisecond (unverified vendor claim) | FREE | 50K/mo | Unlimited | Easy | Very Good |
| HERE | 99.9% | Unknown | FREE | 30K/mo | Unknown | Easier | Very Good |
| Google Maps | 99.9% | Unknown | FREE | 10K/mo | 3000 QPM | Easier | Excellent |
| Mapbox | 99.9% | Unknown | FREE | 100K/mo | 1000 QPM | Easier | Very Good |
| Azure Maps | 99.9% | Unknown | $9 | 5K/mo | 50 QPS | Easier | Excellent |
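The "Cost @ 7K/mo" column follows from a simple overage formula: billable requests are those beyond the free tier, priced per 1,000. The sketch below reproduces the column for the pay-as-you-go providers, assuming flat per-1,000 pricing with no volume discounts (Google Maps is omitted because its pricing varies by plan). The function name `monthlyOverageCost` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Estimated monthly cost (USD) for a pay-as-you-go provider.
 * Assumption: requests beyond the free tier are billed at a flat price per 1,000.
 */
function monthlyOverageCost(int $requests, int $freeRequests, float $pricePer1000): float
{
    $billable = max(0, $requests - $freeRequests);
    return $billable / 1000 * $pricePer1000;
}

// [free requests per month, starting price per 1,000 requests] from the list above.
$providers = [
    'HERE'       => [30_000, 0.86],
    'Mapbox'     => [100_000, 0.75],
    'Azure Maps' => [5_000, 4.50],
];

foreach ($providers as $name => [$free, $price]) {
    printf('%-10s $%.2f' . PHP_EOL, $name, monthlyOverageCost(7_000, $free, $price));
}
```

At 7K requests/month only Azure Maps exceeds its free tier: 2,000 billable requests at $4.50 per 1,000 gives the $9 in the table.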
Notes:
- All the providers listed above have global coverage
- This ADR discusses a fallback for MaxMind, which is already an established provider in the application. However, multiple fallback providers could be chained to make retrieving state values even more reliable.
DECISION
Phase I: Update Current BigDataCloud Implementation; Add Monitoring
For publishers that require "state" values in outgoing postbacks (currently only Capital One), we will use BigDataCloud's server-side reverse geocoding to city API, with latitude/longitude coordinates, as a fallback to MaxMind. We have used the free client-side API before, but because we call it from our server, BigDataCloud banned our AWS IP address under its usage rules. We therefore need to migrate to the server-side API to keep using the service.
This solution requires the least amount of work and is simple to implement, since we only need to update the endpoint URL and include an API key in the call. It is the fastest way to keep Capital One postbacks from failing.
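A minimal sketch of the server-side call follows. It assumes the endpoint `https://api.bigdatacloud.net/data/reverse-geocode`, the query parameters `latitude`, `longitude`, `localityLanguage` and `key`, and a `principalSubdivision` field holding the state in the JSON response; all of these should be checked against BigDataCloud's Server Side Reverse Geocoding to City API documentation. The function names and the 2-second timeout are illustrative choices, not part of the existing code.

```php
<?php
declare(strict_types=1);

// Assumed endpoint; confirm against BigDataCloud's server-side API docs.
const BDC_REVERSE_GEOCODE_URL = 'https://api.bigdatacloud.net/data/reverse-geocode';

/** Build the server-side request URL; unlike the client-side API, it carries an API key. */
function bdcReverseGeocodeUrl(float $latitude, float $longitude, string $apiKey): string
{
    return BDC_REVERSE_GEOCODE_URL . '?' . http_build_query([
        'latitude' => $latitude,
        'longitude' => $longitude,
        'localityLanguage' => 'en',
        'key' => $apiKey,
    ]);
}

/** Return the state ("principalSubdivision" in the assumed response shape), or null. */
function bdcExtractState(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        return null;
    }
    $state = $data['principalSubdivision'] ?? null;
    return is_string($state) && $state !== '' ? $state : null;
}

/** Fetch the state for a coordinate pair; returns null on any HTTP or parsing failure. */
function bdcFetchState(float $latitude, float $longitude, string $apiKey): ?string
{
    $context = stream_context_create(['http' => ['timeout' => 2.0]]);
    $body = @file_get_contents(bdcReverseGeocodeUrl($latitude, $longitude, $apiKey), false, $context);
    return $body === false ? null : bdcExtractState($body);
}
```

The API key would come from the new environment variable described under CONSEQUENCES (its name is not fixed by this ADR). Keeping URL building and response parsing as separate functions lets them be unit-tested without network access.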
We will also add a Datadog metric to track the number of calls and alert us when usage reaches 75% of the free tier allowance (37.5K of the 50K monthly calls).
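The sketch below shows the two pieces this needs: one metric increment per fallback call, and the alert threshold derived from the quota. `MetricsClient` is a hypothetical interface (a DogStatsD client could back it in production), and the metric name `geolocation.bigdatacloud.calls` is illustrative; the actual alert would live in a Datadog monitor summing that metric over the month.

```php
<?php
declare(strict_types=1);

/** Hypothetical metrics abstraction; a Datadog (DogStatsD) client could implement it. */
interface MetricsClient
{
    public function increment(string $metric, array $tags = []): void;
}

/** Record one fallback call, tagged by publisher, so a monitor can sum it per month. */
function recordFallbackCall(MetricsClient $metrics, string $publisher): void
{
    // Metric name is hypothetical.
    $metrics->increment('geolocation.bigdatacloud.calls', ['publisher:' . $publisher]);
}

/** Monthly call count at which the alert should fire (75% of the quota by default). */
function alertThreshold(int $monthlyQuota, float $ratio = 0.75): int
{
    return (int) floor($monthlyQuota * $ratio);
}

/** In-memory implementation, useful for tests. */
final class InMemoryMetrics implements MetricsClient
{
    /** @var array<string, int> */
    public array $counts = [];

    public function increment(string $metric, array $tags = []): void
    {
        $this->counts[$metric] = ($this->counts[$metric] ?? 0) + 1;
    }
}
```

For the Base tier, `alertThreshold(50_000)` yields 37,500 calls, the value the monitor would compare against.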
Phase II: Use Geocoder Library with BigDataCloud + Google Maps; Optimizations
As a "v2" solution, we can implement BigDataCloud as a custom Geocoder provider and use the library's Chain provider, so that if BigDataCloud fails, Google Maps is used:
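```php
use Geocoder\Query\ReverseQuery;

$geocoder = new \Geocoder\ProviderAggregator();
$client = new \GuzzleHttp\Client(); // PSR-18 HTTP client shared by all providers
$chain = new \Geocoder\Provider\Chain\Chain([
    new \Our\Custom\Provider\BigDataCloud($client), // custom provider, tried first
    new \Geocoder\Provider\GoogleMaps\GoogleMaps($client), // fallback; needs an API key in practice
    // ...
]);
$geocoder->registerProvider($chain);
// Pass the latitude/longitude from the MaxMind lookup.
$result = $geocoder->reverseQuery(ReverseQuery::fromCoordinates(...));
// Chain returns an empty collection when every provider fails; admin level 1 is the state.
$state = null;
if (!$result->isEmpty()) {
    $adminLevels = $result->first()->getAdminLevels();
    $state = $adminLevels->has(1) ? $adminLevels->get(1)->getName() : null;
}
```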
We can also upgrade our caching logic to use geohashing, so that nearby coordinates share a cache entry instead of only exact lat/lon matches. This will reduce unnecessary calls to external services, improving performance and possibly reducing costs.
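A minimal sketch of a geohash-based cache key, using the standard geohash encoding (alternate longitude/latitude bisection bits, 5 bits per base-32 character). The `geo:state:` key prefix and the default precision of 6 (a cell of roughly 1.2 km by 0.6 km) are assumptions for illustration.

```php
<?php
declare(strict_types=1);

/** Standard geohash encoding: interleave longitude/latitude bisection bits, 5 bits per char. */
function geohashEncode(float $latitude, float $longitude, int $precision = 6): string
{
    $base32 = '0123456789bcdefghjkmnpqrstuvwxyz';
    $latRange = [-90.0, 90.0];
    $lonRange = [-180.0, 180.0];
    $hash = '';
    $bits = 0;       // bits accumulated in the current character
    $value = 0;      // current 5-bit chunk
    $evenBit = true; // geohash starts with a longitude bit

    while (strlen($hash) < $precision) {
        if ($evenBit) {
            $mid = ($lonRange[0] + $lonRange[1]) / 2;
            if ($longitude >= $mid) {
                $value = ($value << 1) | 1;
                $lonRange[0] = $mid;
            } else {
                $value <<= 1;
                $lonRange[1] = $mid;
            }
        } else {
            $mid = ($latRange[0] + $latRange[1]) / 2;
            if ($latitude >= $mid) {
                $value = ($value << 1) | 1;
                $latRange[0] = $mid;
            } else {
                $value <<= 1;
                $latRange[1] = $mid;
            }
        }
        $evenBit = !$evenBit;

        if (++$bits === 5) {
            $hash .= $base32[$value];
            $bits = 0;
            $value = 0;
        }
    }
    return $hash;
}

/** Cache key shared by all coordinates that fall in the same geohash cell. */
function stateCacheKey(float $latitude, float $longitude, int $precision = 6): string
{
    return 'geo:state:' . geohashEncode($latitude, $longitude, $precision);
}
```

The precision is a trade-off: coarser cells raise the cache hit rate, but a cell that straddles a state border can return the neighboring state for some points, so precision should stay fine enough to keep that error rare.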
Optional Phase III: More Providers to Improve Redundancy
We'll keep monitoring our solution and chain more providers to improve redundancy. This can be done by adding providers to the array passed to the Chain provider in the code above.
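To reason about how extra providers behave in the chain, here is a library-free sketch of the same first-non-empty-wins ordering: each resolver is tried in turn, and one that throws is treated like one that found nothing. This is an illustration, not the Geocoder library's implementation; `resolveStateWithChain` and the provider names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Try each state resolver in order and return the first non-empty state.
 * A resolver that throws is skipped, so the next provider gets a chance.
 *
 * @param array<string, callable(float, float): ?string> $resolvers provider name => resolver
 * @return array{0: ?string, 1: ?string} [state, name of the provider that answered]
 */
function resolveStateWithChain(array $resolvers, float $latitude, float $longitude): array
{
    foreach ($resolvers as $name => $resolve) {
        try {
            $state = $resolve($latitude, $longitude);
        } catch (\Throwable $e) {
            continue; // in production, log $e and increment a failure metric
        }
        if ($state !== null && $state !== '') {
            return [$state, $name];
        }
    }
    return [null, null];
}
```

Returning the provider name alongside the state makes it easy to emit a per-provider metric, which helps decide when another provider is worth adding.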
CONSEQUENCES
API Key Setup
- We will need to set up an account for BigDataCloud's service, which will enable us to generate an API key
- We will need to update the infrastructure to include a new environment variable for the API key
- We will need to refactor the code to start using the correct endpoint and provide the API key in the calls
Pricing
- We may need to configure billing for the new service, as per the pricing table below. If the monthly limit is reached, an optional pay-as-you-go feature can be enabled for overage queries. A yearly subscription also gets a discount of about 17% (the yearly price equals 10 months at the monthly rate).
| Tier | Max. Queries per Month | Pricing (monthly) | Pricing (yearly) | Optional Pay-as-you-go Feature |
|---|---|---|---|---|
| Base | 50K | FREE | FREE | Not Available |
| Bronze | 500K | $19.95 / month | $199.5 / year | $0.3990 per 10,000 overage calls |
| Silver | 5M | $169 / month | $1,690 / year | $0.3380 per 10,000 overage calls |
| Gold | 50M | $1,399 / month | $13,990 / year | $0.2798 per 10,000 overage calls |
| Platinum | UNLIMITED | $4,499 / month | $44,990 / year | Not Applicable |
- We expect to eventually need at least the Bronze tier, but we can start on the free Base tier and monitor our call volume so we are alerted before reaching the quota.
- Since roughly 10% of MaxMind lookups are expected to come back without a state, Capital One's average of 70K clicks/month translates to approximately 7K fallback calls/month, well within the Base tier's 50K limit.
Note: these figures cover BigDataCloud only; see the sections above for other providers' pricing.
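The sketch below turns the pricing table into a cost estimate: it computes the expected fallback volume and picks the cheapest BigDataCloud tier for it. It assumes overage is billed per started block of 10,000 calls and only on tiers where the optional pay-as-you-go feature exists; BigDataCloud's actual billing granularity may differ. The function and constant names are hypothetical.

```php
<?php
declare(strict_types=1);

/** Tier => [monthly quota (null = unlimited), monthly price USD, overage USD per 10,000 calls or null]. */
const BDC_TIERS = [
    'Base'     => [50_000, 0.0, null],
    'Bronze'   => [500_000, 19.95, 0.3990],
    'Silver'   => [5_000_000, 169.0, 0.3380],
    'Gold'     => [50_000_000, 1399.0, 0.2798],
    'Platinum' => [null, 4499.0, null],
];

/** Monthly cost of a tier for a call volume, or null if the tier cannot serve it. */
function bdcTierCost(string $tier, int $calls): ?float
{
    [$quota, $price, $overagePer10k] = BDC_TIERS[$tier];
    if ($quota === null || $calls <= $quota) {
        return $price;
    }
    if ($overagePer10k === null) {
        return null; // no pay-as-you-go option on this tier
    }
    // Assumption: overage is billed per started block of 10,000 calls.
    return $price + ceil(($calls - $quota) / 10_000) * $overagePer10k;
}

/** Cheapest tier for a call volume, as [tier, monthly cost]. */
function bdcCheapestTier(int $calls): array
{
    $best = null;
    foreach (array_keys(BDC_TIERS) as $tier) {
        $cost = bdcTierCost($tier, $calls);
        if ($cost !== null && ($best === null || $cost < $best[1])) {
            $best = [$tier, $cost];
        }
    }
    return $best; // Platinum always qualifies, so this is never null
}

// Expected fallback volume: ~10% of Capital One's ~70K monthly clicks.
$fallbackCalls = (int) round(70_000 * 0.10);
```

At the current ~7K calls/month the Base tier is cheapest at $0; the estimate only becomes interesting if more publishers adopt the fallback.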
Optimizations
- We will need to refactor the code to make this call only when necessary: when saving the state/city in the click data, since the click data is the source of those values when building outgoing postbacks. This minimizes costs and performance overhead.
- We may need to update our caching logic to avoid unnecessary calls to this API; the current solution caches the result for identical lat/lon coordinates for 24 hours.
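A minimal sketch of both points, assuming a hypothetical `ClickStateResolver` invoked when click data is saved: the fallback is consulted only when MaxMind returned no state and the publisher requires one, and results are cached per exact lat/lon for 24 hours, matching the current rule. An in-memory array stands in for the application's real cache store, and the clock is injectable for testing.

```php
<?php
declare(strict_types=1);

final class ClickStateResolver
{
    private const TTL_SECONDS = 86_400; // 24 hours

    /** @var array<string, array{0: string, 1: int}> cache key => [state, expiry timestamp] */
    private array $cache = [];

    /** @var callable(float, float): ?string */
    private $fallback;

    /** @var callable(): int */
    private $clock;

    public function __construct(callable $fallback, ?callable $clock = null)
    {
        $this->fallback = $fallback;
        $this->clock = $clock ?? 'time';
    }

    public function resolve(?string $maxMindState, bool $publisherRequiresState, float $lat, float $lon): ?string
    {
        if ($maxMindState !== null && $maxMindState !== '') {
            return $maxMindState; // MaxMind already answered; no external call
        }
        if (!$publisherRequiresState) {
            return null; // publisher does not need a state, so skip the paid call
        }
        $key = $lat . ',' . $lon;
        $now = ($this->clock)();
        if (isset($this->cache[$key]) && $this->cache[$key][1] > $now) {
            return $this->cache[$key][0];
        }
        $state = ($this->fallback)($lat, $lon);
        if ($state !== null) {
            // Only successful lookups are cached, so transient failures can be retried.
            $this->cache[$key] = [$state, $now + self::TTL_SECONDS];
        }
        return $state;
    }
}
```

Skipping the cache write on null results is a design choice: it trades a few extra calls after failures for not pinning a missing state for a full day.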
Monitoring
- We should start monitoring the number of missing "state" values from MaxMind and the number of calls to BigDataCloud so we can anticipate costs. That means emitting a metric each time we use the fallback service and setting up Datadog monitors/alerts that fire when usage approaches the quota of the chosen provider.
Risks
- Unexpected costs and performance overhead
- Adding more complexity to codebase
- Expansion to other publishers, causing more API calls / latency
- BigDataCloud could also fail to deliver the necessary data, which would require our business team to handle the problem manually
- BigDataCloud's 99.999% uptime SLA applies only to paid subscriptions, so the free Base tier has no uptime guarantee
- If this solution fails to retrieve "state" values, Capital One will reject the postback, and someone on our side will need to work manually with Capital One to complete those conversions.
NOTES
Further Steps
- In the future, we should determine whether and how to backfill these values in our data warehouse tables.
References
- BigDataCloud's Free Client Side Reverse Geocoding to City API
- BigDataCloud's Server Side Reverse Geocoding to City API
- BigDataCloud's Free Client-side Reverse Geocoding API: Usage Guidelines and Fair Use Policy
- BigDataCloud's Daily Report on IP Geolocation Accuracy - Global
- BigDataCloud's Terms and Conditions
- HERE's SLA
- HERE's Pricing
- Google Maps's SLA
- Google Maps's Pricing
- Mapbox's SLA
- Mapbox's Pricing
- Azure Maps's SLA
- Azure Maps's Pricing
- Geocoder PHP library
- PR #94: docs: Using BigDataCloud to backfill missing state values in postbacks
- PR #127: docs: backfill PR reference links for existing ADRs
Original Author
Luiz Bueno