I tried returning 402 Payment Required to a massive bot with 15 million cumulative requests using AWS WAF Monetize
This page has been machine translated.
Introduction
In a previous article, I verified the new AWS WAF feature Monetize (402 Payment Required) in a test environment. Monetize is a feature that returns HTTP 402 for requests matching specified conditions and presents payment information (x402 price manifest).
This article documents applying this feature to a problem that was actually occurring in a production environment. The target environment is dev.classmethod.jp (CloudFront + AWS WAF).
Problem Discovery: Identifying UA-Spoofing Bots
Abnormal Increase in GA4 Direct
At the end of May, I noticed that the Direct channel in GA4 was spiking. After analyzing CloudFront access logs, I confirmed that a large volume of access was coming from 3 IPs within a specific /24 range.
Observed Technical Characteristics
From WAF Sampled Requests and access logs, I observed the following facts.
- UA: Fixed at Chrome/114.0.0.0. Chrome/114 was released in June 2023, so this version is over 2 years old. Chrome updates itself automatically, so a browser in everyday use is unlikely to still report this version; I judged it highly likely to be a fixed UA string
- accept-language: `en` only (no quality value). Normal browsers send values like `en-US,en;q=0.9`, so this is a simplified header configuration
- JS asset retrieval present: Requests for `_next/static/chunks/*.js` were observed. I took this as evidence suggesting possible full-browser access with a JavaScript execution environment
- JA3/JA4 fingerprint: Identical values across all 3 IPs, suggesting the same TLS implementation and configuration
- Mixed HTTP/2.0 and HTTP/3.0: consistent with automatic protocol negotiation
According to ARIN whois, this IP range is assigned to a well-known SaaS company. The UA is an ordinary Chrome UA and contains no string explicitly identifying the client as a bot.
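The two header traits above (a stale Chrome major version and a bare `accept-language`) can be expressed as a small heuristic. This is a minimal sketch, not the method used in this investigation: the function name `suspicious_signals` and the cut-off `STALE_CHROME_MAJOR = 120` are hypothetical, and a match only raises suspicion; it does not prove automation.

```python
import re

# Hypothetical cut-off: Chrome majors below this count as "stale".
# Set it a few releases behind the current stable when you run the check.
STALE_CHROME_MAJOR = 120

CHROME_RE = re.compile(r"Chrome/(\d+)\.")


def suspicious_signals(user_agent: str, accept_language: str) -> list[str]:
    """Return which of the observed traits a request shows (a hint, not proof)."""
    signals = []
    match = CHROME_RE.search(user_agent)
    if match and int(match.group(1)) < STALE_CHROME_MAJOR:
        signals.append("stale-chrome-major")
    # Browsers usually send a weighted list such as "en-US,en;q=0.9";
    # a single bare tag like "en" matches the simplified header seen here.
    if "," not in accept_language and ";q=" not in accept_language:
        signals.append("bare-accept-language")
    return signals


ua = "Mozilla/5.0 (Macintosh; ...) Chrome/114.0.0.0 Safari/537.36"
print(suspicious_signals(ua, "en"))  # ['stale-chrome-major', 'bare-accept-language']
```

JA3/JA4 consistency across IPs is left out here because it needs aggregation over many requests rather than a per-request check.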
Quantitative Evidence
Results of aggregating access from the target IP range by day and IP.
| Date | IP-A | IP-B | IP-C | Total |
|---|---|---|---|---|
| 06/09 | 290,562 | 290,702 | 291,517 | 872,781 |
| 06/10 | 289,268 | 293,359 | 290,464 | 873,091 |
| 06/11 | — | — | — | (※) |
| 06/12 | 290,553 | 292,475 | 292,013 | 875,041 |
| 06/13 | 292,203 | 289,851 | 291,722 | 873,776 |
| 06/14 | 287,925 | 292,189 | 294,076 | 874,190 |
| 06/15 | 291,085 | 291,778 | 291,746 | 874,609 |
※ Log data could not be retrieved for 6/11.
- Daily total volume is stable at approximately 870,000 requests. The day-to-day variation is below 0.3%, so I judged this highly likely to be automated access at a fixed rate
- The split across the 3 IPs is nearly even. On each day the difference between IPs is generally within 2%, suggesting the sender distributes requests evenly in some way (load balancer, IP rotation, etc.)
- A small amount of access (4 requests in total) from Chrome/148.0.7778.97 (Linux) within the same IP range was also observed; its relationship to the primary UA is unknown
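The percentages above can be re-derived from the table. The article does not state how they were computed, so this sketch assumes a simple spread metric, (max - min) / mean, applied both to the daily totals and to the three IPs within each day; 06/11 is omitted because its logs are missing.

```python
# Per-IP daily counts from the table above (06/11 omitted: no log data).
DAILY = {
    "06/09": (290_562, 290_702, 291_517),
    "06/10": (289_268, 293_359, 290_464),
    "06/12": (290_553, 292_475, 292_013),
    "06/13": (292_203, 289_851, 291_722),
    "06/14": (287_925, 292_189, 294_076),
    "06/15": (291_085, 291_778, 291_746),
}


def relative_spread(values) -> float:
    """(max - min) / mean: how tightly a set of counts clusters."""
    values = list(values)
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


totals = {day: sum(counts) for day, counts in DAILY.items()}
day_to_day = relative_spread(totals.values())
per_day_ip = {day: relative_spread(counts) for day, counts in DAILY.items()}

print(f"day-to-day spread of totals: {day_to_day:.2%}")
for day, spread in per_day_ip.items():
    print(f"{day}: spread across the 3 IPs {spread:.2%}")
```

With this definition the day-to-day spread of the totals is about 0.26%, and the per-day spread across IPs is at most about 1.4% except on 06/14, where it reaches about 2.1%, which matches "generally within 2%".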
18-Day Hold
18 days passed from detection on 5/29 to the countermeasure implemented on 6/16. While options such as Block were available, the response was put on hold because CloudFront's cache hit rate was high with minimal impact on origin load, and no notable deterioration in bandwidth or response time was observed.
The trigger for taking action was the release of a new option (Monetize feature) with smaller side effects.
Why 402 Instead of Block or Challenge
Based on the observed facts in the previous section (JS asset retrieval, fixed UA, identical JA3/JA4 across 3 IPs), I determined there was a high likelihood of automated full-browser type access. With this judgment as the premise, I considered the response policy.
Challenge ruled out: If this is automated full-browser access, it may be able to pass a JS Challenge mechanically. I have not verified this, but if the challenge can be bypassed, it does not work as a countermeasure.
Block ruled out: Given that the IP address holder is a well-known SaaS company, I considered the following scenarios.
- The possibility that the holder's own service is crawling due to a misconfiguration or as a specification
- Wanting to avoid Block until contacting and confirming with the holder
402 misidentification risk: Unlike Challenge, 402 is an option with greater impact in cases of misidentification.
| Case | Challenge misidentification | 402 misidentification |
|---|---|---|
| Human browser | Most normal browsers pass the JS challenge transparently, so the impact is limited | Effectively a Block for humans, since almost no general users can pay with cryptocurrency |
In this case, the judgment uses an AND condition of two criteria: "IP belonging to a specific /24 range" AND "that individual IP exceeding 100 requests in a 60-second evaluation window." I determined that the probability of a normal human browser simultaneously satisfying both conditions is extremely low.
When applying 402 to traffic that might include humans, I recommend first narrowing the target down with an AND of multiple conditions.
For these reasons I chose 402, and by combining it with a rate-based rule I get the behavior "allow normal access volume, return 402 to sources that exceed the threshold."
Implementation
IP Set Creation
The target /24 (216.198.x.x/24) was registered in an IP Set.
I registered the whole /24 because the 3 IPs in use sit in the same range and receive an even share of the traffic, and a /24 also covers any IPs the sender adds later. The side effect of a /24 (a wider target range) is contained by the AND condition with the rate-based rule, which narrows the target to "sources in the same /24 that send many requests in a short time." I accept, however, that other hosts in the range could become subject to the rule if they exceed the rate.
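A /24 covers 256 addresses, so any additional sender IP in the same block matches without changing the IP Set. The sketch below shows the membership test with Python's `ipaddress` module; because the real range is elided in this article, it uses the RFC 5737 documentation block `192.0.2.0/24` as a hypothetical stand-in, and `in_target_range` is a hypothetical helper.

```python
import ipaddress

# Hypothetical stand-in for the elided 216.198.x.x/24 (RFC 5737 documentation range).
TARGET = ipaddress.ip_network("192.0.2.0/24")


def in_target_range(client_ip: str) -> bool:
    """True if the address falls inside the IP Set's /24."""
    return ipaddress.ip_address(client_ip) in TARGET


# Three current senders plus a hypothetical new one in the same block all match;
# an address in the neighbouring /24 does not.
for ip in ("192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.200", "192.0.3.1"):
    print(ip, in_target_range(ip))
```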
RateBased Rule
| Setting Item | Value |
|---|---|
| Evaluation window | 60 seconds |
| Threshold | 100 |
| Aggregation key | IP (evaluated per individual IP) |
| ScopeDown | IP Set above |
| Action | Count |
| Label | custom:rate-exceeded:headless-bot |
The reason for setting the action to Count is that the Monetize action cannot be specified with a RateBased rule alone. The configuration is set so that Count does not terminate the request but attaches a label, and the subsequent Monetize rule references that label to trigger the 402.
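To reason about how the Count rule behaves, here is a toy per-IP rate model with the same parameters (limit 100, 60-second window, label `custom:rate-exceeded:headless-bot`). It is a simplification under stated assumptions: it uses an exact trailing window, whereas AWS WAF's own rate tracking is approximate and can react with some delay, so it only illustrates the "label but do not terminate" behavior. The class name is hypothetical.

```python
from collections import defaultdict, deque

RATE_LABEL = "custom:rate-exceeded:headless-bot"


class SlidingWindowRateModel:
    """Toy per-IP rate tracker with the same limit and window as the rule.

    Simplification: an exact trailing window, with timestamps assumed to
    arrive in order. AWS WAF's real tracking is approximate.
    """

    def __init__(self, limit: int = 100, window_sec: float = 60.0) -> None:
        self.limit = limit
        self.window_sec = window_sec
        self._hits: dict[str, deque] = defaultdict(deque)

    def observe(self, ip: str, now: float) -> list[str]:
        """Record one request at time `now`; return the labels a Count rule would add."""
        hits = self._hits[ip]
        hits.append(now)
        # Forget requests older than the evaluation window.
        while hits[0] <= now - self.window_sec:
            hits.popleft()
        # Count never terminates: over the limit, the request is only labelled
        # and continues to later rules (such as the Monetize rule).
        return [RATE_LABEL] if len(hits) > self.limit else []


model = SlidingWindowRateModel()
labels = [model.observe("192.0.2.10", i * 0.1) for i in range(150)]
print(sum(1 for lab in labels if lab))  # 50
```

Sending 150 requests in 15 seconds from one IP leaves the first 100 unlabelled and labels the remaining 50; once the window has passed, the same IP is back under the limit.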
Monetize Rule
| Setting Item | Value |
|---|---|
| Condition | Label custom:rate-exceeded:headless-bot match AND IP Set match |
| Action | Monetize |
| PriceMultiplier | 10 |
The reason IP Set is included again in the condition is as a safety measure. Although the RateBased ScopeDown already narrows it down to the IP Set, this is to prevent unintended Monetize triggering if the label name is reused in other rules in the future.
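For reference, the match logic of both rules can be written in the WAFv2 `Rule` JSON structure. This is a sketch: the rule names come from the WAF log in this article, the IP Set ARN is a placeholder, and the Monetize rule's `Action` is deliberately left out because its JSON shape is not shown here. The Monetize rule also needs a larger `Priority` number than the RateBased rule, since a label is only visible to rules evaluated after the rule that adds it.

```python
import json

RATE_LABEL = "custom:rate-exceeded:headless-bot"


def visibility(metric: str) -> dict:
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric,
    }


def rate_label_rule(ip_set_arn: str, priority: int) -> dict:
    """RateBased rule: Count + label when one IP in the IP Set exceeds 100 req / 60 s."""
    return {
        "Name": "RateLabel-HeadlessBot",
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": 100,
                "EvaluationWindowSec": 60,
                "AggregateKeyType": "IP",  # evaluated per individual IP
                "ScopeDownStatement": {"IPSetReferenceStatement": {"ARN": ip_set_arn}},
            }
        },
        "Action": {"Count": {}},
        "RuleLabels": [{"Name": RATE_LABEL}],
        "VisibilityConfig": visibility("RateLabel-HeadlessBot"),
    }


def monetize_statement(ip_set_arn: str) -> dict:
    """Match condition of the Monetize rule: label AND IP Set (the safety net)."""
    return {
        "AndStatement": {
            "Statements": [
                {"LabelMatchStatement": {"Scope": "LABEL", "Key": RATE_LABEL}},
                {"IPSetReferenceStatement": {"ARN": ip_set_arn}},
            ]
        }
    }


placeholder_arn = "arn:aws:wafv2:us-east-1:111122223333:global/ipset/example/abcd"
print(json.dumps(rate_label_rule(placeholder_arn, 10), indent=2))
```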
Unit price calculation:
MonetizationConfig Prices[].Amount: 0.001 USDC (Base Price)
× PriceMultiplier: 10
= 0.01 USDC/request
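The same arithmetic as a sketch, using `Decimal` so currency amounts avoid binary floating-point rounding. `quoted_total` is a hypothetical helper for estimating what a number of 402 responses would quote; in TEST mode nothing is actually charged.

```python
from decimal import Decimal

BASE_PRICE_USDC = Decimal("0.001")  # MonetizationConfig Prices[].Amount
PRICE_MULTIPLIER = 10               # Monetize rule PriceMultiplier


def price_per_request(base: Decimal, multiplier: int) -> Decimal:
    """Base price scaled by the rule's multiplier."""
    return base * multiplier


def quoted_total(requests: int, base: Decimal = BASE_PRICE_USDC,
                 multiplier: int = PRICE_MULTIPLIER) -> Decimal:
    """Total quoted for `requests` 402 responses, if every one were paid."""
    return price_per_request(base, multiplier) * requests


print(price_per_request(BASE_PRICE_USDC, PRICE_MULTIPLIER))  # 0.010
```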
MonetizationConfig
```json
{
  "CryptoConfig": {
    "PaymentNetworks": [
      {
        "Chain": "BASE_SEPOLIA",
        "WalletAddress": "0x2f0cb3deddd256f4c889...xxxxxxxxxxxx",
        "Prices": [{"Amount": "0.001", "Currency": "USDC"}]
      }
    ]
  },
  "CurrencyMode": "TEST"
}
```
This deployment uses TEST mode on the Base Sepolia testnet, so there is no actual billing and no actual revenue.
Direct CLI Operation
Because CloudFormation does not support this feature yet, I used the AWS CLI directly. The procedure is to obtain the LockToken with `get-web-acl`, then pass that token to `update-web-acl` to add the rules.
```shell
aws wafv2 get-web-acl \
  --name devio2024-waf-bot-webacl \
  --scope CLOUDFRONT \
  --id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --region us-east-1
```
Pass the LockToken from that output to the next command. `Rules` replaces the web ACL's entire rule list, so `rules.json` must contain the existing rules as well as the new ones, and the actual `update-web-acl` call must also pass the existing `DefaultAction`, `VisibilityConfig`, and other settings unchanged (the following is a simplified example).
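```shell
# default-action.json and visibility-config.json are hypothetical file names:
# save the DefaultAction and VisibilityConfig values from the get-web-acl output into them.
aws wafv2 update-web-acl \
  --name devio2024-waf-bot-webacl \
  --scope CLOUDFRONT \
  --id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --lock-token <obtained LockToken> \
  --default-action file://default-action.json \
  --visibility-config file://visibility-config.json \
  --rules file://rules.json \
  --region us-east-1
```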
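LockToken implements optimistic locking: if the web ACL is updated by anyone else after you call `get-web-acl`, `update-web-acl` fails with `WAFOptimisticLockException`, and you need to fetch a new token and apply the change again.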
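If you script this instead of running the CLI by hand, the optimistic lock suggests a fetch, modify, update loop that retries on a stale token. The sketch below assumes a boto3 `wafv2` client (`get_web_acl`, `update_web_acl`, and the `WAFOptimisticLockException` error belong to that API). The helper name `add_rule` and the `PRESERVED_FIELDS` list are hypothetical, the list is not exhaustive, and the sketch does not cover any Monetize-specific web ACL settings, so check it against your own web ACL before use.

```python
# Optional web ACL fields copied from get_web_acl into update_web_acl so that
# they are not reset. Not exhaustive: carry over any other field your ACL uses.
PRESERVED_FIELDS = (
    "Description",
    "CustomResponseBodies",
    "CaptchaConfig",
    "ChallengeConfig",
    "TokenDomains",
    "AssociationConfig",
)


def add_rule(client, name: str, scope: str, acl_id: str, new_rule: dict,
             max_attempts: int = 3) -> dict:
    """Add (or replace by Name) one rule, retrying when the LockToken is stale.

    `client` should behave like boto3.client("wafv2", region_name="us-east-1");
    CLOUDFRONT-scope web ACLs live in us-east-1. new_rule needs a unique Priority.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        resp = client.get_web_acl(Name=name, Scope=scope, Id=acl_id)
        acl = resp["WebACL"]
        # Rules replaces the whole list: keep existing rules, swap any same-named one.
        rules = [r for r in acl["Rules"] if r["Name"] != new_rule["Name"]]
        rules.append(new_rule)
        params = {
            "Name": name,
            "Scope": scope,
            "Id": acl_id,
            "DefaultAction": acl["DefaultAction"],
            "VisibilityConfig": acl["VisibilityConfig"],
            "Rules": rules,
            "LockToken": resp["LockToken"],
        }
        params.update({k: acl[k] for k in PRESERVED_FIELDS if k in acl})
        try:
            return client.update_web_acl(**params)
        except client.exceptions.WAFOptimisticLockException:
            # Someone else updated the ACL after our read; re-read and retry.
            if attempt == max_attempts:
                raise
```

Replacing a rule with the same name makes the call safe to re-run after a partial failure.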
Operation Verification
I confirmed in WAF logs that the system is operating as designed. The following is a log record at the time of rate exceedance.
```json
{
  "timestamp": 1781595891917,
  "terminatingRuleId": "Monetize-HeadlessBot",
  "terminatingRuleType": "REGULAR",
  "action": "MONETIZE",
  "rateBasedRuleList": [
    {
      "rateBasedRuleName": "RateLabel-HeadlessBot",
      "limitKey": "IP",
      "maxRateAllowed": 100,
      "evaluationWindowSec": 60
    }
  ],
  "nonTerminatingMatchingRules": [
    {
      "ruleId": "RateLabel-HeadlessBot",
      "action": "COUNT"
    }
  ],
  "responseCodeSent": 402,
  "httpRequest": {
    "clientIp": "216.198.x.x",
    "country": "US",
    "uri": "/articles/securityhub-amazon-managed-grafana",
    "httpVersion": "HTTP/2.0",
    "httpMethod": "GET",
    "headers": [
      {"name": "user-agent", "value": "Mozilla/5.0 (Macintosh; ...) Chrome/114.0.0.0 Safari/537.36"},
      {"name": "accept-language", "value": "en"}
    ]
  },
  "labels": [
    {"name": "awswaf:xxxxxxxxxxxx:webacl:devio2024-waf-bot-webacl:custom:rate-exceeded:headless-bot"}
  ],
  "ja3Fingerprint": "885a2f978c1b08c89c8baba21a1625b5",
  "ja4Fingerprint": "t13d1516h2_8daaf6152771_d8a2da3f94cd"
}
```
From this log, the two-stage operation can be confirmed.
- The RateBased rule (Count) is recorded in `nonTerminatingMatchingRules` → evidence that the request was not terminated but had a label attached and was passed on to subsequent processing
- `terminatingRuleId` is the Monetize rule → the Monetize rule ultimately terminated the request and returned 402
- The rate-exceeded label appears in `labels` → the label hand-off from the RateBased rule to the Monetize rule is working
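The three checks above can be automated over log records. A minimal sketch, assuming each record has already been parsed into a dict (for example with `json.loads` on one log line); the rule names and label come from the log above, and `shows_two_stage` is a hypothetical helper.

```python
RATE_RULE = "RateLabel-HeadlessBot"
MONETIZE_RULE = "Monetize-HeadlessBot"
LABEL_SUFFIX = ":custom:rate-exceeded:headless-bot"


def shows_two_stage(record: dict) -> bool:
    """Check one WAF log record for the Count -> label -> Monetize hand-off."""
    # 1. The RateBased rule matched with COUNT and did not terminate.
    counted = any(
        r.get("ruleId") == RATE_RULE and r.get("action") == "COUNT"
        for r in record.get("nonTerminatingMatchingRules", [])
    )
    # 2. The rate-exceeded label (prefixed with the web ACL namespace) is present.
    labelled = any(
        label.get("name", "").endswith(LABEL_SUFFIX)
        for label in record.get("labels", [])
    )
    # 3. The Monetize rule terminated the request with a 402.
    return (
        counted
        and labelled
        and record.get("terminatingRuleId") == MONETIZE_RULE
        and record.get("responseCodeSent") == 402
    )
```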
For all 3 IPs in the target range, 402 was returned once the rate was exceeded. Conversely, I confirmed that requests from the same sources within the threshold (100 requests or fewer per 60 seconds) were allowed normally with `action: ALLOW`, so the action switches according to the rate status.
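Confirming the ALLOW/MONETIZE switching per source only needs a tally of the terminating `action` per `httpRequest.clientIp`. A sketch, assuming decompressed WAF log lines with one JSON record per line; `actions_by_ip` is a hypothetical helper.

```python
import json
from collections import Counter


def actions_by_ip(lines) -> dict[str, Counter]:
    """Tally the terminating action per client IP from one-record-per-line WAF logs."""
    summary: dict[str, Counter] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        ip = record["httpRequest"]["clientIp"]
        summary.setdefault(ip, Counter())[record["action"]] += 1
    return summary
```

An IP showing both `ALLOW` and `MONETIZE` counts is what the switching described above looks like in aggregate.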
For details on the x402 price manifest (payment information included in the 402 response body), please refer to the previous article.
Operational Notes and Future Plans
402 does not guarantee that requests stop. Unless the bot changes its behavior, its requests will keep reaching CloudFront/WAF.
Future plans are as follows.
- If payment is confirmed: Change `CurrencyMode` to `REAL`, migrate to Base mainnet, and set up a Base-compatible wallet to receive payments
- If neither payment nor stoppage occurs: Move the source to Block as a requester showing no prospect of improvement. 402 is a temporary grace measure
- IaC: The change was made manually outside CloudFormation management, so it is causing drift. I plan to migrate it to IaC once CloudFormation supports the feature
Effectiveness measurement (changes in access volume and impact on GA4) is outside the scope of this article. I plan to conduct a separate verification after a sufficient period has passed since application.
Summary
In response to large-scale crawling by automated access suspected of UA spoofing, I implemented a response using AWS WAF's Monetize feature to return 402 Payment Required. 402 is a signal of "transaction" rather than "hostility," leaving options open for the other party. Which of Block / Challenge / 402 to choose is a judgment based on the nature of the accessing entity and the risk of misidentification. Technically, a two-stage configuration of RateBased (Count) + Monetize achieves "allow normal volume, charge for excess." The configuration allows for staged migration to REAL after confirming operation in TEST mode.
