I researched how to introduce rate limiting to a site proxied through Cloudflare
Introduction
I needed IP-based rate limiting on a specific API endpoint (one that calls an expensive external service on every request). I thought the best place to stop such requests was the edge (Cloudflare), so I started my investigation there.
The first thing I considered was Cloudflare's WAF Rate Limiting Rules. However, the Pro plan I was using ($20/month) limits Rate Limiting Rules to just 2, and those were already filled with existing rules. Increasing the quota would require upgrading to the Business plan ($200/month), but raising costs to 10x just for 1 additional rule wasn't realistic.
So I decided to explore alternative approaches to implement rate limiting while keeping costs down.
Since the target domain was already proxied through Cloudflare, Workers Routes could attach a Worker to a route pattern such as example.com/api/example*, so that only the target endpoint goes through the Worker. This avoids routing the entire site through a Worker and keeps the scope of impact small.
Cloudflare offers multiple rate limiting options available from Workers:
1. Workers Rate Limiting Binding (official built-in)
2. Durable Object (DO)
   - 2a. Count with in-memory state
   - 2b. Persist with SQLite-backed storage
I tried each in order, but each had caveats. I ultimately re-implemented on the backend side (application + RDB), but I'll organize the knowledge gained along the way.
Requirements
- Target the specific endpoint /api/example?mode=heavy
- Allow up to 10 requests per IP within 60 seconds
- Return HTTP 429 when exceeded
Pattern 1. Workers Rate Limiting Binding
This is Cloudflare's official built-in feature, available simply by writing configuration in wrangler.toml.
Implementation
```toml
# wrangler.toml
[[ratelimits]]
name = "MY_LIMITER"
namespace_id = "1001"
[ratelimits.simple]
limit = 10
period = 60 # Only 10 or 60 can be specified
```
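```typescript
// src/index.ts
interface Env {
  MY_LIMITER: RateLimit; // binding type from @cloudflare/workers-types
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    // Use the client IP as the rate limiting key
    const key = req.headers.get("cf-connecting-ip") ?? "unknown";
    const { success } = await env.MY_LIMITER.limit({ key });
    if (!success) {
      return new Response("Too Many Requests", { status: 429 });
    }
    // Under the limit: forward the request to the origin
    return fetch(req);
  },
} satisfies ExportedHandler<Env>;
```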
The implementation only takes a few lines, making it extremely simple.
Verification Results
After deploying to a test environment and sending 100 req / 10 seconds (several dozen times the configured value) from a single IP, a large discrepancy was observed between expected and actual values.
- Expected: 10 successes, 90 with success: false
- Actual: only a few success: false out of 100 requests
Against the configured value of 10 req / 60s, the effective behavior allowed several dozen to ~100 req/min through.
Cause
The official documentation (Workers Rate Limiting - Accuracy) states the following:
> The Rate Limiting API is permissive, eventually consistent, and intentionally designed to not be used as an accurate accounting system.
> ...the isolate that serves each request will check against its locally cached value of the rate limit. Very quickly, but not immediately...

Source: https://developers.cloudflare.com/workers/runtime-apis/bindings/rate-limit/#accuracy
In summary:
- Each isolate maintains a counter in its local cache
- Synchronization between isolates has a delay of several seconds
- When multiple isolates exist within the same PoP, requests are distributed, causing each isolate's local count to be spread out
- Cloudflare itself explicitly states it "should not be used as an accurate accounting system"
It is therefore unsuitable for use cases that require enforcement matching the configured values.
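To see why locally cached counters let far more than 10 requests through, here is a deliberately simplified model. It is an assumption for illustration, not Cloudflare's actual algorithm: each isolate checks only its own counter and never synchronizes during the burst, and requests are spread round-robin across isolates. Under those assumptions, up to (number of isolates × limit) requests pass.

```typescript
// Hypothetical model of per-isolate local counting: each isolate enforces
// the limit against its own counter, with no synchronization during the burst.
class IsolateCounter {
  private count = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  tryAcquire(): boolean {
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

// Send `total` requests round-robin across `isolates` isolates and
// return how many of them were allowed.
function simulateBurst(total: number, isolates: number, limit: number): number {
  const pool = Array.from({ length: isolates }, () => new IsolateCounter(limit));
  let allowed = 0;
  for (let i = 0; i < total; i++) {
    if (pool[i % isolates].tryAcquire()) allowed++;
  }
  return allowed;
}

console.log(simulateBurst(100, 1, 10)); // 10: a single isolate enforces the limit exactly
console.log(simulateBurst(100, 4, 10)); // 40: each of four isolates allows its own 10
```

The real binding does synchronize eventually, so observed numbers fall somewhere between these extremes, but the model shows how spreading requests over isolates inflates the effective limit.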
Cost
There is no additional charge for Rate Limiting Binding itself. You are only charged for Workers request count and CPU time.
| Plan | Requests | CPU Time |
|---|---|---|
| Workers Free | 100,000 / day | 10ms / invocation |
| Workers Paid ($5/month) | 10M / month included, excess $0.30 / 1M req | 30s / invocation, additional charges apply |
Considerations
- Workers Rate Limiting Binding is not suited for strict abuse prevention
- It functions well enough as a loose filter, but for use cases requiring control matching the configured values, a different approach is needed
Pattern 2. Durable Object (in-memory)
Durable Objects (DO) operate as "logically single instances," making it possible to aggregate and count requests from multiple PoPs. This pattern maintains counters in in-memory state.
Note that to use Durable Objects on the Workers Free plan, the DO class must be declared as new_sqlite_classes (the legacy Key-Value-backed option requires the Workers Paid plan). In this pattern, the DO class is declared as SQLite-backed, but the code operates using only in-memory state without calling ctx.storage.
Implementation
In wrangler.toml, declare it as SQLite-backed.
```toml
[[durable_objects.bindings]]
name = "RATE_LIMITER"
class_name = "RateLimiter"

[[migrations]]
tag = "v1"
new_sqlite_classes = ["RateLimiter"] # Required for Free plan compatibility
```
The design creates a DO per IP (idFromName(ip)). The counter is held in-memory as an instance variable of the class.
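```typescript
// rate_limiter.ts
import { DurableObject } from "cloudflare:workers";

export class RateLimiter extends DurableObject {
  // Held in memory only; discarded if the object hibernates or is evicted
  private timestamps: number[] = [];

  async checkLimit(): Promise<{ success: boolean }> {
    const now = Date.now();
    // Sliding window: keep only timestamps from the last 60 seconds
    this.timestamps = this.timestamps.filter((t) => now - t < 60_000);
    if (this.timestamps.length >= 10) return { success: false };
    this.timestamps.push(now);
    return { success: true };
  }
}
```

```typescript
// index.ts (Worker main, inside fetch(req, env))
// One DO instance per client IP
const ip = req.headers.get("cf-connecting-ip") ?? "unknown";
const id = env.RATE_LIMITER.idFromName(ip);
const stub = env.RATE_LIMITER.get(id);
const { success } = await stub.checkLimit();
```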
Since the storage API is not called, no SQLite rows read/written charges are incurred.
Verification 1. Burst Behavior
When requests were sent back to back, the 11th and subsequent requests returned 429 as expected.
1–10: HTTP 200
11–15: HTTP 429
Verification 2. With Idle Period
Requests were sent under the condition of 5 requests → 15 seconds idle → 6 requests. If a normal sliding window were functioning, the 5 timestamps from Phase 1 would remain in the window since they are within 60 seconds, and the 6th request in Phase 2 (11th total) should return 429.
The actual results were as follows:
Phase 1: all 5 requests returned 200
15 seconds idle
Phase 2: all 6 requests returned 200
This shows that the Phase 1 count was lost. In other words, inserting an idle period allowed the rate limit to be bypassed.
Cause
The DO lifecycle is responsible. The official DO Lifecycle documentation states:
> ...after 10 seconds of inactivity while in this state.
> When hibernated, the in-memory state is discarded

Source: https://developers.cloudflare.com/durable-objects/concepts/durable-object-lifecycle/
Under certain conditions, a DO hibernates after 10 seconds of inactivity and its in-memory state is discarded. On the next access it is re-created via constructor(), so the counter starts again from its initial state.
From an attacker's perspective, the following bypass is therefore possible:
- Send bursts of 9 requests with intervals of about 10–15 seconds
- Since the counter resets each time, approximately 36–54 requests per minute pass through (4–5x the configured value)
Since this bypass is viable, it is not practical as an abuse prevention measure.
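The bypass arithmetic above can be checked with a small simulation. It models the in-memory DO as the same 10 req / 60 s sliding window, plus an assumed rule that state is wiped whenever at least 10 seconds have passed since the previous request. That rule is a simplification: real hibernation is not guaranteed to happen at exactly 10 seconds. Each burst is also assumed to arrive at a single instant.

```typescript
const WINDOW_MS = 60_000;
const LIMIT = 10;
const IDLE_RESET_MS = 10_000; // assumed: in-memory state is lost after 10 s idle
const MINUTE_MS = 60_000;

// Count how many requests pass in one minute when `burst` requests
// are sent every `intervalMs` to a limiter that forgets state when idle.
function passedPerMinute(burst: number, intervalMs: number): number {
  let timestamps: number[] = [];
  let lastSeen = -Infinity;
  let passed = 0;
  for (let t = 0; t < MINUTE_MS; t += intervalMs) {
    for (let i = 0; i < burst; i++) {
      // Simulated hibernation: discard the counter after the idle period
      if (t - lastSeen >= IDLE_RESET_MS) timestamps = [];
      lastSeen = t;
      timestamps = timestamps.filter((ts) => t - ts < WINDOW_MS);
      if (timestamps.length < LIMIT) {
        timestamps.push(t);
        passed++;
      }
    }
  }
  return passed;
}

console.log(passedPerMinute(9, 15_000)); // 36: 4 bursts, counter reset before each
console.log(passedPerMinute(9, 10_000)); // 54: 6 bursts, counter reset before each
console.log(passedPerMinute(9, 5_000));  // 10: no idle gap, so the limit holds
```

The last case shows that the window logic itself is fine; only the loss of state across idle periods opens the bypass.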
Considerations
- DO in-memory state must be designed with the assumption that it will be lost
- Short bursts can be prevented, but sustained access with intervals cannot be handled
Pattern 3. Durable Object (SQLite-backed persistent)
Next, I switched to persisting the timestamps with the DO Storage API. Since the class was already created as SQLite-backed, the KV API (ctx.storage.get/put) writes to SQLite internally.
Implementation
```typescript
import { DurableObject } from "cloudflare:workers";

export class RateLimiter extends DurableObject {
  async checkLimit(): Promise<{ success: boolean }> {
    const now = Date.now();
    const stored = (await this.ctx.storage.get<number[]>("ts")) ?? [];
    // Sliding window: keep only timestamps from the last 60 seconds
    const timestamps = stored.filter((t) => now - t < 60_000);
    if (timestamps.length >= 10) {
      return { success: false }; // Don't write on block (suppresses write charges)
    }
    timestamps.push(now);
    await this.ctx.storage.put("ts", timestamps);
    return { success: true };
  }
}
```
Since DOs operate single-threaded (input gates), there is no possibility of another request interrupting between get → compute → put, ensuring atomicity.
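To illustrate why this atomicity matters, here is an analogy rather than the DO runtime itself. A hypothetical in-memory async store (FakeStorage) stands in for ctx.storage, and checkLimit awaits between read and write. When calls interleave, they all read the same stale array and the limit is overshot (lost updates). A simple promise-chain queue that runs one call at a time restores correct counting, which is the effect input gates provide inside a DO.

```typescript
// Hypothetical async key-value store standing in for ctx.storage.
class FakeStorage {
  private data = new Map<string, number[]>();

  async get(key: string): Promise<number[] | undefined> {
    await new Promise((resolve) => setTimeout(resolve, 1)); // simulated I/O latency
    return this.data.get(key);
  }

  async put(key: string, value: number[]): Promise<void> {
    await new Promise((resolve) => setTimeout(resolve, 1));
    this.data.set(key, value);
  }
}

// Same get -> compute -> put sequence as checkLimit() above.
async function checkLimit(storage: FakeStorage, now: number, limit = 10): Promise<boolean> {
  const stored = (await storage.get("ts")) ?? [];
  const timestamps = stored.filter((t) => now - t < 60_000);
  if (timestamps.length >= limit) return false;
  timestamps.push(now);
  await storage.put("ts", timestamps);
  return true;
}

// Runs tasks strictly one after another (each starts after the previous settles).
function makeSerialQueue() {
  let tail: Promise<unknown> = Promise.resolve();
  return <T>(task: () => Promise<T>): Promise<T> => {
    const result = tail.then(task);
    tail = result.catch(() => undefined);
    return result;
  };
}

// Fire 20 simultaneous requests and count how many were allowed.
async function countAllowed(serialized: boolean): Promise<number> {
  const storage = new FakeStorage();
  const enqueue = makeSerialQueue();
  const calls = Array.from({ length: 20 }, () =>
    serialized ? enqueue(() => checkLimit(storage, 0)) : checkLimit(storage, 0),
  );
  return (await Promise.all(calls)).filter(Boolean).length;
}

countAllowed(false).then((n) => console.log("interleaved:", n)); // exceeds 10
countAllowed(true).then((n) => console.log("serialized:", n));   // exactly 10
```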
Verification Results
All verification patterns produced the expected results.
| Verification | Result |
|---|---|
| 429 from 11th burst request | OK |
| Counter maintained across hibernate after 15 seconds idle | OK |
| Counter entries outside the window are excluded | OK |
Additional Challenge
DO storage has no built-in TTL: data written with storage.put does not expire automatically, so entries for IPs that never return remain until they are explicitly deleted.
Solution
Implement TTL-equivalent behavior with the Alarm API. A DO Alarm invokes the object's alarm() handler at a scheduled time, which can be used to delete data that is no longer needed.
```typescript
async checkLimit() {
  // ... existing logic ...
  await this.ctx.storage.put("ts", timestamps);
  // Clean up 65 seconds after the last request.
  // Each setAlarm call replaces the previously scheduled alarm,
  // so it doesn't fire during continuous access, only after an idle period.
  await this.ctx.storage.setAlarm(now + 60_000 + 5_000);
  return { success: true };
}

// Called by the runtime when the alarm fires
async alarm() {
  await this.ctx.storage.deleteAll();
}
```
This automatically cleans up DO storage for IPs that have had no access for 65 seconds after their last request.
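The window check and the alarm scheduling can also be pulled out into a pure function, which makes the behavior easy to unit test outside the Workers runtime. This is a sketch; the name decide and the Decision shape are hypothetical and not part of the DO API.

```typescript
const WINDOW_MS = 60_000;
const LIMIT = 10;
const CLEANUP_GRACE_MS = 5_000;

interface Decision {
  success: boolean;
  toStore?: number[]; // timestamps to persist; undefined means "don't write"
  alarmAt?: number;   // time to pass to setAlarm; undefined when nothing is written
}

// Hypothetical pure helper mirroring checkLimit(): sliding window over the
// stored timestamps, no write on block, cleanup alarm 65 s after the last success.
function decide(stored: number[], now: number): Decision {
  const timestamps = stored.filter((t) => now - t < WINDOW_MS);
  if (timestamps.length >= LIMIT) return { success: false };
  timestamps.push(now);
  return { success: true, toStore: timestamps, alarmAt: now + WINDOW_MS + CLEANUP_GRACE_MS };
}

// Usage: 11 requests within one second from the same IP
let stored: number[] = [];
for (let i = 0; i < 11; i++) {
  const d = decide(stored, i * 100);
  if (d.toStore) stored = d.toStore;
  console.log(i + 1, d.success);
}
```

Inside the DO, checkLimit() would read "ts", call decide(), and only when toStore is set call storage.put and setAlarm(alarmAt), keeping blocked requests write-free.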
Cost
Durable Objects can be used on both the Workers Free and Paid plans, but to use them on Workers Free they must be created as SQLite-backed (new_sqlite_classes) (the legacy Key-Value-backed option requires Paid).
| Billable Item | Workers Free | Workers Paid |
|---|---|---|
| DO requests | 100,000 / day | 1M / month included, excess $0.15 / 1M req |
| Duration (execution time × 128MB) | 13,000 GB-s / day | 400,000 GB-s / month included, excess $12.50 / 1M GB-s |
| SQLite rows read | 5M / day | 25B / month included, excess $0.001 / 1M rows |
| SQLite rows written | 100,000 / day | 50M / month included, excess $1.00 / 1M rows |
| SQLite storage | 5GB (storage capacity limit) | 5GB-month included, excess $0.20 / GB-month |
The consumption per request in this implementation is as follows (setAlarm internally involves a SQLite write, so it counts as rows written).
| Operation | rows read | rows written |
|---|---|---|
| On success | 1 | 2 (put + setAlarm) |
| On block | 1 | 0 |
Writes are the tightest in terms of quota, and when operating on the Workers Free plan, approximately 50,000 successes/day is the upper limit (100K writes ÷ 2 writes/req). Since the design avoids generating writes on block, it's an advantage that charges don't scale proportionally even under large-scale abuse.
In addition, every request that hits the Workers Route also consumes a Worker request (100K/day on Workers Free), separately from the DO quotas. Whether the endpoint can run within the Workers Free plan must therefore be determined by comparing its traffic against all of the quotas above.
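The Free-plan headroom can be estimated with simple arithmetic from the tables above. This sketch assumes one DO request per targeted request and two rows written per success; it deliberately ignores duration, rows read, alarm invocations, and any rows touched by deleteAll, so treat it as a rough upper bound rather than an exact budget. The function names are illustrative.

```typescript
// Daily Free-plan quotas taken from the tables above
const FREE_WORKER_REQUESTS_PER_DAY = 100_000;
const FREE_DO_REQUESTS_PER_DAY = 100_000;
const FREE_DO_ROWS_WRITTEN_PER_DAY = 100_000;
const WRITES_PER_SUCCESS = 2; // storage.put + setAlarm

// Upper bound on successful requests per day imposed by the write quota
function maxSuccessesPerDay(): number {
  return Math.floor(FREE_DO_ROWS_WRITTEN_PER_DAY / WRITES_PER_SUCCESS);
}

// workerInvocations: all requests matching the Workers Route
// doRequests: requests judged as targets (one DO call each)
// successes: requests that passed the limiter (blocked ones write nothing)
function fitsFreePlan(workerInvocations: number, doRequests: number, successes: number): boolean {
  return (
    workerInvocations <= FREE_WORKER_REQUESTS_PER_DAY &&
    doRequests <= FREE_DO_REQUESTS_PER_DAY &&
    successes * WRITES_PER_SUCCESS <= FREE_DO_ROWS_WRITTEN_PER_DAY
  );
}

console.log(maxSuccessesPerDay()); // 50000
```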
Additional Considerations When Implementing in a Worker
This is about path normalization. Since I ultimately re-implemented on the backend side, this configuration was not adopted, but I'll leave this note about the path judgment pitfall discovered during testing as something to be aware of when implementing rate limiting in a Worker.
Implementing with the assumption that only /api/example?mode=heavy is the target can result in bypasses.
In backend routing (for example, Rails resources-style routing), the following are all conventionally routed to the same action:
- /api/example
- /api/example.json (format suffix allowed automatically)
- /api/example/ (trailing slash allowed)
- /api/example%2ejson (the percent-encoded . is decoded and interpreted as .json)
If the Worker's judgment uses a simple comparison like url.pathname === "/api/example", percent-encoded variants like %2ejson pass through, and the rate-limited action ends up being executed on the backend. I confirmed that a bypass was actually achieved in the test environment.
Solution
Combine URLPattern and decodeURIComponent to include all variant paths in the judgment target.
```typescript
const TARGET_PATTERN = new URLPattern({
  pathname: "/api/example{.:format}?{/}?",
});

const isTarget = (url: URL): boolean => {
  let pathname = url.pathname;
  try {
    pathname = decodeURIComponent(url.pathname);
  } catch {
    // Invalid % encoding is judged using the original pathname
  }
  return TARGET_PATTERN.test({ pathname }) &&
    url.searchParams.getAll("mode").includes("heavy");
};
```
- Use URLPattern to declaratively express variants like .json and trailing slashes
- Pre-decode encoded paths with decodeURIComponent
- Use getAll("mode").includes(...) to prevent bypasses via duplicate query parameters like ?mode=normal&mode=heavy (accounting for the interpretation difference: Rack uses the last value, while URLSearchParams.get returns the first)
Something discovered during testing: the Workers runtime's URLPattern does not decode percent-encoded paths and performs raw comparisons. Therefore, pre-normalization with decodeURIComponent is essential.
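Because URLPattern is not available in every JavaScript runtime, the same judgment can be prototyped and unit-tested with a regular expression. This is a swapped-in technique, not the URLPattern implementation above: the regex is my approximation of "/api/example{.:format}?{/}?" (an optional .format suffix without slashes, then an optional trailing slash), and isTargetPath is a hypothetical name.

```typescript
// Regex approximation of the URLPattern "/api/example{.:format}?{/}?"
const TARGET_PATH = /^\/api\/example(?:\.[^/]+)?\/?$/;

function isTargetPath(url: URL): boolean {
  let pathname = url.pathname;
  try {
    // Decode %2e and similar so encoded variants are judged like the plain form
    pathname = decodeURIComponent(pathname);
  } catch {
    // Malformed % sequences: fall back to the raw pathname
  }
  // getAll() catches ?mode=normal&mode=heavy regardless of which value the backend uses
  return TARGET_PATH.test(pathname) && url.searchParams.getAll("mode").includes("heavy");
}

const cases = [
  "https://example.com/api/example?mode=heavy",             // true
  "https://example.com/api/example.json?mode=heavy",        // true
  "https://example.com/api/example/?mode=heavy",            // true
  "https://example.com/api/example%2ejson?mode=heavy",      // true
  "https://example.com/api/example?mode=normal&mode=heavy", // true
  "https://example.com/api/example?mode=normal",            // false
  "https://example.com/api/examples?mode=heavy",            // false
];
for (const c of cases) console.log(c, isTargetPath(new URL(c)));
```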
Final Configuration
I ultimately re-implemented rate limiting on the backend side (application + RDB). Because it reuses the existing DB, it adds almost no infrastructure or cost. The check is a single query equivalent to INSERT ... ON CONFLICT DO UPDATE, so the per-request overhead is just one DB round trip.
The tradeoff is that unlike an edge-stopping configuration, abuse traffic still reaches the backend. Therefore, large-scale requests such as DoS/DDoS still require rate limiting via Cloudflare rules.
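A backend check of this kind could look roughly like the sketch below: a fixed-window counter incremented by a single upsert. The post does not show the actual implementation, so the table name rate_limits, its columns, the PostgreSQL dialect, the Query function type, and the fixed-window choice (simpler than the sliding windows used earlier) are all assumptions for illustration.

```typescript
// Hypothetical query function, e.g. a thin wrapper around a DB driver.
type Query = (sql: string, params: unknown[]) => Promise<{ rows: { count: number }[] }>;

const LIMIT = 10;
const WINDOW_SEC = 60;

// Assumed schema (PostgreSQL):
//   rate_limits(key text, window_start bigint, count int,
//               PRIMARY KEY (key, window_start))
const UPSERT_SQL = `
  INSERT INTO rate_limits (key, window_start, count)
  VALUES ($1, $2, 1)
  ON CONFLICT (key, window_start)
  DO UPDATE SET count = rate_limits.count + 1
  RETURNING count`;

// One round trip: increment the counter for the current fixed window and
// allow the request only while the count stays within the limit.
async function allowRequest(query: Query, ip: string, nowMs: number): Promise<boolean> {
  const windowStart = Math.floor(nowMs / 1000 / WINDOW_SEC) * WINDOW_SEC;
  const { rows } = await query(UPSERT_SQL, [ip, windowStart]);
  return rows[0].count <= LIMIT;
}
```

Rows for past windows are never read again, so they would need periodic cleanup (for example, a scheduled DELETE); that part is omitted here.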
Summary
Through this investigation, I concluded that the following distinctions are appropriate depending on purpose and scale:
- Fine-grained rate limiting per feature: Implement in the backend application layer. It can be accomplished entirely within the existing stack with no additional infrastructure or charges.
- Large-scale access / DoS countermeasures: Stop at Cloudflare's Rate Limiting Rules (WAF layer). Since none of Workers, DO, or the backend is invoked, this is also the most cost-efficient option.
- Implementation with Workers / Durable Objects: An option for cases where modifying the backend is difficult. If strict control is required, a DO + SQLite + Alarm configuration is necessary; otherwise, backend or WAF is more advantageous.
In this case, since the goal was fine-grained rate limiting per feature, I ultimately adopted the backend-side implementation.