I researched how to introduce rate limiting to a site proxied through Cloudflare

I verified three patterns for IP-based rate limiting on a site proxied through Cloudflare (Workers Rate Limiting Binding, and Durable Objects with in-memory or SQLite-backed state) and summarize why I ultimately implemented it on the backend instead.

Ryosuke IGARASHI

2026.05.22


Introduction
A need arose for IP-based rate limiting on a specific API endpoint (one that calls a high-cost external service per request). Stopping requests at the edge (Cloudflare) seemed like the best place to implement it, so I started my investigation there.
The first thing I considered was Cloudflare's WAF Rate Limiting Rules. However, the Pro plan I was using ($20/month) limits Rate Limiting Rules to just 2, and those were already filled with existing rules. Increasing the quota would require upgrading to the Business plan ($200/month), but raising costs to 10x just for 1 additional rule wasn't realistic.
So I decided to explore alternative approaches to implement rate limiting while keeping costs down.
Since the target domain was already proxied through Cloudflare, Workers Routes could attach a Worker to a route pattern such as example.com/api/example*, so that only the target endpoint goes through the Worker. This avoids routing the entire site through a Worker and limits the scope of impact.
Cloudflare offers multiple rate limiting options available from Workers:
1. Workers Rate Limiting Binding (official built-in)
2. Durable Object (DO)
  2a. Count with in-memory state
  2b. Persist with SQLite-backed storage

I tried each in order, and each had caveats. I ultimately re-implemented rate limiting on the backend (application + RDB), but this article organizes what I learned along the way.
Requirements
Target the specific endpoint /api/example?mode=heavy
Allow up to 10 requests per IP within 60 seconds
Return HTTP 429 when exceeded
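These requirements describe a sliding-window limit: a request is allowed only if fewer than 10 requests from the same IP were accepted in the preceding 60 seconds. The sketch below expresses this as a runtime-independent pure function over accepted timestamps. The function name and signature are hypothetical and do not come from the original. It is the same check that the Durable Object implementations in Patterns 2 and 3 perform.

```typescript
// Hypothetical helper: sliding-window check over accepted request timestamps (ms).
interface WindowDecision {
  allowed: boolean;
  timestamps: number[]; // timestamps to keep after this decision
}

function slidingWindowCheck(
  timestamps: readonly number[],
  now: number,
  limit = 10,
  windowMs = 60_000,
): WindowDecision {
  // Drop entries that have fallen out of the 60-second window.
  const recent = timestamps.filter((t) => now - t < windowMs);
  if (recent.length >= limit) {
    // Blocked requests are not recorded, so they do not extend the block.
    return { allowed: false, timestamps: recent };
  }
  return { allowed: true, timestamps: [...recent, now] };
}
```

Because blocked requests are not recorded, a client that keeps hammering the endpoint is unblocked as soon as its oldest accepted request leaves the window.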
Pattern 1. Workers Rate Limiting Binding
This is Cloudflare's official built-in feature, available simply by writing configuration in wrangler.toml.
Implementation
# wrangler.toml
[[ratelimits]]
name = "MY_LIMITER"
namespace_id = "1001"

  [ratelimits.simple]
  limit = 10
  period = 60   # Only 10 or 60 can be specified
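// src/index.ts
// Minimal inline type for the rate limiting binding declared in wrangler.toml.
interface Env {
  MY_LIMITER: { limit(options: { key: string }): Promise<{ success: boolean }> };
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    // Client IP as seen by Cloudflare; requests without the header share one bucket.
    const key = req.headers.get("cf-connecting-ip") ?? "unknown";
    const { success } = await env.MY_LIMITER.limit({ key });
    if (!success) {
      return new Response("Too Many Requests", { status: 429 });
    }
    // Forward the request to the origin unchanged.
    return fetch(req);
  },
};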
The implementation only takes a few lines, making it extremely simple.
Verification Results
I deployed to a test environment and sent 100 requests in 10 seconds from a single IP (about 60 times the configured rate). The actual behavior diverged sharply from the expected values.
Expected: 10 successes, 90 with success: false
Actual: Only a few success: false out of 100 requests
Against the configured value of 10 req / 60s, the effective behavior allowed several dozen to ~100 req/min through.
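The verification above can be reproduced with a small script. It sends requests at a fixed interval from one client and tallies the status codes. This is a sketch with several assumptions:

- The target URL is a placeholder.
- It needs a runtime with a global fetch (for example Node.js 18+).
- The `send` parameter exists only so the request function can be swapped out.

```typescript
// Sketch of the burst verification: send `total` requests evenly over `spreadMs`
// from one client and count responses by HTTP status (0 = network error).
type SendFn = (url: string) => Promise<number>;

const fetchStatus: SendFn = async (url) => (await fetch(url)).status;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function burstTest(
  url: string,
  total = 100,
  spreadMs = 10_000,
  send: SendFn = fetchStatus,
): Promise<Map<number, number>> {
  const counts = new Map<number, number>();
  const pending: Promise<void>[] = [];
  for (let i = 0; i < total; i++) {
    // Fire without waiting for the response so the send rate stays constant.
    pending.push(
      send(url)
        .catch(() => 0)
        .then((status) => {
          counts.set(status, (counts.get(status) ?? 0) + 1);
        }),
    );
    await sleep(spreadMs / total);
  }
  await Promise.all(pending);
  return counts;
}

// Example (placeholder URL): 100 requests over 10 seconds.
// burstTest("https://staging.example.com/api/example?mode=heavy").then(console.log);
```

With an accurate limiter the result would be 10 responses with status 200 and 90 with status 429. The binding instead let most of the 100 through.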
Cause
The official documentation (Workers Rate Limiting - Accuracy) states the following:
The Rate Limiting API is permissive, eventually consistent, and intentionally designed to not be used as an accurate accounting system.

...the isolate that serves each request will check against its locally cached value of the rate limit. Very quickly, but not immediately...
Source: https://developers.cloudflare.com/workers/runtime-apis/bindings/rate-limit/#accuracy
In summary:
Each isolate maintains a counter in its local cache
Synchronization between isolates has a delay of several seconds
When multiple isolates exist within the same PoP, requests are distributed, causing each isolate's local count to be spread out
Cloudflare itself explicitly states it "should not be used as an accurate accounting system"
This made it clear that it doesn't match use cases requiring accuracy matching the configured values.
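To see why per-isolate local caching lets so many requests through, consider a deliberately simplified toy model. It is not Cloudflare's actual algorithm. In the model:

- Requests are spread round-robin across several isolates.
- Each isolate enforces the limit against its own local count.
- Local counts are merged into a shared total only at a fixed sync interval.

The isolate count and sync interval are arbitrary assumptions chosen for illustration.

```typescript
// Toy model only, NOT Cloudflare's internal algorithm. All requests are assumed to
// fall inside one 60-second window, so window expiry is not modeled.
interface ToyConfig {
  isolates: number; // assumed number of isolates sharing the traffic
  syncEveryMs: number; // assumed delay before local counts become visible to all isolates
  limit: number; // configured limit per window
}

// Returns how many requests (times in ms, ascending) the model allows.
function simulateLocalCounters(requestTimes: number[], cfg: ToyConfig): number {
  let sharedCount = 0; // count all isolates agree on after the last sync
  const unsynced: number[] = new Array(cfg.isolates).fill(0); // per-isolate local increments
  let nextSync = cfg.syncEveryMs;
  let allowed = 0;

  requestTimes.forEach((t, i) => {
    // At each sync point, fold local increments into the shared count.
    while (t >= nextSync) {
      sharedCount += unsynced.reduce((a, b) => a + b, 0);
      unsynced.fill(0);
      nextSync += cfg.syncEveryMs;
    }
    const isolate = i % cfg.isolates; // round-robin distribution
    // Each isolate only sees the shared count plus its own unsynced increments.
    if (sharedCount + unsynced[isolate] < cfg.limit) {
      unsynced[isolate] += 1;
      allowed += 1;
    }
  });
  return allowed;
}
```

Consider 100 requests at 100 ms intervals. In this model, one isolate allows exactly 10, while 8 isolates with a 5-second sync delay allow 50. Each isolate enforces the limit correctly against what it knows, but none of them sees the others' recent traffic.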
Cost
There is no additional charge for Rate Limiting Binding itself. You are only charged for Workers request count and CPU time.


Plan	Requests	CPU Time
Workers Free	100,000 / day	10ms / invocation
Workers Paid ($5/month)	10M / month included, excess $0.30 / 1M req	30s / invocation, additional charges apply

Considerations
Workers Rate Limiting Binding is not suited for strict abuse prevention
It functions well enough as a loose filter, but use cases requiring control that matches the configured values need a different approach
Pattern 2. Durable Object (in-memory)
Durable Objects (DO) operate as "logically single instances," making it possible to aggregate and count requests from multiple PoPs. This pattern maintains counters in in-memory state.
Note that to use Durable Objects on the Workers Free plan, the DO class must be declared as new_sqlite_classes (the legacy Key-Value-backed option requires the Workers Paid plan). In this pattern, the DO class is declared as SQLite-backed, but the code operates using only in-memory state without calling ctx.storage.
Implementation
In wrangler.toml, declare it as SQLite-backed.
[[durable_objects.bindings]]
name = "RATE_LIMITER"
class_name = "RateLimiter"

[[migrations]]
tag = "v1"
new_sqlite_classes = ["RateLimiter"]   # Required for Free plan compatibility
The design creates a DO per IP (idFromName(ip)). The counter is held in-memory as an instance variable of the class.
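// rate_limiter.ts
import { DurableObject } from "cloudflare:workers";

export class RateLimiter extends DurableObject {
  // In-memory only: lost whenever the object is hibernated or evicted.
  private timestamps: number[] = [];

  async checkLimit(): Promise<{ success: boolean }> {
    const now = Date.now();
    // Keep only timestamps inside the 60-second sliding window.
    this.timestamps = this.timestamps.filter(t => now - t < 60_000);
    if (this.timestamps.length >= 10) return { success: false };
    this.timestamps.push(now);
    return { success: true };
  }
}
// index.ts (Worker main)
const ip = req.headers.get("cf-connecting-ip") ?? "unknown";
// One DO instance per IP address.
const id = env.RATE_LIMITER.idFromName(ip);
const stub = env.RATE_LIMITER.get(id);
// RPC call into the DO (requires a compatibility_date recent enough for DO RPC).
const { success } = await stub.checkLimit();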
Since the storage API is not called, no SQLite rows read/written charges are incurred.
Verification 1. Burst Behavior
When sending 15 requests in quick succession, the 11th and later requests returned 429 as expected.
1–10: HTTP 200
11–15: HTTP 429
Verification 2. With Idle Period
Requests were sent under the condition of 5 requests → 15 seconds idle → 6 requests. If a normal sliding window were functioning, the 5 timestamps from Phase 1 would remain in the window since they are within 60 seconds, and the 6th request in Phase 2 (11th total) should return 429.
The actual results were as follows:
Phase 1: all 5 requests returned 200
15 seconds idle
Phase 2: all 6 requests returned 200
This shows that the Phase 1 count was lost. In other words, inserting an idle period allowed the rate limit to be bypassed.
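The phased test can be scripted the same way. Requests are sent sequentially, like a single client, with idle gaps between phases. This is a sketch: the URL is a placeholder, a global fetch is assumed, and `send` and `sleep` are injectable only so the script can be exercised without a network.

```typescript
// Sketch of Verification 2: run phases of sequential requests separated by idle gaps.
type PhaseSendFn = (url: string) => Promise<number>;
type SleepFn = (ms: number) => Promise<void>;

interface Phase {
  requests: number; // requests sent back-to-back in this phase
  idleAfterMs: number; // idle time after the phase
}

async function runPhases(
  url: string,
  phases: Phase[],
  send: PhaseSendFn = async (u) => (await fetch(u)).status,
  sleep: SleepFn = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number[][]> {
  const results: number[][] = [];
  for (const phase of phases) {
    const statuses: number[] = [];
    for (let i = 0; i < phase.requests; i++) {
      statuses.push(await send(url)); // one request at a time
    }
    results.push(statuses);
    if (phase.idleAfterMs > 0) await sleep(phase.idleAfterMs);
  }
  return results;
}

// Example (placeholder URL): 5 requests, 15 s idle, then 6 requests.
// runPhases("https://staging.example.com/api/example?mode=heavy", [
//   { requests: 5, idleAfterMs: 15_000 },
//   { requests: 6, idleAfterMs: 0 },
// ]).then(console.log);
```

Against a correct 10-per-60s limiter, the last status of phase 2 should be 429. In the in-memory DO test, it came back as 200.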
Cause
The DO lifecycle is responsible. The official DO Lifecycle documentation states:
...after 10 seconds of inactivity while in this state.

When hibernated, the in-memory state is discarded
Source: https://developers.cloudflare.com/durable-objects/concepts/durable-object-lifecycle/
Under certain conditions, a DO hibernates after 10 seconds of inactivity and its in-memory state is discarded. On the next access, it restarts from constructor(), resetting the counter to its initial state.
From an attacker's perspective, the following bypass is therefore possible:
Send bursts of 9 requests with intervals of about 10–15 seconds
Since the counter resets each time, approximately 36–54 requests per minute get through (about 3.6–5.4x the configured value)
Since this bypass is viable, it is not practical as an abuse prevention measure.
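The bypass arithmetic can be checked with a small simulation. It models the in-memory limiter from Pattern 2 plus one assumption taken from the lifecycle description: if the object sees no request for 10 seconds or more, its in-memory timestamps are discarded. This models the observed behavior, not the runtime itself, since hibernation only happens under certain conditions. The helper names are hypothetical.

```typescript
// Model: in-memory sliding-window limiter whose state is wiped after >= resetAfterMs idle.
function simulateResettingLimiter(
  requestTimes: number[], // ascending, in ms
  limit = 10,
  windowMs = 60_000,
  resetAfterMs = 10_000, // assumed idle time after which in-memory state is lost
): number {
  let timestamps: number[] = [];
  let lastSeen = -Infinity;
  let allowed = 0;
  for (const now of requestTimes) {
    if (now - lastSeen >= resetAfterMs) timestamps = []; // state lost while idle
    lastSeen = now;
    timestamps = timestamps.filter((t) => now - t < windowMs);
    if (timestamps.length < limit) {
      timestamps.push(now);
      allowed++;
    }
  }
  return allowed;
}

// Attack pattern: bursts of `burstSize` requests (1 ms apart), each followed by
// `idleMs` of silence, for `durationMs` in total.
function burstSchedule(burstSize: number, idleMs: number, durationMs = 60_000): number[] {
  const times: number[] = [];
  let t = 0;
  while (t < durationMs) {
    for (let i = 0; i < burstSize && t < durationMs; i++) {
      times.push(t);
      t += 1;
    }
    t += idleMs - 1; // next burst starts idleMs after the last request
  }
  return times;
}
```

With bursts of 9 and 15 s idle gaps, 4 bursts fit in a minute and all 36 requests pass. With 10 s gaps, 6 bursts fit and all 54 pass. If the state were never lost, the same 15 s pattern would be capped at 10.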
Considerations
DO in-memory state must be designed with the assumption that it will be lost
Short bursts can be prevented, but sustained access with intervals cannot be handled
Pattern 3. Durable Object (SQLite-backed persistent)
I switched to persisting timestamps with the DO Storage API. Since the class was already declared as SQLite-backed, the KV API (ctx.storage.get/put) writes to SQLite internally.
Implementation
import { DurableObject } from "cloudflare:workers";

export class RateLimiter extends DurableObject {
  async checkLimit(): Promise<{ success: boolean }> {
    const now = Date.now();
    // Persisted timestamps survive hibernation and eviction.
    const stored = (await this.ctx.storage.get<number[]>("ts")) ?? [];
    const timestamps = stored.filter(t => now - t < 60_000);
    if (timestamps.length >= 10) {
      return { success: false };   // Don't write on block (suppress write charges)
    }
    timestamps.push(now);
    await this.ctx.storage.put("ts", timestamps);
    return { success: true };
  }
}
A DO processes events on a single thread, and its input gates hold back new requests while a storage operation is in flight. No other request can therefore interleave between get → compute → put, so the read-modify-write is effectively atomic.
Verification Results
All verification patterns produced the expected results.


Verification	Result
429 from 11th burst request	OK
Counter maintained across hibernation after 15 seconds idle	OK
Counter entries outside the window are excluded	OK

Additional Challenge
DO storage has no built-in TTL: data written with storage.put does not expire automatically. As a result, the timestamps stored for an IP that never returns stay in storage until they are explicitly deleted.
Solution
Use the Alarm API to get TTL-like behavior. A DO alarm invokes the object's alarm() method at a scheduled time, which can be used to delete data that is no longer needed.
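import { DurableObject } from "cloudflare:workers";

export class RateLimiter extends DurableObject {
  async checkLimit(): Promise<{ success: boolean }> {
    const now = Date.now();
    const stored = (await this.ctx.storage.get<number[]>("ts")) ?? [];
    const timestamps = stored.filter(t => now - t < 60_000);
    if (timestamps.length >= 10) {
      return { success: false };   // No writes on block
    }
    timestamps.push(now);
    await this.ctx.storage.put("ts", timestamps);
    // Clean up 65 seconds after the last successful request.
    // Each call to setAlarm overwrites the previous alarm time,
    // so it won't fire during continuous access, only after idle.
    await this.ctx.storage.setAlarm(now + 60_000 + 5_000);
    return { success: true };
  }

  async alarm(): Promise<void> {
    // Every stored timestamp is now outside the 60-second window.
    await this.ctx.storage.deleteAll();
  }
}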
This automatically cleans up DO storage for IPs that have had no access for 65 seconds after their last request.
Cost
Durable Objects can be used on both the Workers Free and Paid plans, but to use them on Workers Free they must be created as SQLite-backed (new_sqlite_classes) (the legacy Key-Value-backed option requires Paid).


Billable Item	Workers Free	Workers Paid
DO requests	100,000 / day	1M / month included, excess $0.15 / 1M req
Duration (execution time × 128MB)	13,000 GB-s / day	400,000 GB-s / month included, excess $12.50 / 1M GB-s
SQLite rows read	5M / day	25B / month included, excess $0.001 / 1M rows
SQLite rows written	100,000 / day	50M / month included, excess $1.00 / 1M rows
SQLite storage	5GB (storage capacity limit)	5GB-month included, excess $0.20 / GB-month

The consumption per request in this implementation is as follows (setAlarm internally involves a SQLite write, so it counts as rows written).


Operation	rows read	rows written
On success	1	2 (put + setAlarm)
On block	1	0

Writes are the tightest quota. On the Workers Free plan, the upper limit is approximately 50,000 successful requests/day (100K writes ÷ 2 writes per success). Because blocked requests generate no writes, write charges do not grow with the volume of abusive traffic. Each blocked request still counts as a Worker request and a DO request, though.
In addition, the Worker's own requests (100K/day on Free) are counted separately from DO requests, so also check whether total traffic to the target endpoint fits within that quota. Whether the Workers Free plan is sufficient comes down to comparing the target endpoint's traffic against the quotas above.
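The Free-plan headroom can be estimated with simple arithmetic. This sketch makes the following assumptions:

- Every request to the target endpoint triggers exactly one Worker invocation and one DO request.
- Successes cost 1 row read and 2 rows written, and blocks cost 1 row read and 0 rows written, per the table above.
- The alarm() invocation and its deleteAll() are not counted in that table and may add some request and write usage.
- The quota numbers are copied from this article's tables and may change.
- Duration and CPU time are ignored.

```typescript
// Daily Free-plan quotas as listed in this article (subject to change).
const FREE_QUOTAS = {
  workerRequests: 100_000,
  doRequests: 100_000,
  rowsRead: 5_000_000,
  rowsWritten: 100_000,
};

interface DailyUsage {
  workerRequests: number;
  doRequests: number;
  rowsRead: number;
  rowsWritten: number;
}

// Per-day consumption for the DO + SQLite + Alarm design (alarm cleanup not included).
function dailyUsage(successes: number, blocks: number): DailyUsage {
  const total = successes + blocks;
  return {
    workerRequests: total, // one Worker invocation per request
    doRequests: total, // one DO call per request
    rowsRead: total, // storage.get on every request
    rowsWritten: successes * 2, // put + setAlarm on success only
  };
}

// Returns the names of the quotas that the given daily traffic would exceed.
function exceededQuotas(successes: number, blocks: number): string[] {
  const usage = dailyUsage(successes, blocks);
  return (Object.keys(FREE_QUOTAS) as (keyof DailyUsage)[]).filter(
    (k) => usage[k] > FREE_QUOTAS[k],
  );
}
```

For example, 50,000 successes with no blocks exactly uses the write quota. Adding 95,000 blocked requests to 10,000 successes stays well within the write quota but exceeds both request quotas.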
Additional Considerations When Implementing in a Worker
This is about path normalization. Since I ultimately re-implemented on the backend side, this configuration was not adopted, but I'll leave this note about the path judgment pitfall discovered during testing as something to be aware of when implementing rate limiting in a Worker.
Implementing with the assumption that only /api/example?mode=heavy is the target can result in bypasses.
In backend routing (for example, Rails resources-style routing), the following are all conventionally routed to the same action:
/api/example
/api/example.json (format automatically allowed)
/api/example/ (trailing slash allowed)
/api/example%2ejson (percent-encoded . is also decoded and interpreted as .json)
If the Worker's judgment uses a simple comparison like url.pathname === "/api/example", percent-encoded variants like %2ejson pass through, and the rate-limited action ends up being executed on the backend. I confirmed that a bypass was actually achieved in the test environment.
Solution
Combine URLPattern and decodeURIComponent to include all variant paths in the judgment target.
const TARGET_PATTERN = new URLPattern({
  pathname: "/api/example{.:format}?{/}?",
});

const isTarget = (url: URL): boolean => {
  let pathname = url.pathname;
  try {
    pathname = decodeURIComponent(url.pathname);
  } catch {
    // Invalid % encoding is judged using the original pathname
  }
  return TARGET_PATTERN.test({ pathname }) &&
    url.searchParams.getAll("mode").includes("heavy");
};
Use URLPattern to declaratively express variants like .json and trailing slashes
Pre-decode encoded paths with decodeURIComponent
Use getAll("mode").includes(...) to prevent bypasses via duplicate query parameters like ?mode=normal&mode=heavy (accounting for the interpretation difference where Rack uses the last value and URLSearchParams.get returns the first value)
Something discovered during testing: the Workers runtime's URLPattern does not decode percent-encoded paths and performs raw comparisons. Therefore, pre-normalization with decodeURIComponent is essential.
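URLPattern may not be available in every local test environment. An equivalent check that runs anywhere makes it easy to pin the variants listed above in a table-driven test. The sketch below swaps URLPattern for a regular expression that approximates /api/example{.:format}?{/}?. It is a stand-in for unit tests under that assumption, not a replacement for testing against the real Workers runtime. The function name is hypothetical.

```typescript
// Approximates URLPattern "/api/example{.:format}?{/}?" with a regular expression.
const TARGET_PATH_RE = /^\/api\/example(?:\.[^/]+)?\/?$/;

function isTargetForTest(rawUrl: string): boolean {
  const url = new URL(rawUrl);
  let pathname = url.pathname;
  try {
    pathname = decodeURIComponent(url.pathname); // "%2ejson" -> ".json"
  } catch {
    // Malformed percent-encoding: fall back to the raw pathname.
  }
  return TARGET_PATH_RE.test(pathname) &&
    url.searchParams.getAll("mode").includes("heavy");
}

// Variants from the list above, plus two that must not match.
const cases: Array<[string, boolean]> = [
  ["https://example.com/api/example?mode=heavy", true],
  ["https://example.com/api/example.json?mode=heavy", true],
  ["https://example.com/api/example/?mode=heavy", true],
  ["https://example.com/api/example%2ejson?mode=heavy", true],
  ["https://example.com/api/example?mode=normal&mode=heavy", true],
  ["https://example.com/api/example?mode=normal", false],
  ["https://example.com/api/examples?mode=heavy", false],
];
```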
Final Configuration
I ultimately re-implemented rate limiting on the backend side (application + RDB). Because it reuses the existing DB, it adds almost no infrastructure or cost. Each request needs only one query equivalent to INSERT ... ON CONFLICT DO UPDATE, so the per-request overhead is a single DB round trip.
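The backend code is not shown here, so the following is only a sketch of what a single-statement counter can look like. It assumes:

- PostgreSQL-compatible SQL.
- A hypothetical rate_limits table keyed by (ip, window_start).
- A hypothetical `query` function supplied by whatever DB client the application uses.

This is a fixed-window counter, a simpler swapped-in technique than the sliding window used in the Durable Object patterns. It can admit up to twice the limit across a window boundary.

```typescript
// Hypothetical schema (PostgreSQL):
//   CREATE TABLE rate_limits (
//     ip text NOT NULL,
//     window_start timestamptz NOT NULL,
//     count integer NOT NULL,
//     PRIMARY KEY (ip, window_start)
//   );
const UPSERT_SQL = `
  INSERT INTO rate_limits (ip, window_start, count)
  VALUES ($1, $2, 1)
  ON CONFLICT (ip, window_start)
  DO UPDATE SET count = rate_limits.count + 1
  RETURNING count`;

// Hypothetical DB access: runs the SQL and returns the `count` column of the first row.
type QueryCount = (sql: string, params: unknown[]) => Promise<number>;

const LIMIT = 10;
const WINDOW_MS = 60_000;

async function allowRequest(query: QueryCount, ip: string, nowMs = Date.now()): Promise<boolean> {
  // Align to the start of the current fixed 60-second window.
  const windowStart = new Date(Math.floor(nowMs / WINDOW_MS) * WINDOW_MS);
  const count = await query(UPSERT_SQL, [ip, windowStart]);
  return count <= LIMIT; // the 11th and later requests in a window get 429
}
```

Rows for past windows accumulate, so a periodic DELETE of rows whose window_start is older than a minute is needed. This is the backend counterpart of the Alarm cleanup in Pattern 3.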
The tradeoff is that unlike an edge-stopping configuration, abuse traffic still reaches the backend. Therefore, large-scale requests such as DoS/DDoS still require rate limiting via Cloudflare rules.
Summary
Through this investigation, I concluded that the following distinctions are appropriate depending on purpose and scale:
Fine-grained rate limiting per feature: Implement in the backend application layer. It can be accomplished entirely within the existing stack with no additional infrastructure or charges.
Large-scale access / DoS countermeasures: Stop them with Cloudflare's Rate Limiting Rules (WAF layer). Requests are rejected before any Worker, DO, or backend is invoked, so this is also the most cost-efficient option.
Implementation with Workers / Durable Objects: An option for cases where modifying the backend is difficult. If strict control is required, a DO + SQLite + Alarm configuration is necessary; otherwise, backend or WAF is more advantageous.
In this case, since the goal was fine-grained rate limiting per feature, I ultimately adopted the backend-side implementation.
