
feat: add request-local rate limiter #670

Merged
nyannyacha merged 5 commits into main from ny/feat-request-local-rate-limiter on Mar 3, 2026

Conversation

nyannyacha (Contributor) commented Mar 3, 2026

What kind of change does this PR introduce?

Feature

Towards FUNC-479

Description

Adds a request-local outbound HTTP rate limiter that limits the number of outbound HTTP requests a worker can make, keyed by the W3C trace context. This prevents runaway circular fetch chains (A → B → A → …) and recursive self-calling loops from running indefinitely.

How it works

When an inbound request arrives, the runtime extracts the trace ID from the traceparent header and stores it in an AsyncVariable. Every subsequent outbound HTTP request made by that worker — whether through fetch or node:http — reads this value and increments a counter against it.
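As a rough illustration (not the runtime's actual implementation, which lives in Rust), extracting the trace ID from a W3C traceparent header amounts to parsing its `version-traceid-parentid-flags` layout:

```typescript
// Hypothetical sketch of trace-ID extraction. A traceparent header has
// the form "version-traceid-parentid-flags", e.g.
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".
function extractTraceId(traceparent: string | null): string | null {
  if (!traceparent) return null;
  const parts = traceparent.split("-");
  // The trace-id field is 32 lowercase hex characters...
  if (parts.length !== 4 || !/^[0-9a-f]{32}$/.test(parts[1])) return null;
  // ...and an all-zero trace-id is invalid per the spec.
  if (parts[1] === "0".repeat(32)) return null;
  return parts[1];
}
```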

  • Traced requests (inbound request carries a traceparent header): budget is tracked per trace ID (local budget). All outbound requests within the same trace share one counter.
  • Untraced requests (no traceparent header): budget is tracked by the key supplied at worker creation time (global budget). All user workers of the same function share one counter regardless of how many are running (per process).

When the budget is exhausted, fetch() and node:http requests throw Deno.errors.RateLimitError.
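The budget bookkeeping described above can be sketched as a TTL-scoped counter keyed by trace ID (or by the worker key for untraced requests). The names below are illustrative only; the real table is the Rust SharedRateLimitTable:

```typescript
// Illustrative sketch of the check-and-increment logic.
// `now` is injectable so the TTL window is easy to test.
type Entry = { count: number; expiresAt: number };
const table = new Map<string, Entry>();

function checkAndIncrement(
  key: string,
  budget: number,
  ttlSec: number,
  now: number = Date.now(),
): boolean {
  const entry = table.get(key);
  if (!entry || entry.expiresAt <= now) {
    // First request in this window (or the previous window expired):
    // start a fresh counter and allow the request.
    table.set(key, { count: 1, expiresAt: now + ttlSec * 1000 });
    return true;
  }
  if (entry.count >= budget) return false; // budget exhausted
  entry.count += 1;
  return true;
}
```

A denied result is what surfaces to JS as Deno.errors.RateLimitError.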

Usage

Configure traceRateLimitOptions when creating a user worker from the main worker script. To have traceparent automatically propagated into outbound requests (so downstream workers can participate in the same trace), also enable the OTel TraceContext propagator.

// main worker
const worker = await EdgeRuntime.userWorkers.create({
  servicePath,
  // ...

  // Required for automatic traceparent propagation to outbound requests.
  otelConfig: {
    tracing_enabled: true,
    propagators: ["TraceContext" /*, "Baggage" */],
  },

  traceRateLimitOptions: {
    // Stable identifier shared across all user workers of the same function.
    // Used as the rate-limit key for untraced requests.
    // Typically set to the servicePath (or deployment id).
    key: servicePath,
    rules: [
      {
        matches: ".*", // regex matched against the outbound request URL
        ttl: 60, // budget reset interval in seconds
        budget: {
          local: 20, // max outbound requests per trace ID within the TTL
          global: 50, // max outbound requests across all untraced calls within the TTL
        },
      },
    ],
  },
});
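For illustration, selecting which rule applies to an outbound URL (via the `matches` regex shown in the config above) could look like the sketch below; first-match-wins ordering is an assumption here, not confirmed by the PR:

```typescript
// Illustrative sketch: pick the first rule whose `matches` regex
// applies to the outbound request URL.
interface Budget { local: number; global: number }
interface Rule { matches: string; ttl: number; budget: Budget }

function findRule(rules: Rule[], url: string): Rule | undefined {
  return rules.find((rule) => new RegExp(rule.matches).test(url));
}
```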

Newly introduced CLI Flags

  • --rate-limit-table-cleanup-interval <SECONDS>: how often expired rate-limit entries are swept from the shared table (default: 60)
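The sweep this flag controls could be sketched as follows (illustrative only; the real cleanup runs as a Rust background task on this interval):

```typescript
// Illustrative sketch of the periodic cleanup: remove entries whose
// TTL window has expired. Returns the number of entries removed.
type Entry = { count: number; expiresAt: number };

function sweepExpired(table: Map<string, Entry>, now: number = Date.now()): number {
  let removed = 0;
  for (const [key, entry] of table) {
    if (entry.expiresAt <= now) {
      table.delete(key); // safe: Map iterators tolerate deleting the current entry
      removed++;
    }
  }
  return removed;
}
```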


coderabbitai bot commented Mar 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a runtime, trace-aware outbound HTTP rate limiter and plumbing. Introduces a Rust rate_limit module with SharedRateLimitTable, TraceRateLimiter, rule compilation, TTL-based local/global budgets, and a background cleanup task (interval configurable via new CLI flag --rate-limit-table-cleanup-interval). Wires rate limiter options through worker creation and the worker pool, exposes per-request trace context utilities to JS, integrates op_check_outbound_rate_limit into fetch and Node HTTP polyfills, and adds test workers and integration tests for circular limits and trace isolation.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client Request
    participant HTTP as HTTP Layer (fetch/node polyfills)
    participant Runtime as Deno Runtime / OpState
    participant RateLimit as TraceRateLimiter
    participant Table as SharedRateLimitTable
    participant Cleanup as Cleanup Task

    Client->>HTTP: Make outbound HTTP request (includes traceparent?)
    HTTP->>Runtime: op_check_outbound_rate_limit(url, key, is_traced)
    Runtime->>RateLimit: lookup limiter from OpState
    alt No limiter configured
        RateLimit-->>Runtime: allow (true)
    else Limiter present
        RateLimit->>RateLimit: find matching rule by URL
        alt Traced request
            RateLimit->>Table: check_and_increment(key, local_budget, ttl)
        else Untraced request (global)
            RateLimit->>Table: check_and_increment(global_key, global_budget, ttl)
        end
        Table-->>RateLimit: allowed? (true/false)
        RateLimit-->>Runtime: allowed? (true/false)
    end
    Runtime-->>HTTP: result
    alt allowed
        HTTP->>Client: proceed with request/response
    else denied
        HTTP-->>Client: throw Deno.errors.RateLimitError
    end

    par Background
        Cleanup->>Table: periodically remove expired entries (interval from CLI)
    end


nyannyacha force-pushed the ny/feat-request-local-rate-limiter branch from c464281 to 2d95607 on March 3, 2026 at 09:28
coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
vendor/deno_fetch/26_fetch.js (1)

392-413: ⚠️ Potential issue | 🟠 Major

Check abort state before charging rate-limit budget

The new rate-limit check runs before the existing abort guard (lines 410-413). For already-aborted requests, this can still consume quota and may return RateLimitError instead of the expected abort path. Move the rate-limit check after the requestObject.signal.aborted branch.

Suggested fix
-      // Rate limit check
-      const traceId = internals.getRequestTraceId?.();
-      const isTraced = traceId !== null && traceId !== undefined;
-      const rlKey = isTraced ? traceId : "";
-      const allowed = op_check_outbound_rate_limit(
-        requestObject.url,
-        rlKey,
-        isTraced,
-      );
-      if (!allowed) {
-        const msg = isTraced
-          ? `Rate limit exceeded for trace ${rlKey}`
-          : `Rate limit exceeded for function`;
-        reject(new Deno.errors.RateLimitError(msg));
-        return;
-      }
-
       // 4.
       if (requestObject.signal.aborted) {
         reject(abortFetch(request, null, requestObject.signal.reason));
         return;
       }
+
+      // Rate limit check
+      const traceId = internals.getRequestTraceId?.();
+      const isTraced = traceId !== null && traceId !== undefined;
+      const rlKey = isTraced ? traceId : "";
+      const allowed = op_check_outbound_rate_limit(
+        requestObject.url,
+        rlKey,
+        isTraced,
+      );
+      if (!allowed) {
+        const msg = isTraced
+          ? `Rate limit exceeded for trace ${rlKey}`
+          : "Rate limit exceeded for function";
+        reject(new Deno.errors.RateLimitError(msg));
+        return;
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vendor/deno_fetch/26_fetch.js` around lines 392 - 413, The rate-limit check
currently runs before the abort guard and can consume quota or throw
Deno.errors.RateLimitError for already-aborted requests; move the
op_check_outbound_rate_limit block (including internals.getRequestTraceId usage
and the reject(new Deno.errors.RateLimitError(...)) branch) so it executes after
the abort check that uses requestObject.signal.aborted and abortFetch(request,
null, requestObject.signal.reason), ensuring you first bail out on aborted
signals via abortFetch and only then charge/check rate limits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/src/flags.rs`:
- Around line 235-243: The CLI flag --rate-limit-table-cleanup-interval
currently accepts 0 which can cause a hot loop; update the arg definition for
"rate-limit-table-cleanup-interval" in flags.rs to validate and reject zero
(require value >= 1) by adding a value validator or range check (e.g., use
clap's value_parser with a .range(1..) or a custom validator closure that
returns an error for "0"), and also audit any non-CLI default sources/constants
used for the same interval to ensure they never supply 0 (replace with >=1
default, e.g., 60).

In `@cli/src/main.rs`:
- Around line 232-235: The value assigned to rate_limit_cleanup_interval_sec can
be 0, which results in tokio::time::sleep(Duration::ZERO) and a tight spin;
guard against zero by validating or clamping the value before passing to
spawn_cleanup_task: after reading
sub_matches.get_one::<u64>("rate-limit-table-cleanup-interval").copied().unwrap_or(60)
ensure you replace or coerce zero to a safe minimum (e.g., 1) (for example, set
rate_limit_cleanup_interval_sec = if val == 0 { 1 } else { val } or use
val.max(1)); update any callers (spawn_cleanup_task) to rely on this non-zero
invariant. Ensure the check references rate_limit_cleanup_interval_sec and
avoids calling tokio::time::sleep with Duration::ZERO.

In `@crates/base/src/worker/worker_inner.rs`:
- Around line 275-286: The code currently swallows TraceRateLimiter::new
compilation errors (in the RateLimiterOpts::Configured branch when
new_runtime.conf.as_user_worker().cloned() is Some), which disables outbound
rate limiting; change the Err arm to fail the worker bootstrap instead of
logging: propagate or return the error from the surrounding worker
initialization function (convert/box the TraceRateLimiter::new error into the
function's error type and return it) so that a rate-limit compilation failure
aborts startup rather than continuing without a limiter.

In `@ext/runtime/js/http.js`:
- Around line 267-272: The RAW upgrade path reads fenceRid from the request tag
via getSupabaseTag(requestEvent.request) and throws if it's undefined, but the
tag initialization only sets watcherRid and streamRid; update the tag
initialization (where watcherRid and streamRid are created) to include fenceRid:
null so fenceRid is always present, or implement the missing flow that sets
fenceRid on the tag before the RAW upgrade code path executes; ensure the symbol
names referenced are getSupabaseTag, requestEvent.request, and
internals.RAW_UPGRADE_RESPONSE_SENTINEL so the fix targets the correct tag
creation site and the RAW upgrade check.

---

Outside diff comments:
In `@vendor/deno_fetch/26_fetch.js`:
- Around line 392-413: The rate-limit check currently runs before the abort
guard and can consume quota or throw Deno.errors.RateLimitError for
already-aborted requests; move the op_check_outbound_rate_limit block (including
internals.getRequestTraceId usage and the reject(new
Deno.errors.RateLimitError(...)) branch) so it executes after the abort check
that uses requestObject.signal.aborted and abortFetch(request, null,
requestObject.signal.reason), ensuring you first bail out on aborted signals via
abortFetch and only then charge/check rate limits.

ℹ️ Review info

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9f0294e and c464281.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (22)
  • cli/src/flags.rs
  • cli/src/main.rs
  • crates/base/src/server.rs
  • crates/base/src/worker/pool.rs
  • crates/base/src/worker/worker_inner.rs
  • crates/base/test_cases/rate-limit-a/index.ts
  • crates/base/test_cases/rate-limit-b/index.ts
  • crates/base/test_cases/rate-limit-echo/index.ts
  • crates/base/test_cases/rate-limit-main/index.ts
  • crates/base/test_cases/rate-limit-untraced/index.ts
  • crates/base/tests/integration_tests.rs
  • ext/node/polyfills/http.ts
  • ext/runtime/Cargo.toml
  • ext/runtime/js/bootstrap.js
  • ext/runtime/js/errors.js
  • ext/runtime/js/http.js
  • ext/runtime/js/request_context.js
  • ext/runtime/lib.rs
  • ext/runtime/rate_limit.rs
  • ext/workers/context.rs
  • ext/workers/lib.rs
  • vendor/deno_fetch/26_fetch.js

coderabbitai bot left a comment

♻️ Duplicate comments (2)
ext/runtime/js/http.js (1)

77-80: ⚠️ Potential issue | 🔴 Critical

RAW upgrade can throw because fenceRid is not initialized on the request tag.

The tag is created without fenceRid, but the RAW upgrade branch requires it and throws when missing.

Proposed fix
     nextRequest.request[kSupabaseTag] = {
       watcherRid,
       streamRid: nextRequest.streamRid,
+      fenceRid: null,
     };
#!/bin/bash
set -euo pipefail

# Verify whether fenceRid is ever initialized/populated before RAW upgrade usage.
rg -n 'fenceRid|kSupabaseTag|RAW_UPGRADE_RESPONSE_SENTINEL' --type js -C2

Also applies to: 267-272

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ext/runtime/js/http.js` around lines 77 - 80, The request tag created at
nextRequest.request[kSupabaseTag] is missing the fenceRid property which the RAW
upgrade branch (checking RAW_UPGRADE_RESPONSE_SENTINEL) expects and throws on;
update the tag initialization to always include a fenceRid field (e.g., set to
the existing fenceRid value if present or a safe default like null/undefined/0)
alongside watcherRid and streamRid so RAW upgrade code can read it safely; apply
the same initialization fix for the other tag creation site referenced by
fenceRid usage (the occurrence around lines 267-272) so both places consistently
populate fenceRid.
cli/src/main.rs (1)

232-235: ⚠️ Potential issue | 🔴 Critical

Reject 0 for cleanup interval to avoid a hot cleanup loop.

rate_limit_cleanup_interval_sec currently accepts 0; this can propagate to the cleanup task interval and cause aggressive polling.

Proposed fix
         let rate_limit_cleanup_interval_sec = sub_matches
           .get_one::<u64>("rate-limit-table-cleanup-interval")
           .copied()
           .unwrap_or(60);
+        if rate_limit_cleanup_interval_sec == 0 {
+          bail!("--rate-limit-table-cleanup-interval must be >= 1 second");
+        }
#!/bin/bash
set -euo pipefail

# Verify zero is currently accepted in CLI parsing and propagated to cleanup task.
rg -n 'rate-limit-table-cleanup-interval|rate_limit_cleanup_interval_sec|spawn_cleanup_task|tokio::time::sleep' --type rust -C2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/src/main.rs` around lines 232 - 235, The parsed
rate_limit_cleanup_interval_sec currently allows 0 which can cause a tight
cleanup loop; update the parsing logic where
sub_matches.get_one::<u64>("rate-limit-table-cleanup-interval") is handled so
that a value of 0 is rejected or coerced to a safe minimum (e.g. error out with
a user-facing message or replace 0 with 1 using .map(|v| if *v == 0 {1} else
{*v}) or .copied().map(|v| v.max(1)).unwrap_or(60)); ensure spawn_cleanup_task
and any tokio::time::sleep calls use this sanitized value so 0 cannot propagate
into the cleanup interval.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@cli/src/main.rs`:
- Around line 232-235: The parsed rate_limit_cleanup_interval_sec currently
allows 0 which can cause a tight cleanup loop; update the parsing logic where
sub_matches.get_one::<u64>("rate-limit-table-cleanup-interval") is handled so
that a value of 0 is rejected or coerced to a safe minimum (e.g. error out with
a user-facing message or replace 0 with 1 using .map(|v| if *v == 0 {1} else
{*v}) or .copied().map(|v| v.max(1)).unwrap_or(60)); ensure spawn_cleanup_task
and any tokio::time::sleep calls use this sanitized value so 0 cannot propagate
into the cleanup interval.

In `@ext/runtime/js/http.js`:
- Around line 77-80: The request tag created at
nextRequest.request[kSupabaseTag] is missing the fenceRid property which the RAW
upgrade branch (checking RAW_UPGRADE_RESPONSE_SENTINEL) expects and throws on;
update the tag initialization to always include a fenceRid field (e.g., set to
the existing fenceRid value if present or a safe default like null/undefined/0)
alongside watcherRid and streamRid so RAW upgrade code can read it safely; apply
the same initialization fix for the other tag creation site referenced by
fenceRid usage (the occurrence around lines 267-272) so both places consistently
populate fenceRid.


📥 Commits

Reviewing files that changed from the base of the PR and between c464281 and b8a5f1b.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (23)
  • cli/src/flags.rs
  • cli/src/main.rs
  • crates/base/src/server.rs
  • crates/base/src/worker/pool.rs
  • crates/base/src/worker/worker_inner.rs
  • crates/base/test_cases/rate-limit-a/index.ts
  • crates/base/test_cases/rate-limit-b/index.ts
  • crates/base/test_cases/rate-limit-echo/index.ts
  • crates/base/test_cases/rate-limit-main/index.ts
  • crates/base/test_cases/rate-limit-untraced/index.ts
  • crates/base/tests/integration_tests.rs
  • ext/node/polyfills/http.ts
  • ext/runtime/Cargo.toml
  • ext/runtime/js/bootstrap.js
  • ext/runtime/js/errors.js
  • ext/runtime/js/http.js
  • ext/runtime/js/request_context.js
  • ext/runtime/lib.rs
  • ext/runtime/rate_limit.rs
  • ext/workers/context.rs
  • ext/workers/lib.rs
  • types/global.d.ts
  • vendor/deno_fetch/26_fetch.js
🚧 Files skipped from review as they are similar to previous changes (13)
  • crates/base/test_cases/rate-limit-b/index.ts
  • crates/base/src/worker/worker_inner.rs
  • ext/workers/context.rs
  • ext/runtime/js/bootstrap.js
  • ext/runtime/js/errors.js
  • crates/base/test_cases/rate-limit-main/index.ts
  • crates/base/test_cases/rate-limit-untraced/index.ts
  • ext/workers/lib.rs
  • ext/runtime/js/request_context.js
  • crates/base/src/server.rs
  • crates/base/test_cases/rate-limit-a/index.ts
  • ext/node/polyfills/http.ts
  • cli/src/flags.rs

nyannyacha force-pushed the ny/feat-request-local-rate-limiter branch from 59a49ce to d1882c5 on March 3, 2026 at 09:51
nyannyacha requested a review from a team on March 3, 2026 at 10:08
nyannyacha merged commit bdf1893 into main on Mar 3, 2026
5 checks passed
nyannyacha deleted the ny/feat-request-local-rate-limiter branch on March 3, 2026 at 13:03