> **Building with AI coding agents?** Install the authstack plugin with one command. This equips your agent with accurate Scalekit implementation patterns.
>
> **Recommended**:
> ```bash
> npx @scalekit-inc/cli setup
> ```
>
> Global:
> ```bash
> npm install -g @scalekit-inc/cli
> scalekit setup
> ```
>
> Supports Claude Code, Cursor, GitHub Copilot, Codex + skills for 40+ agents.
> Features: full-stack-auth, agent-auth, mcp-auth, modular-sso, modular-scim.
> [Full setup guide](https://docs.scalekit.com/dev-kit/build-with-ai/)

---

# API rate limits

Scalekit applies a rate limit to the API requests from each environment. When a workload exceeds that limit, the API responds with HTTP `429 Too Many Requests`. This page explains how to recognize a rate-limit response, tell a Scalekit limit apart from an upstream provider's limit, handle the response, and request a higher limit.

## How rate limits work

Scalekit enforces a request rate limit per environment, measured in requests per minute. The limit is tuned per account, so a high-throughput workload may need more than the default. For example, routing MCP tool calls through Scalekit on top of authentication traffic can require more headroom.

When you exceed the limit, Scalekit returns HTTP `429 Too Many Requests`. Back off and retry with exponential backoff rather than retrying immediately.
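
One way to stay under a per-minute limit is to throttle requests on the client before they reach the API. The following is a minimal sketch of a sliding-window throttle, not part of any Scalekit SDK. The class name `MinuteRateLimiter` and the `max_per_minute` value you pass in are hypothetical; use the limit that applies to your own environment.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Client-side throttle: at most `max_per_minute` calls in any 60-second window."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.max_per_minute = max_per_minute  # hypothetical value, set it to your own limit
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests inside the current window

    def acquire(self):
        """Block until another request fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget requests that are 60 seconds old or older.
            while self._sent and now - self._sent[0] >= 60:
                self._sent.popleft()
            if len(self._sent) < self.max_per_minute:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest request leaves it.
            self._sleep(60 - (now - self._sent[0]))
```

Call `acquire()` immediately before each API request. A throttle like this lowers the chance of a `429` but does not replace handling one, because other processes in the same environment share the limit.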

## Tell Scalekit limits apart from provider limits

When you call tools through Scalekit, a `429` can come from either Scalekit or the upstream provider that Scalekit calls on your behalf, such as a CRM or data API. The `error_code` field on the error identifies the source:

| `error_code` | Source | What to do |
|---|---|---|
| `RATE_LIMITED` | Scalekit's own rate limit | Reduce the overall request frequency and back off before retrying. |
| `TOOL_ERROR` | The upstream provider rate-limited the tool call | Apply the provider's recommended backoff. Check the tool call logs for the provider's message. |

Review the detailed error in your dashboard's tool call logs to confirm which provider and which tool triggered the limit.

## Handle a 429 response

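Each Scalekit SDK surfaces a `429` as an error you can catch or inspect. The Node.js and Python examples below catch the dedicated `ScalekitTooManyRequestsException`; the Go and Java examples handle the SDK's general error type. In every case, read the error code to determine the source, and back off before retrying.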

### Node.js

```ts title="rate-limit-handling.ts"
// Assumed import path: adjust it to match your installed Scalekit Node.js SDK
import { ScalekitTooManyRequestsException } from '@scalekit-sdk/node';

try {
  // Your Scalekit SDK call, for example executing a tool
  await scalekit.tools.executeTool(/* ... */);
} catch (error) {
  if (error instanceof ScalekitTooManyRequestsException) {
    // errorCode identifies the source of the 429 so you can back off correctly
    if (error.errorCode === 'TOOL_ERROR') {
      // Upstream provider rate-limited the call: apply provider-specific backoff
      console.error('Provider rate limit:', error.message);
    } else {
      // Scalekit's own rate limit (RATE_LIMITED): reduce overall request frequency
      console.error('Scalekit rate limit:', error.message);
    }
  } else {
    // Not a rate-limit error: rethrow so it is not silently swallowed
    throw error;
  }
}
```

### Python

```py title="rate_limit_handling.py"
from scalekit.common.exceptions import ScalekitTooManyRequestsException

try:
    result = scalekit_client.tools.execute_tool(...)
except ScalekitTooManyRequestsException as e:
    # error_code identifies the source of the 429 so you can back off correctly
    if e.error_code == "TOOL_ERROR":
        # Upstream provider rate-limited the call: apply provider-specific backoff
        print("Provider rate limit:", e.message)
    else:
        # Scalekit's own rate limit (RATE_LIMITED): reduce overall request frequency
        print("Scalekit rate limit:", e.message)
```

### Go

```go title="rate_limit_handling.go"
// Your Scalekit SDK call, for example executing a tool
_, err := scalekitClient.Tools.ExecuteTool(ctx /* ... */)
if err != nil {
    // Inspect the error code to find the source of the 429.
    // "TOOL_ERROR" is the upstream provider's limit; "RATE_LIMITED" is Scalekit's
    // own limit. Back off with exponential backoff before retrying.
    log.Printf("rate limited: %v", err)
}
```

### Java

```java title="RateLimitHandling.java"
try {
    // Your Scalekit SDK call, for example executing a tool
    scalekitClient.tools().executeTool(/* ... */);
} catch (ScalekitException error) {
    // Inspect the error code to find the source of the 429.
    // "TOOL_ERROR" is the upstream provider's limit; "RATE_LIMITED" is Scalekit's
    // own limit. Back off with exponential backoff before retrying.
    System.err.println("Rate limited: " + error.getMessage());
}
```

> **Note: Use exponential backoff**
>
> Retrying a rate-limited request immediately keeps you over the limit. Use exponential backoff with jitter, and cap the number of retries so a sustained limit doesn't block your application indefinitely.
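
The retry guidance above can be sketched as a small helper. This is an illustrative sketch, not an SDK feature: `call_with_backoff`, its parameters, and the default delays are all assumptions chosen for the example. The caller supplies `is_rate_limited`, a predicate that decides whether a given error is a `429`.

```python
import random
import time


def call_with_backoff(call, is_rate_limited, max_retries=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Run `call`, retrying rate-limited failures with capped exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as error:
            # Re-raise errors that are not rate limits, and give up after the last retry.
            if not is_rate_limited(error) or attempt == max_retries:
                raise
            # Exponential delay (base_delay, 2x, 4x, ...) capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time between 0 and the capped delay.
            sleep(random.uniform(0, delay))
```

With the Python SDK, `is_rate_limited` could be `lambda e: isinstance(e, ScalekitTooManyRequestsException)`. Because `max_retries` is capped, a sustained limit ends in the original exception being raised to the caller instead of an endless retry loop.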

## Request a higher limit

If your workload needs a higher limit, contact Scalekit support with your account details and your expected peak requests per minute. Scalekit reviews the request and adjusts the limit for your account. Request the increase before you route additional traffic, such as MCP tool calls, through Scalekit.

When you estimate the limit you need, include headroom above your current peak so that normal spikes do not trigger `429` responses.
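
As a rough worked example of adding headroom, the sketch below rounds a peak figure up by a percentage. The function name `requested_limit` and the 30% default are illustrative assumptions, not a Scalekit recommendation; choose a margin that matches how spiky your traffic is.

```python
import math


def requested_limit(peak_rpm, headroom_percent=30):
    """Return a requests-per-minute figure with headroom above the observed peak."""
    # Integer arithmetic before the division avoids floating-point rounding surprises.
    return math.ceil(peak_rpm * (100 + headroom_percent) / 100)


# An observed peak of 400 requests per minute with 30% headroom:
print(requested_limit(400))  # prints 520
```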


