Cloudflare's Code Mode addresses the challenge of large context window consumption in AI agents interacting with vast APIs. By enabling agents to write and execute code against a typed SDK and API specification, it drastically reduces token usage, offering a more efficient and scalable way for agents to discover and utilize API functionalities. This approach leverages a server-side execution environment for enhanced security and fixed token cost, regardless of API size.
The integration of AI agents with external tools via Model Context Protocol (MCP) often faces a critical limitation: the size of the model's context window. As the number of available tools or API endpoints grows, the context window quickly becomes saturated, leaving less room for the agent's actual task and increasing operational costs. Cloudflare's Code Mode offers an architectural solution to this problem, particularly for large and evolving APIs like their own.
Traditional MCP approaches typically involve describing each API operation as a separate tool, leading to an explosion of tokens in the context window. Code Mode, however, redefines this interaction by allowing the agent to generate and execute JavaScript code against a typed SDK and the API's OpenAPI specification. This code acts as a compact plan, enabling agents to dynamically explore API capabilities and compose multiple calls efficiently. The key architectural insight is to shift the complexity of API discovery and orchestration from static tool definitions to dynamic code execution.
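As a sketch of what such a compact plan might look like, the snippet below composes two dependent API calls in a single generated script. The `client` object and its method names are hypothetical stand-ins for a typed SDK, not the real Cloudflare SDK, and the mock exists only so the example is self-contained:

```javascript
// Hypothetical typed-SDK client; method names are illustrative.
// A mock stands in for the client the sandbox would inject.
const mockClient = {
  zones: {
    list: async () => [{ id: "z1", name: "example.com" }],
    rulesets: { list: async (zoneId) => [{ id: "r1" }, { id: "r2" }] },
  },
};

// The agent-generated "plan": several dependent calls composed in one
// round trip, instead of one context-consuming tool call per endpoint.
const plan = async (client) => {
  const zones = await client.zones.list();
  const summaries = [];
  for (const zone of zones) {
    const rulesets = await client.zones.rulesets.list(zone.id);
    summaries.push({ zone: zone.name, rulesetCount: rulesets.length });
  }
  return summaries;
};

plan(mockClient).then((r) => console.log(JSON.stringify(r)));
// prints [{"zone":"example.com","rulesetCount":2}]
```

The point of the composition is that intermediate results (the zone list) never pass through the model's context; only the final summary does.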
Fixed Token Cost
Code Mode significantly reduces context window usage by consolidating thousands of API endpoints into just two core tools: search() and execute(). This results in a fixed, minimal token footprint (around 1,000 tokens for the entire Cloudflare API), regardless of the API's actual size or the addition of new endpoints. This provides immense scalability and cost savings for agent interactions.
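A back-of-the-envelope comparison makes the scaling difference concrete. The endpoint count and per-tool token cost below are illustrative assumptions, not measured values; only the roughly 1,000-token Code Mode footprint comes from the figures above:

```javascript
// Illustrative assumption: per-endpoint MCP tool definitions scale
// linearly with API size, while Code Mode's two-tool footprint is constant.
const endpoints = 1500;        // assumed order of magnitude, not a measured count
const tokensPerToolDef = 150;  // assumed average tokens per tool definition

const perEndpointFootprint = endpoints * tokensPerToolDef;
const codeModeFootprint = 1000; // fixed footprint cited above

console.log(perEndpointFootprint); // 225000
console.log(codeModeFootprint);    // 1000
```

Under these assumptions, adding endpoints grows the first number without bound while the second never moves, which is the entire scalability argument in miniature.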
The server-side implementation of Code Mode is crucial. The agent's generated JavaScript, whether searching the API spec or executing API calls, runs within a secure, sandboxed Dynamic Worker isolate. This V8 sandbox has no file system and disables external fetches by default, mitigating security risks such as prompt injection and keeping outbound requests under control. The centralized execution model also frees agents from needing to understand the underlying API structure up front, enabling progressive capability discovery.
For example, an agent-generated search() payload might look like the following, where `spec` is assumed to be the API's OpenAPI document injected by the execution environment:

async () => {
  // Example of using search() to find WAF and ruleset endpoints.
  // `spec` is the OpenAPI specification object provided by the sandbox.
  const results = [];
  for (const [path, methods] of Object.entries(spec.paths)) {
    if (path.includes('/zones/') && (path.includes('firewall/waf') || path.includes('rulesets'))) {
      for (const [method, op] of Object.entries(methods)) {
        results.push({ method: method.toUpperCase(), path, summary: op.summary });
      }
    }
  }
  return results;
}
}

This architectural pattern offers significant advantages over other context-reduction techniques: client-side code execution requires a secure sandbox on the agent's side; CLI-based interactions expose a broader attack surface; and dynamic tool search requires a maintained search function while still incurring token costs for each matched tool. Server-side Code Mode combines fixed token costs, an agent-agnostic implementation, progressive discovery, and secure execution.