API Reference
OpenRouter’s request and response schemas are very similar to the OpenAI Chat API, with a few small differences. At a high level, OpenRouter normalizes the schema across models and providers so you only need to learn one.
OpenAPI Specification
The complete OpenRouter API is documented using the OpenAPI specification. You can access the specification in either YAML or JSON format.
These specifications can be used with tools like Swagger UI, Postman, or any OpenAPI-compatible code generator to explore the API or generate client libraries.
Requests
Completions Request Format
Here is the request schema as a TypeScript type. This will be the body of your POST request to the /api/v1/chat/completions endpoint (see the quick start above for an example).
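What follows is an abridged sketch rather than the complete type; several optional parameters are omitted for brevity, and the authoritative schema lives in the OpenAPI specification and the Parameters reference:

```typescript
// Abridged sketch of the request body (not the complete schema).
type ChatCompletionRequest = {
  // Either messages or prompt is required.
  messages?: Message[];
  prompt?: string;

  // Model ID with organization prefix, e.g. 'openai/gpt-4o'.
  // If omitted, the user or payer's default model is used.
  model?: string;

  // See "Structured Outputs" below.
  response_format?:
    | { type: 'json_object' }
    | { type: 'json_schema'; json_schema: Record<string, unknown> };

  // Set to true to receive an SSE stream (see "Streaming" below).
  stream?: boolean;

  // Common sampling parameters, all optional.
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
  stop?: string | string[];
  // ...plus tools, seed, logit_bias, plugins, and others.
};

type Message =
  | { role: 'system' | 'user' | 'assistant'; content: string }
  | { role: 'tool'; content: string; tool_call_id: string };
```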
For a complete list of parameters, see the Parameters reference.
Structured Outputs
The response_format parameter allows you to enforce structured JSON responses from the model. OpenRouter supports two modes:
- { type: 'json_object' }: Basic JSON mode - the model will return valid JSON
- { type: 'json_schema', json_schema: { ... } }: Strict schema mode - the model will return JSON matching your exact schema
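For instance, a strict-schema request body might look like this sketch (the json_schema wrapper follows the OpenAI-style name/strict/schema convention; the schema itself is illustrative):

```typescript
const body = {
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'What is the weather like in Tokyo?' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'weather',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          city: { type: 'string' },
          temperature: { type: 'number' },
        },
        required: ['city', 'temperature'],
        additionalProperties: false,
      },
    },
  },
};
```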
For detailed usage and examples, see Structured Outputs. To find models that support structured outputs, check the models page.
Plugins
OpenRouter plugins extend model capabilities with features like web search, PDF processing, and response healing. Enable plugins by adding a plugins array to your request:
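A minimal sketch (the web plugin is shown with its defaults; each plugin accepts its own options):

```typescript
const body = {
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'What happened in AI news this week?' }],
  // Each entry enables one plugin by id.
  plugins: [{ id: 'web' }],
};
```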
Available plugins include web (real-time web search), file-parser (PDF processing), and response-healing (automatic JSON repair). For detailed configuration options, see Plugins.
Headers
OpenRouter allows you to specify some optional headers to identify your app and make it discoverable to users on our site.
- HTTP-Referer: Identifies your app on openrouter.ai
- X-Title: Sets/modifies your app's title
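For example (the referer URL and title below are placeholders for your own app's values):

```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'HTTP-Referer': 'https://myapp.example.com', // optional: your app's URL
    'X-Title': 'My App', // optional: your app's title
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
```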
Model routing
If the model parameter is omitted, the user or payer's default model is used. Otherwise, select a value for model from the supported models or the models API, and include the organization prefix (e.g. openai/gpt-4o). OpenRouter will select the least expensive and best GPUs available to serve the request, and will fall back to other providers or GPUs if it receives a 5xx response code or if you are rate-limited.
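For example, to pin a request to a specific model:

```typescript
const body = {
  model: 'openai/gpt-4o', // organization prefix + model name
  messages: [{ role: 'user', content: 'Hello!' }],
};
```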
Streaming
Server-Sent Events (SSE) are supported for all models, enabling streaming responses. Simply send stream: true in your request body. The SSE stream will occasionally contain a “comment” payload, which you should ignore (noted below).
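A sketch of consuming the stream in Node with fetch, skipping SSE comment payloads (lines that begin with a colon):

```typescript
const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Tell me a story.' }],
    stream: true,
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  const lines = buffer.split('\n');
  buffer = lines.pop()!; // keep any partial line for the next chunk

  for (const line of lines) {
    if (line.startsWith(':')) continue; // ignore SSE comment payloads
    if (!line.startsWith('data: ')) continue;
    const data = line.slice('data: '.length);
    if (data === '[DONE]') continue; // end-of-stream sentinel
    const chunk = JSON.parse(data);
    process.stdout.write(chunk.choices[0].delta?.content ?? '');
  }
}
```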
Non-standard parameters
If the chosen model doesn’t support a request parameter (such as logit_bias
in non-OpenAI models, or top_k for OpenAI), then the parameter is ignored.
The rest are forwarded to the underlying model API.
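For instance, in this sketch top_k would be dropped for an OpenAI model while the other parameters are forwarded:

```typescript
const body = {
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
  top_k: 40, // not supported by OpenAI models, so it is ignored
  temperature: 0.7, // supported, so it is forwarded
};
```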
Assistant Prefill
OpenRouter supports asking models to complete a partial response. This can be useful for guiding models to respond in a certain way.
To use this feature, simply include a message with role: "assistant" at the end of your messages array.
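A minimal sketch:

```typescript
const body = {
  model: 'openai/gpt-4o',
  messages: [
    { role: 'user', content: 'What is the capital of France?' },
    // The model continues from this partial assistant message.
    { role: 'assistant', content: 'The capital of France is' },
  ],
};
```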
Responses
Completions Response Format
OpenRouter normalizes the schema across models and providers to comply with the OpenAI Chat API.
This means that choices is always an array, even if the model only returns one completion. Each choice will contain a delta property if a stream was requested and a message property otherwise. This makes it easier to use the same code for all models.
Here’s the response schema as a TypeScript type:
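The following is an abridged sketch rather than the complete type; see the OpenAPI specification for the authoritative schema:

```typescript
// Abridged sketch of the normalized response shape.
type ChatCompletionResponse = {
  id: string;
  created: number; // Unix timestamp
  model: string;
  object: 'chat.completion' | 'chat.completion.chunk';
  choices: Array<{
    finish_reason: 'tool_calls' | 'stop' | 'length' | 'content_filter' | 'error' | null;
    native_finish_reason: string | null; // raw value from the provider
    // Present for non-streaming responses.
    message?: { role: 'assistant'; content: string | null };
    // Present for streaming responses.
    delta?: { role?: 'assistant'; content?: string | null };
  }>;
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
};
```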
Here’s an example:
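An illustrative non-streaming response (all values are placeholders):

```json
{
  "id": "gen-xxxxxxxxxxxx",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "finish_reason": "stop",
      "native_finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}
```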
Finish Reason
OpenRouter normalizes each model’s finish_reason to one of the following values: tool_calls, stop, length, content_filter, error.
Some models and providers may have additional finish reasons. The raw finish_reason string returned by the model is available via the native_finish_reason property.
Querying Cost and Stats
The token counts returned in the completions API response are calculated using the model’s native tokenizer. Credit usage and model pricing are based on these native token counts.
You can also use the returned id to query for the generation stats (including token counts and cost) after the request is complete via the /api/v1/generation endpoint. This is useful for auditing historical usage or when you need to fetch stats asynchronously.
Please see the Generation API reference for the full response shape.
Note that token counts are also available in the usage field of the response body for non-streaming completions.
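As a sketch, assuming completion holds a parsed completions response and an API key is available in OPENROUTER_API_KEY:

```typescript
const generationId = completion.id; // returned by the completions endpoint

const statsRes = await fetch(
  `https://openrouter.ai/api/v1/generation?id=${generationId}`,
  { headers: { Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}` } },
);

const { data } = await statsRes.json();
console.log(data); // native token counts, cost, and other stats
```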