Endpoints / OpenAI-compatible

Responses API

POST/v1/responses

Use Responses for reasoning models, multimodal inputs and Lazu file_id dereferencing. This is the recommended endpoint when a request needs uploaded files or newer OpenAI response features.

When to use it

Use Responses

Uploaded files, reasoning controls, document workflows, and multimodal inputs that should be normalized server-side.

Use Chat completions

Simple chat, SDK compatibility, tool calling, and existing apps already built around /v1/chat/completions.

Request body

modelstring

required

Model ID from /api/models/catalog. Prefer models where supported_endpoint_types includes responses.

inputstring | object[]

required

Text input or an array of response input messages.

instructionsstring

nullable

System-level instructions for the response.

reasoningobject

nullable

Reasoning effort controls for supported models, for example {"effort":"medium"}.

toolsobject[]

nullable

Tool definitions using the OpenAI-compatible Responses shape.

streamboolean

nullable

Streams response events when supported by the selected model.

max_output_tokensinteger

nullable

Upper bound for generated output tokens.

Content parts

input_textcontent part

Text content sent to the model.

input_imagecontent part

Image content. Lazu can dereference uploaded file_id values with purpose vision.

input_filecontent part

Uploaded file reference. Lazu dereferences file content server-side before forwarding the request upstream.

File dereferencing

Upload via Files, then reference the resulting

file_id. Lazu adds X-Lazu-File-Dereference: 1 when the request dereferenced files.

Limits:

Single file purpose limit still applies.
Total dereferenced files in one Responses call must stay under 64 MB.
Chat completions does not auto-dereference file_id.

Response

idstring

Response ID.

outputobject[]

Output messages, reasoning items, tool calls, or other response events.

usage.input_tokensinteger

nullable

Input token count when the upstream reports usage.

usage.output_tokensinteger

nullable

Output token count when the upstream reports usage.

usage.input_tokens_details.cached_tokensinteger

nullable

Cache read tokens for providers that expose response-level cache usage.

For full reconciliation, use

GET /api/usage/requests/{request_id} with the same API key.

Responses API

When to use it

Use Responses

Use Chat completions

Request body

Content parts

File dereferencing

Response

See also