Chat completions

POST /v1/chat/completions

Send OpenAI-compatible chat requests through Lazu. Use this page as the working reference: base URL, auth, required body fields, examples, response usage, streaming and tool calling all live here.

Basic configuration

Base URL

Use https://api.lazu.ai/v1 with OpenAI SDKs, or call the full path https://api.lazu.ai/v1/chat/completions directly.
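As a minimal sketch, the request below is built (not sent) with Python's standard library. It assumes chat completions accepts the same Authorization: Bearer header shown for the usage endpoint later on this page; the helper name build_chat_request is illustrative only.

```python
import json
import urllib.request

BASE_URL = "https://api.lazu.ai/v1"


def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            # Assumption: bearer auth, as in the usage example on this page.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_chat_request("example-key", "gpt-4o", [{"role": "user", "content": "Hello"}])
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; with an OpenAI SDK, only the base URL and API key need to change.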

Model discovery

Read GET /api/models/catalog at runtime. Filter entries where supported_endpoint_types contains chat.
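The filter described above might look like the following sketch. It assumes the catalog response has already been decoded into a list of objects that each carry an id field next to supported_endpoint_types; the real response shape may differ, and the sample entries are hypothetical.

```python
def chat_model_ids(catalog_entries: list) -> list:
    """Return IDs of entries whose supported_endpoint_types contains "chat"."""
    ids = []
    for entry in catalog_entries:
        # Treat a missing or null field as "no endpoints supported".
        endpoint_types = entry.get("supported_endpoint_types") or []
        if "chat" in endpoint_types:
            ids.append(entry["id"])
    return ids


# Hypothetical catalog entries, shaped only for this example.
sample_catalog = [
    {"id": "gpt-4o", "supported_endpoint_types": ["chat"]},
    {"id": "example-other-model", "supported_endpoint_types": ["example-type"]},
    {"id": "example-unknown-model"},
]
print(chat_model_ids(sample_catalog))
```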

Request body

model string

required

Model ID, route alias or future route policy name. For explicit models, pass IDs from GET /api/models/catalog. For SDK compatibility, /v1/models remains available as a flat list.

messages object[]

required

Ordered conversation messages. Roles follow the OpenAI shape: system, user, assistant, and tool.

role string

One of system, user, assistant or tool.

content string | content_part[]

Plain text, or an array of content parts for multimodal requests.

tool_call_id string

Required on tool messages so the model can associate the result with the earlier call.

stream boolean

nullable

When true, Lazu forwards a Server-Sent Events stream and ends with data: [DONE].

tools object[]

nullable

Function/tool definitions. Check parameters.tools in the model catalog before sending tools to a model.

type "function"

Only function tools are supported today.

function.name string

Tool name the model will call.

function.parameters object

JSON Schema describing the tool arguments.

tool_choice string | object

nullable

OpenAI-compatible tool choice control. Use auto, none, required, or a named tool object when the selected model supports it.

response_format object

nullable

Structured output control. Use {"type":"json_object"} or a JSON schema object when the model supports strict structured output.

temperature number

nullable

Sampling temperature. Most models accept 0 to 2, but provider-specific limits can differ.

max_tokens integer

nullable

Maximum generated tokens. The final cap is still bounded by the selected model's context and output limits.

stream_options object

nullable

Use {"include_usage":true} when the upstream supports streaming usage trailers.
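To illustrate how stream and stream_options play out on the client side, the sketch below accumulates text from a Server-Sent Events stream until data: [DONE]. It assumes the forwarded chunks follow the OpenAI streaming shape (choices[].delta.content, plus a final usage chunk when include_usage is set and the upstream supports it) and that only one choice is requested. The sample lines are hand-written, not captured output.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Accumulate delta text from SSE lines until data: [DONE].

    Returns (text, usage); usage is None if no chunk carried it.
    """
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
    return "".join(text_parts), usage


# Hand-written sample stream in the assumed OpenAI chunk shape.
sample_stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":5,"completion_tokens":2}}',
    "",
    "data: [DONE]",
]
text, usage = collect_stream(sample_stream)
print(text, usage)
```

Because the usage trailer depends on the upstream, callers should treat a None usage value as "not reported" rather than as zero.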

Message content

role string

required

Message role. One of system, user, assistant, or tool.

content string | content_part[]

required

Plain text for text-only turns, or an array of content parts for multimodal requests.

tool_calls object[]

nullable

Assistant tool calls returned by the model.

tool_call_id string

nullable

Required on tool messages so the model can associate a tool result with the earlier tool call.
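The link between tool_calls and tool_call_id can be sketched as follows. The example assumes assistant tool calls arrive in the OpenAI shape (an id plus function.name and a JSON-encoded function.arguments string); the handlers mapping and the sample call ID are hypothetical.

```python
import json


def tool_result_messages(assistant_message: dict, handlers: dict) -> list:
    """Build one tool message per tool call in an assistant message."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # In the OpenAI shape, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"] or "{}")
        output = handlers[name](**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # ties the result to the earlier call
            "content": json.dumps(output),
        })
    return results


# Hypothetical assistant turn and local handler, for illustration only.
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_example_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
    }],
}
handlers = {"get_weather": lambda city: {"city": city, "forecast": "placeholder"}}
follow_up = [assistant_turn] + tool_result_messages(assistant_turn, handlers)
print(follow_up[1])
```

The follow-up request would append these messages, in order, after the original conversation so the model can read each result next to the call that produced it.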

Vision input

For image input, send OpenAI-compatible content parts. Use either HTTPS image URLs or data URLs:

Image input

json

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": {"url": "data:image/png;base64,..."}
        }
      ]
    }
  ]
}

For PDFs and large documents, upload through Files and use Responses. Chat completions does not automatically dereference file_id.
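For the data URL form of image input, a small helper can build the content part from raw bytes. This is a sketch; the function names image_part and vision_message are illustrative and not part of any SDK.

```python
import base64


def image_part(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url content part using a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message with one text part followed by one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_part(image_bytes, mime_type),
        ],
    }


# Placeholder bytes stand in for a real image file read with open(path, "rb").
message = vision_message("What's in this image?", b"abc")
print(message["content"][1]["image_url"]["url"])
```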

Tools

Tool definition

json

{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Weather in Tokyo?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"}
          },
          "required": ["city"]
        }
      }
    }
  ]
}

Tool support is not universal. Check parameters.tools in GET /api/models/catalog before routing agent traffic to a model.
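One way to apply that check is to pick the first preferred model whose catalog entry advertises tool support. The sketch assumes each decoded catalog entry has an id and that parameters.tools is truthy when tools are supported; the exact type of that field is not specified here, so adjust the test if it turns out to be an object or a list. The sample entries are hypothetical.

```python
from typing import Optional


def supports_tools(entry: dict) -> bool:
    # Assumption: parameters.tools is truthy when tool calling is supported.
    parameters = entry.get("parameters") or {}
    return bool(parameters.get("tools"))


def pick_tool_model(preferred_ids: list, catalog_entries: list) -> Optional[str]:
    """Return the first preferred model ID whose entry supports tools, else None."""
    by_id = {entry["id"]: entry for entry in catalog_entries}
    for model_id in preferred_ids:
        entry = by_id.get(model_id)
        if entry is not None and supports_tools(entry):
            return model_id
    return None


# Hypothetical catalog entries, shaped only for this example.
tool_catalog = [
    {"id": "example-no-tools", "parameters": {"tools": False}},
    {"id": "gpt-4o", "parameters": {"tools": True}},
]
print(pick_tool_model(["example-no-tools", "gpt-4o"], tool_catalog))
```

Returning None instead of silently dropping the tools field keeps the decision with the caller, who can fail fast or fall back to a non-agent flow.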

Response

id string

Lazu or provider response ID. Use the response header X-Lazu-Request-Id for request-level reconciliation.

choices object[]

Assistant output choices. Streaming responses send incremental delta objects.

usage.prompt_tokens integer

Prompt input tokens.

usage.completion_tokens integer

Generated output tokens.

usage.prompt_tokens_details.cached_tokens integer

nullable

Cache read tokens when the upstream reports them.

usage.prompt_tokens_details.cache_write_tokens integer

nullable

Cache creation/write tokens when the upstream reports them.

usage.prompt_tokens_details.cache_miss_tokens integer

nullable

Cache misses when the provider reports them separately, for example DeepSeek-compatible usage.
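Because the cache fields are nullable, client code should distinguish "not reported" from zero. The sketch below flattens a usage object using only the field names listed above; the helper name summarize_usage and the sample numbers are illustrative.

```python
def summarize_usage(usage: dict) -> dict:
    """Flatten a usage object, keeping None for cache fields that were not reported."""
    details = usage.get("prompt_tokens_details") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens") or 0,
        "completion_tokens": usage.get("completion_tokens") or 0,
        # None means the upstream did not report the value, not that it was zero.
        "cached_tokens": details.get("cached_tokens"),
        "cache_write_tokens": details.get("cache_write_tokens"),
        "cache_miss_tokens": details.get("cache_miss_tokens"),
    }


# Illustrative numbers only.
sample_usage = {
    "prompt_tokens": 120,
    "completion_tokens": 30,
    "prompt_tokens_details": {"cached_tokens": 100},
}
print(summarize_usage(sample_usage))
```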

For complete usage, billing line items, provider raw usage fields and routing metadata, call:

Request detail

bash

curl https://api.lazu.ai/api/usage/requests/req_lazu_01ABCDEF \
  -H "Authorization: Bearer $LAZU_API_KEY"

Errors

Relay errors use OpenAI-compatible error envelopes where possible and include a request ID. See Errors for code meanings and retry behavior.
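Since the OpenAI-compatible envelope is only used where possible, a client should tolerate both shapes. The sketch below assumes the standard OpenAI envelope, an error object with message, type and code, and falls back to the raw body otherwise. Where the request ID appears in an error response is not specified here, so the example leaves it out; the sample type and code values are hypothetical.

```python
import json


def parse_error(status: int, body: str) -> dict:
    """Extract message, type and code from an OpenAI-style error envelope, if present."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        payload = None
    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict):
        return {
            "status": status,
            "message": error.get("message"),
            "type": error.get("type"),
            "code": error.get("code"),
        }
    # Not an OpenAI-style envelope: keep the raw body for debugging.
    return {"status": status, "message": body, "type": None, "code": None}


# Hypothetical error bodies, for illustration only.
enveloped = parse_error(429, '{"error": {"message": "slow down", "type": "example_type", "code": "example_code"}}')
plain = parse_error(502, "Bad Gateway")
print(enveloped["message"], "|", plain["message"])
```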
