OpenAI-Compatible API
Kaman exposes an OpenAI-compatible REST API so you can interact with your agents using the standard OpenAI SDK, curl, or any tool that speaks the OpenAI chat-completions protocol.
Base URL
/api/v1
For self-hosted installations, the full URL is your host plus this prefix, e.g. http://kaman.ai/api/v1
Authentication
All requests require a Bearer token in the Authorization header:
Authorization: Bearer <your-kaman-token>
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/chat/completions | Chat completions (streaming & non-streaming) |
| GET | /api/v1/models | List available agents/models |
Models & Routing
The model field in your request controls which Kaman expert is used and how it runs.
| Model Format | Mode | Behavior |
|---|---|---|
| 42_0 | LLM | Routes through the expert's underlying LLM with its system prompt |
| expert:42_0 | LLM | Same as above (explicit) |
| agent:42_0 | Agent | Full Kaman agent pipeline: LangGraph, tools, RAG, memory |
LLM mode injects the expert's system prompt and forwards the request to the Model Proxy. It is fast, but performs no tool calling on the Kaman side.
Agent mode triggers the full Kaman agent pipeline: thinking, tool fetching, tool execution, RAG, memory, and suggestions. Supports multi-turn conversations, human-in-the-loop interrupts, and artifact generation.
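Switching between the two modes is just a change of model string. A minimal Python sketch, using the same placeholder expert ID (42_0) as the rest of this page:

from openai import OpenAI

client = OpenAI(base_url="http://kaman.ai/api/v1", api_key="your-kaman-token")

def ask(question: str, agent: bool = False) -> str:
    # The "agent:" prefix selects the full agent pipeline; a bare expert ID
    # (or "expert:" prefix) routes straight to the underlying LLM.
    model = "agent:42_0" if agent else "42_0"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print(ask("Translate 'hello' to French"))                  # LLM mode
print(ask("What were last quarter's sales?", agent=True))  # Agent mode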
List Models
Returns all experts available to the authenticated user, each in two variants (LLM mode and Agent mode).
Request
curl http://kaman.ai/api/v1/models \
-H "Authorization: Bearer $KAMAN_TOKEN"
Response
{
"object": "list",
"data": [
{
"id": "42_0",
"object": "model",
"created": 1708881234,
"owned_by": "kaman",
"name": "Sales Assistant",
"underlying_model": "gpt-4",
"description": "Handles sales inquiries and CRM operations"
},
{
"id": "agent:42_0",
"object": "model",
"created": 1708881234,
"owned_by": "kaman",
"name": "Sales Assistant (Agent)",
"underlying_model": "gpt-4",
"description": "Full agent mode with tools and memory"
}
]
}
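Because the payload follows the OpenAI models schema, the standard SDK can list experts as well. A sketch, assuming a recent openai-python in which unknown fields are preserved on the pydantic model (model_extra):

for model in client.models.list():
    extras = model.model_extra or {}
    # "name", "underlying_model", and "description" are Kaman extensions,
    # so they live outside the typed OpenAI fields.
    print(model.id, "|", extras.get("name"), "|", extras.get("underlying_model"))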
Chat Completions
Request
POST /api/v1/chat/completions
{
"model": "agent:42_0",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What were last quarter's sales?"}
],
"stream": false,
"temperature": 0.7,
"max_tokens": 1024,
"tools": [],
"tool_choice": "auto"
}
Request Fields
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | required | Expert ID with optional prefix (see routing table) |
| messages | array | required | Chat history in OpenAI message format |
| stream | boolean | false | Enable Server-Sent Events streaming |
| temperature | number | 0.7 | Sampling temperature (0–2) |
| max_tokens | number | — | Maximum tokens in the response |
| top_p | number | — | Nucleus sampling |
| frequency_penalty | number | — | Frequency penalty (−2 to 2) |
| presence_penalty | number | — | Presence penalty (−2 to 2) |
| stop | string[] | — | Stop sequences |
| tools | array | — | OpenAI function definitions (LLM mode only) |
| tool_choice | string/object | "auto" | Tool choice strategy |
Non-Streaming Response
{
"id": "chatcmpl-abc123def456",
"object": "chat.completion",
"created": 1708881234,
"model": "agent:42_0",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Last quarter's total sales were $2.4M, up 12% from Q2."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0
}
}
Streaming Response
When stream: true, the response is a stream of Server-Sent Events:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708881234,"model":"agent:42_0","choices":[{"index":0,"delta":{"content":"Last "},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708881234,"model":"agent:42_0","choices":[{"index":0,"delta":{"content":"quarter's "},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708881234,"model":"agent:42_0","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
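Without an SDK, the stream can be consumed as plain SSE: read line by line, strip the "data: " prefix, parse each JSON chunk, and stop at [DONE]. A minimal sketch using httpx (any HTTP client with response streaming works):

import json
import httpx

body = {
    "model": "agent:42_0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}
with httpx.stream(
    "POST",
    "http://kaman.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer your-kaman-token"},
    json=body,
    timeout=300,
) as response:
    for line in response.iter_lines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines and any keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("content"):
            print(delta["content"], end="", flush=True)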
Tool Calling
In Agent mode, the Kaman agent handles tool calling internally — it discovers, selects, and executes tools autonomously. Tool call events are streamed back in OpenAI format so you can observe them.
In LLM mode, you can pass OpenAI-format tools and tool_choice to have the underlying LLM generate tool calls, just like the standard OpenAI API.
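For example, an LLM-mode request with a function definition looks exactly like a standard OpenAI call. The getQuarterlySales schema below is illustrative (it matches the response example that follows), reusing the client from the routing sketch above:

response = client.chat.completions.create(
    model="42_0",  # LLM mode: the underlying LLM decides whether to call the tool
    messages=[{"role": "user", "content": "What were Q3 2024 sales?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "getQuarterlySales",
            "description": "Fetch total sales for a given quarter",
            "parameters": {
                "type": "object",
                "properties": {
                    "quarter": {"type": "string"},
                    "year": {"type": "integer"},
                },
                "required": ["quarter", "year"],
            },
        },
    }],
    tool_choice="auto",
)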
Tool Call in Response
{
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "getQuarterlySales",
"arguments": "{\"quarter\": \"Q3\", \"year\": 2024}"
}
}
]
},
"finish_reason": "tool_calls"
}
]
}
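In LLM mode, executing the call and returning the result is your responsibility, exactly as with the OpenAI API. A sketch continuing the request above (get_quarterly_sales stands in for your own implementation):

import json

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_quarterly_sales(**args)  # your code, not a Kaman API
    followup = client.chat.completions.create(
        model="42_0",
        messages=[
            {"role": "user", "content": "What were Q3 2024 sales?"},
            message,  # the assistant turn containing the tool_calls
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)},
        ],
    )
    print(followup.choices[0].message.content)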
Kaman Extensions
Responses may include additional fields that standard OpenAI SDKs safely ignore. Kaman-aware clients can use these for richer UX.
| Field | Type | Description |
|---|---|---|
| kaman_artifacts | array | Files or content artifacts generated by the agent |
| kaman_interrupt | array | Human-in-the-loop interrupts requiring user input |
| kaman_suggestions | array | Suggested follow-up prompts |
| kaman_thought | string | Agent's internal reasoning (extended thinking) |
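Because these fields sit outside the OpenAI schema, typed SDKs won't surface them as attributes; read them from the preserved extra fields (or parse the raw JSON). A sketch, again assuming openai-python's model_extra:

response = client.chat.completions.create(
    model="agent:42_0",
    messages=[{"role": "user", "content": "Generate the quarterly report"}],
)

extras = response.model_extra or {}
for artifact in extras.get("kaman_artifacts", []):
    print("artifact:", artifact["name"], "->", artifact["url"])
for suggestion in extras.get("kaman_suggestions", []):
    print("suggested follow-up:", suggestion)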
Artifact Example
{
"kaman_artifacts": [
{
"id": "artifact_1a2b3c",
"name": "quarterly_report.xlsx",
"type": "file",
"url": "/api/artifacts/artifact_1a2b3c"
}
]
}
Interrupt Example (Human-in-the-Loop)
When the agent needs user confirmation or input:
{
"kaman_interrupt": [
{
"type": "confirmation",
"message": "Send the report to finance@company.com?",
"options": [
{"value": "yes", "label": "Yes, send it"},
{"value": "no", "label": "Cancel"}
],
"toolName": "sendEmail",
"toolCallId": "call_xyz789",
"resumable": true
}
]
}
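The fields above describe the interrupt itself; the protocol for resuming is not part of the OpenAI schema. A detection sketch in which the resume step, replying with the chosen value as the next user turn, is an assumption to verify against your deployment:

messages = [{"role": "user", "content": "Email the quarterly report to finance"}]
response = client.chat.completions.create(model="agent:42_0", messages=messages)

for interrupt in (response.model_extra or {}).get("kaman_interrupt", []):
    print(interrupt["message"])
    for option in interrupt.get("options", []):
        print(f"  [{option['value']}] {option['label']}")
    choice = input("> ")
    # Assumption: the agent resumes when the choice arrives as the next user
    # message in the same conversation history.
    messages.append({"role": "user", "content": choice})
    response = client.chat.completions.create(model="agent:42_0", messages=messages)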
Error Handling
Errors follow the OpenAI error format:
{
"error": {
"message": "Invalid authentication token",
"type": "authentication_error",
"code": "invalid_api_key"
}
}
| HTTP Status | Error Type | Description |
|---|---|---|
| 401 | authentication_error | Invalid or missing token |
| 400 | invalid_request_error | Malformed request body |
| 404 | not_found_error | Expert/model not found |
| 500 | api_error | Internal server error |
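Since the error envelope matches OpenAI's, the SDK's typed exceptions map directly onto these statuses. A sketch:

import openai

try:
    response = client.chat.completions.create(
        model="agent:42_0",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.AuthenticationError as e:   # 401
    print("check your Kaman token:", e)
except openai.NotFoundError as e:         # 404: unknown expert/model
    print("expert not found:", e)
except openai.BadRequestError as e:       # 400
    print("malformed request:", e)
except openai.APIStatusError as e:        # 500 and any other status
    print("server error:", e.status_code)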
Code Examples
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="http://kaman.ai/api/v1",
api_key="your-kaman-token",
)
# Non-streaming
response = client.chat.completions.create(
model="agent:42_0",
messages=[
{"role": "user", "content": "Summarize last month's revenue"}
],
)
print(response.choices[0].message.content)
# Streaming
stream = client.chat.completions.create(
model="agent:42_0",
messages=[
{"role": "user", "content": "Summarize last month's revenue"}
],
stream=True,
)
for chunk in stream:
delta = chunk.choices[0].delta
if delta.content:
print(delta.content, end="", flush=True)
TypeScript (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://kaman.ai/api/v1",
apiKey: "your-kaman-token",
});
// Non-streaming
const response = await client.chat.completions.create({
model: "agent:42_0",
messages: [{ role: "user", content: "What are today's open tickets?" }],
});
console.log(response.choices[0].message.content);
// Streaming
const stream = await client.chat.completions.create({
model: "agent:42_0",
messages: [{ role: "user", content: "What are today's open tickets?" }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
cURL
# Non-streaming
curl -X POST http://kaman.ai/api/v1/chat/completions \
-H "Authorization: Bearer $KAMAN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "agent:42_0",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Streaming
curl -N -X POST http://kaman.ai/api/v1/chat/completions \
-H "Authorization: Bearer $KAMAN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "agent:42_0",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
LLM Mode (Direct Expert)
# Use the expert's LLM directly (no agent pipeline)
response = client.chat.completions.create(
model="42_0", # or "expert:42_0"
messages=[
{"role": "user", "content": "Translate this to French: Hello world"}
],
temperature=0.3,
)
Multi-Turn Conversations
In Agent mode the agent maintains conversational state across turns; pass the full conversation history with each request:
messages = [
{"role": "user", "content": "Find all overdue invoices"},
{"role": "assistant", "content": "I found 12 overdue invoices totaling $45,000."},
{"role": "user", "content": "Send reminders to the top 5 by amount"},
]
response = client.chat.completions.create(
model="agent:42_0",
messages=messages,
)
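To continue the conversation, append the assistant's reply to the history before the next call:

messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content,
})
messages.append({"role": "user", "content": "Now export them to a spreadsheet"})

response = client.chat.completions.create(model="agent:42_0", messages=messages)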
Timeouts & Limits
| Setting | Value |
|---|---|
| Request timeout | 5 minutes |
| Max duration (edge function) | 300 seconds |
| CORS | Open (*) |
Agent mode requests may take longer due to tool execution. Use streaming for real-time progress.
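Long agent runs can exceed an SDK's default client timeout; raising it to match the 5-minute server limit is a reasonable baseline:

from openai import OpenAI

client = OpenAI(
    base_url="http://kaman.ai/api/v1",
    api_key="your-kaman-token",
    timeout=300,  # seconds; matches the server-side request timeout
)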
Next Steps
- A2A Protocol — Agent-to-Agent communication protocol
- Authentication — API authentication guide
- Tools API — Search and execute individual tools