OpenAI-Compatible API
Kaman exposes an OpenAI-compatible REST API so you can interact with your agents using the standard OpenAI SDK, curl, or any tool that speaks the OpenAI chat-completions protocol.
Base URL
/api/v1
For self-hosted installations, the full URL is your host plus this prefix, e.g. http://kaman.ai/api/v1
Authentication
All requests require a Bearer token in the Authorization header:
Authorization: Bearer <your-kaman-token>
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/chat/completions | Chat completions (streaming & non-streaming) |
| GET | /api/v1/models | List available agents/models |
Models & Routing
The model field in your request controls which Kaman expert is used and how it runs.
| Model Format | Mode | Behavior |
|---|---|---|
| 42_0 | LLM | Routes through the expert's underlying LLM with its system prompt |
| expert:42_0 | LLM | Same as above (explicit) |
| agent:42_0 | Agent | Full Kaman agent pipeline: LangGraph, tools, RAG, memory |
LLM mode injects the expert's system prompt and forwards the request to the Model Proxy. It is fast, but performs no tool calling on the Kaman side.
Agent mode triggers the full Kaman agent pipeline: thinking, tool fetching, tool execution, RAG, memory, and suggestions. Supports multi-turn conversations, human-in-the-loop interrupts, and artifact generation.
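Switching between the two modes is just a change of model string. A minimal Python sketch, using the same placeholder expert ID (42_0) as the rest of this page:

from openai import OpenAI

client = OpenAI(base_url="http://kaman.ai/api/v1", api_key="your-kaman-token")

def ask(question: str, agent: bool = False) -> str:
    # The "agent:" prefix selects the full agent pipeline; a bare expert ID
    # (or "expert:" prefix) routes straight to the underlying LLM.
    model = "agent:42_0" if agent else "42_0"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print(ask("Translate 'hello' to French"))                  # LLM mode
print(ask("What were last quarter's sales?", agent=True))  # Agent mode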
List Models
Returns all experts available to the authenticated user, each in two variants (LLM mode and Agent mode).
Request
curl http://kaman.ai/api/v1/models \
-H "Authorization: Bearer $KAMAN_TOKEN"
Response
{
"object": "list",
"data": [
{
"id": "42_0",
"object": "model",
"created": 1708881234,
"owned_by": "kaman",
"name": "Sales Assistant",
"underlying_model": "gpt-4",
"description": "Handles sales inquiries and CRM operations"
},
{
"id": "agent:42_0",
"object": "model",
"created": 1708881234,
"owned_by": "kaman",
"name": "Sales Assistant (Agent)",
"underlying_model": "gpt-4",
"description": "Full agent mode with tools and memory"
}
]
}
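Because the payload follows the OpenAI models schema, the standard SDK can list experts as well. A sketch, assuming a recent openai-python in which unknown fields are preserved on the pydantic model (model_extra):

for model in client.models.list():
    extras = model.model_extra or {}
    # "name", "underlying_model", and "description" are Kaman extensions,
    # so they live outside the typed OpenAI fields.
    print(model.id, "|", extras.get("name"), "|", extras.get("underlying_model"))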
Chat Completions
Request
POST /api/v1/chat/completions
{
"model": "agent:42_0",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What were last quarter's sales?"}
],
"stream": false,
"temperature": 0.7,
"max_tokens": 1024,
"tools": [],
"tool_choice": "auto"
}
Request Fields
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | required | Expert ID with optional prefix (see routing table) |
| messages | array | required | Chat history in OpenAI message format |
| stream | boolean | false | Enable Server-Sent Events streaming |
| temperature | number | 0.7 | Sampling temperature (0–2) |
| max_tokens | number | — | Maximum tokens in the response |
| top_p | number | — | Nucleus sampling |
| frequency_penalty | number | — | Frequency penalty (−2 to 2) |
| presence_penalty | number | — | Presence penalty (−2 to 2) |
| stop | string[] | — | Stop sequences |
| tools | array | — | OpenAI function definitions (LLM mode only) |
| tool_choice | string/object | "auto" | Tool choice strategy |
Non-Streaming Response
{
"id": "chatcmpl-abc123def456",
"object": "chat.completion",
"created": 1708881234,
"model": "agent:42_0",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Last quarter's total sales were $2.4M, up 12% from Q2."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0
}
}
Streaming Response
When stream: true, the response is a stream of Server-Sent Events:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708881234,"model":"agent:42_0","choices":[{"index":0,"delta":{"content":"Last "},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708881234,"model":"agent:42_0","choices":[{"index":0,"delta":{"content":"quarter's "},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1708881234,"model":"agent:42_0","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
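Without an SDK, the stream can be consumed as plain SSE: read line by line, strip the "data: " prefix, parse each JSON chunk, and stop at [DONE]. A minimal sketch using httpx (any HTTP client with response streaming works):

import json
import httpx

body = {
    "model": "agent:42_0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}
with httpx.stream(
    "POST",
    "http://kaman.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer your-kaman-token"},
    json=body,
    timeout=300,
) as response:
    for line in response.iter_lines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines and any keep-alives
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("content"):
            print(delta["content"], end="", flush=True)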
Tool Calling
In Agent mode, the Kaman agent handles tool calling internally — it discovers, selects, and executes tools autonomously. Tool call events are streamed back in OpenAI format so you can observe them.
In LLM mode, you can pass OpenAI-format tools and tool_choice to have the underlying LLM generate tool calls, just like the standard OpenAI API.
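For example, an LLM-mode request with a function definition looks exactly like a standard OpenAI call. The getQuarterlySales schema below is illustrative (it matches the response example that follows), reusing the client from the routing sketch above:

response = client.chat.completions.create(
    model="42_0",  # LLM mode: the underlying LLM decides whether to call the tool
    messages=[{"role": "user", "content": "What were Q3 2024 sales?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "getQuarterlySales",
            "description": "Fetch total sales for a given quarter",
            "parameters": {
                "type": "object",
                "properties": {
                    "quarter": {"type": "string"},
                    "year": {"type": "integer"},
                },
                "required": ["quarter", "year"],
            },
        },
    }],
    tool_choice="auto",
)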
Tool Call in Response
{
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "getQuarterlySales",
"arguments": "{\"quarter\": \"Q3\", \"year\": 2024}"
}
}
]
},
"finish_reason": "tool_calls"
}
]
}
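In LLM mode, executing the call and returning the result is your responsibility, exactly as with the OpenAI API. A sketch continuing the request above (get_quarterly_sales stands in for your own implementation):

import json

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_quarterly_sales(**args)  # your code, not a Kaman API
    followup = client.chat.completions.create(
        model="42_0",
        messages=[
            {"role": "user", "content": "What were Q3 2024 sales?"},
            message,  # the assistant turn containing the tool_calls
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)},
        ],
    )
    print(followup.choices[0].message.content)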
Kaman Extensions
Responses may include additional fields that standard OpenAI SDKs safely ignore. Kaman-aware clients can use these for richer UX.
| Field | Type | Description |
|---|---|---|
| kaman_artifacts | array | Files or content artifacts generated by the agent |
| kaman_interrupt | array | Human-in-the-loop interrupts requiring user input |
| kaman_suggestions | array | Suggested follow-up prompts |
| kaman_thought | string | Agent's internal reasoning (extended thinking) |
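Because these fields sit outside the OpenAI schema, typed SDKs won't surface them as attributes; read them from the preserved extra fields (or parse the raw JSON). A sketch, again assuming openai-python's model_extra:

response = client.chat.completions.create(
    model="agent:42_0",
    messages=[{"role": "user", "content": "Generate the quarterly report"}],
)

extras = response.model_extra or {}
for artifact in extras.get("kaman_artifacts", []):
    print("artifact:", artifact["name"], "->", artifact["url"])
for suggestion in extras.get("kaman_suggestions", []):
    print("suggested follow-up:", suggestion)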
Artifact Example
{
"kaman_artifacts": [
{
"id": "artifact_1a2b3c",
"name": "quarterly_report.xlsx",
"type": "file",
"url": "/api/artifacts/artifact_1a2b3c"
}
]
}
Interrupt Example (Human-in-the-Loop)
When the agent needs user confirmation or input:
{
"kaman_interrupt": [
{
"type": "confirmation",
"message": "Send the report to finance@company.com?",
"options": [
{"value": "yes", "label": "Yes, send it"},
{"value": "no", "label": "Cancel"}
],
"toolName": "sendEmail",
"toolCallId": "call_xyz789",
"resumable": true
}
]
}
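The fields above describe the interrupt itself; the protocol for resuming is not part of the OpenAI schema. A detection sketch in which the resume step, replying with the chosen value as the next user turn, is an assumption to verify against your deployment:

messages = [{"role": "user", "content": "Email the quarterly report to finance"}]
response = client.chat.completions.create(model="agent:42_0", messages=messages)

for interrupt in (response.model_extra or {}).get("kaman_interrupt", []):
    print(interrupt["message"])
    for option in interrupt.get("options", []):
        print(f"  [{option['value']}] {option['label']}")
    choice = input("> ")
    # Assumption: the agent resumes when the choice arrives as the next user
    # message in the same conversation history.
    messages.append({"role": "user", "content": choice})
    response = client.chat.completions.create(model="agent:42_0", messages=messages)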
Error Handling
Errors follow the OpenAI error format:
{
"error": {
"message": "Invalid authentication token",
"type": "authentication_error",
"code": "invalid_api_key"
}
}
| HTTP Status | Error Type | Description |
|---|---|---|
| 401 | authentication_error | Invalid or missing token |
| 400 | invalid_request_error | Malformed request body |
| 404 | not_found_error | Expert/model not found |
| 500 | api_error | Internal server error |
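Since the error envelope matches OpenAI's, the SDK's typed exceptions map directly onto these statuses. A sketch:

import openai

try:
    response = client.chat.completions.create(
        model="agent:42_0",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.AuthenticationError as e:   # 401
    print("check your Kaman token:", e)
except openai.NotFoundError as e:         # 404: unknown expert/model
    print("expert not found:", e)
except openai.BadRequestError as e:       # 400
    print("malformed request:", e)
except openai.APIStatusError as e:        # 500 and any other status
    print("server error:", e.status_code)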
Code Examples
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="http://kaman.ai/api/v1",
api_key="your-kaman-token",
)
# Non-streaming
response = client.chat.completions.create(
model="agent:42_0",
messages=[
{"role": "user", "content": "Summarize last month's revenue"}
],
)
print(response.choices[0].message.content)
# Streaming
stream = client.chat.completions.create(
model="agent:42_0",
messages=[
{"role": "user", "content": "Summarize last month's revenue"}
],
stream=True,
)
for chunk in stream:
delta = chunk.choices[0].delta
if delta.content:
print(delta.content, end="", flush=True)
TypeScript (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://kaman.ai/api/v1",
apiKey: "your-kaman-token",
});
// Non-streaming
const response = await client.chat.completions.create({
model: "agent:42_0",
messages: [{ role: "user", content: "What are today's open tickets?" }],
});
console.log(response.choices[0].message.content);
// Streaming
const stream = await client.chat.completions.create({
model: "agent:42_0",
messages: [{ role: "user", content: "What are today's open tickets?" }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
cURL
# Non-streaming
curl -X POST http://kaman.ai/api/v1/chat/completions \
-H "Authorization: Bearer $KAMAN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "agent:42_0",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Streaming
curl -N -X POST http://kaman.ai/api/v1/chat/completions \
-H "Authorization: Bearer $KAMAN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "agent:42_0",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
LLM Mode (Direct Expert)
# Use the expert's LLM directly (no agent pipeline)
response = client.chat.completions.create(
model="42_0", # or "expert:42_0"
messages=[
{"role": "user", "content": "Translate this to French: Hello world"}
],
temperature=0.3,
)
Multi-Turn Conversations
In Agent mode the agent maintains conversational state across turns; pass the full conversation history with each request:
messages = [
{"role": "user", "content": "Find all overdue invoices"},
{"role": "assistant", "content": "I found 12 overdue invoices totaling $45,000."},
{"role": "user", "content": "Send reminders to the top 5 by amount"},
]
response = client.chat.completions.create(
model="agent:42_0",
messages=messages,
)
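To continue the conversation, append the assistant's reply to the history before the next call:

messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content,
})
messages.append({"role": "user", "content": "Now export them to a spreadsheet"})

response = client.chat.completions.create(model="agent:42_0", messages=messages)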
Timeouts & Limits
| Setting | Value |
|---|---|
| Request timeout | 5 minutes |
| Max duration (edge function) | 300 seconds |
| CORS | Open (*) |
Agent mode requests may take longer due to tool execution. Use streaming for real-time progress.
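Long agent runs can exceed an SDK's default client timeout; raising it to match the 5-minute server limit is a reasonable baseline:

from openai import OpenAI

client = OpenAI(
    base_url="http://kaman.ai/api/v1",
    api_key="your-kaman-token",
    timeout=300,  # seconds; matches the server-side request timeout
)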
Next Steps
- A2A Protocol — Agent-to-Agent communication protocol
- Authentication — API authentication guide
- Tools API — Search and execute individual tools