OpenAI-Compatible API
Run local OpenAI-compatible API endpoints on your Server Node for inference.
Overview
Server Nodes can expose OpenAI-compatible REST APIs for:
- Chat Completions: Conversational AI with trained models
- Text Completions: Text generation tasks
- Embeddings: Vector representations of text
- Model Listing: Available models on the node
Enabling the API
```toml
# config.toml
[api]
enabled = true
address = "127.0.0.1:8080"
# For external access:
# address = "0.0.0.0:8080"

# Authentication
api_key = "sk-your-secret-key"
require_auth = true

# Rate limiting
max_requests_per_minute = 60
max_tokens_per_minute = 100000

# CORS (for web clients)
cors_enabled = true
cors_origins = ["http://localhost:3000"]
```
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/models | List available models |
| GET | /v1/models/:id | Get model details |
| POST | /v1/chat/completions | Chat completion |
| POST | /v1/completions | Text completion |
| POST | /v1/embeddings | Generate embeddings |
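When `require_auth` is enabled, every endpoint expects the API key as a Bearer token. A sketch using only Python's standard library, with the URL and key taken from the example config above (uncomment the `urlopen` call against a running node):

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/v1"  # address from the example config
API_KEY = "sk-your-secret-key"

# Build an authenticated GET request for the model listing endpoint.
req = urllib.request.Request(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)

# Against a running node:
# with urllib.request.urlopen(req) as resp:
#     models = json.load(resp)["data"]
#     print([m["id"] for m in models])
```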
Chat Completions
Request:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-secret-key" \
  -d '{
    "model": "my-fine-tuned-model",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'
```
Response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1702345678,
  "model": "my-fine-tuned-model",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 9,
    "total_tokens": 29
  }
}
```

Streaming Responses
Enable streaming with `stream: true`:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-secret-key" \
  -d '{
    "model": "my-fine-tuned-model",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
```
Response (Server-Sent Events):

```text
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Once"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" upon"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" a"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" time"}}]}
data: [DONE]
```

Embeddings
Request:

```bash
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-secret-key" \
  -d '{
    "model": "text-embedding-model",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```
Response:

```json
{
  "object": "list",
  "data": [{
    "object": "embedding",
    "index": 0,
    "embedding": [0.0023, -0.0094, 0.0152, ...]
  }],
  "model": "text-embedding-model",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
```

SDK Integration
Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-your-secret-key",
)

response = client.chat.completions.create(
    model="my-fine-tuned-model",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.choices[0].message.content)
```

JavaScript/TypeScript
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'sk-your-secret-key',
});

const response = await client.chat.completions.create({
  model: 'my-fine-tuned-model',
  messages: [
    { role: 'user', content: 'Hello!' },
  ],
});

console.log(response.choices[0].message);
```

Model Management
```toml
# config.toml - Register models for the API
[api.models]

[api.models.my-chat-model]
path = "/models/chat-model-v1"
type = "chat"
context_length = 4096
default_temperature = 0.7

[api.models.my-embedding-model]
path = "/models/embedding-model-v1"
type = "embedding"
dimensions = 768

[api.models.my-completion-model]
path = "/models/completion-model-v1"
type = "completion"
context_length = 2048
```
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | ID of a registered model to use |
| messages | array | Conversation messages (chat completions only) |
| temperature | float | Sampling temperature, 0-2; higher is more random |
| max_tokens | int | Maximum number of tokens to generate |
| top_p | float | Nucleus sampling cutoff, 0-1 |
| stream | bool | Stream the response as Server-Sent Events |
| stop | array | Sequences at which generation halts |
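Putting the parameters together, a complete chat completion request body might look like this (values are illustrative):

```python
import json

payload = {
    "model": "my-fine-tuned-model",
    "messages": [{"role": "user", "content": "List three colors."}],
    "temperature": 0.7,   # 0-2; higher = more random
    "max_tokens": 100,    # cap on generated tokens
    "top_p": 0.9,         # nucleus sampling cutoff
    "stream": False,      # set True for SSE streaming
    "stop": ["\n\n"],     # generation halts at any of these
}

# Serialize and send as the POST body to /v1/chat/completions.
body = json.dumps(payload)
```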