
OpenAI-Compatible API

Run local OpenAI-compatible API endpoints on your Server Node for inference.

Overview

Server Nodes can expose OpenAI-compatible REST APIs for:

  • Chat Completions: Conversational AI with trained models
  • Text Completions: Text generation tasks
  • Embeddings: Vector representations of text
  • Model Listing: Available models on the node

Enabling the API

# config.toml
[api]
enabled = true
address = "127.0.0.1:8080"
# For external access:
# address = "0.0.0.0:8080"

# Authentication
api_key = "sk-your-secret-key"
require_auth = true

# Rate limiting
max_requests_per_minute = 60
max_tokens_per_minute = 100000

# CORS (for web clients)
cors_enabled = true
cors_origins = ["http://localhost:3000"]
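
When require_auth is enabled, every request must carry the key in an Authorization: Bearer header, and requests beyond the rate limits are rejected. Clients typically retry with exponential backoff; a minimal sketch of the delay schedule such a client might use (the exact rejection status code is an assumption here, commonly 429):

```python
def backoff_delays(base=1.0, factor=2.0, retries=5, cap=30.0):
    """Exponential backoff schedule (seconds) for retrying rate-limited requests."""
    return [min(cap, base * factor ** i) for i in range(retries)]

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

With max_requests_per_minute = 60, a steady client averaging one request per second should never hit the limit; the backoff only matters for bursts.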

API Endpoints

Method  Endpoint              Description
------  --------------------  ---------------------
GET     /v1/models            List available models
GET     /v1/models/:id        Get model details
POST    /v1/chat/completions  Chat completion
POST    /v1/completions       Text completion
POST    /v1/embeddings        Generate embeddings

Chat Completions

# Request
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-secret-key" \
  -d '{
    "model": "my-fine-tuned-model",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'

# Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1702345678,
  "model": "my-fine-tuned-model",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 9,
    "total_tokens": 29
  }
}
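
The reply text lives at choices[0].message.content and token accounting under usage. A minimal sketch of handling the response body above with Python's standard library (the body here is copied from the example response):

```python
import json

# Response body as returned by POST /v1/chat/completions (from the example above)
body = '''{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "my-fine-tuned-model",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29}
}'''

data = json.loads(body)
reply = data["choices"][0]["message"]["content"]
tokens_used = data["usage"]["total_tokens"]
print(reply)        # Hello! How can I help you today?
print(tokens_used)  # 29
```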

Streaming Responses

# Enable streaming with stream: true
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-secret-key" \
  -d '{
    "model": "my-fine-tuned-model",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

# Response (Server-Sent Events)
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Once"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" upon"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" a"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" time"}}]}
data: [DONE]
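
Each data: line carries a JSON chunk whose choices[0].delta.content holds the next text fragment, and the literal [DONE] marks the end of the stream. The OpenAI SDKs reassemble this for you when you pass stream=True, but the event format above can be parsed by hand with a helper like this sketch:

```python
import json

def join_sse_deltas(lines):
    """Reassemble the streamed text from Server-Sent Events lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore blank lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # sentinel marking end of stream
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# The events from the example response above
events = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Once"}}]}',
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" upon"}}]}',
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" a"}}]}',
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" time"}}]}',
    'data: [DONE]',
]
print(join_sse_deltas(events))  # Once upon a time
```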

Embeddings

# Request
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-secret-key" \
  -d '{
    "model": "text-embedding-model",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

# Response
{
  "object": "list",
  "data": [{
    "object": "embedding",
    "index": 0,
    "embedding": [0.0023, -0.0094, 0.0152, ...]
  }],
  "model": "text-embedding-model",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
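
Embedding vectors are typically compared with cosine similarity. A self-contained sketch using only the standard library (the vectors here are toy values standing in for real /v1/embeddings output):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for the "embedding" arrays in the response
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.3]
v3 = [-0.3, 0.1, -0.2]

print(cosine_similarity(v1, v2))  # ≈ 1.0 (identical direction)
print(cosine_similarity(v1, v3) < cosine_similarity(v1, v2))  # True
```

For semantic search, embed each document once, then rank documents by cosine similarity against the embedded query.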

SDK Integration

Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-your-secret-key"
)

response = client.chat.completions.create(
    model="my-fine-tuned-model",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

JavaScript/TypeScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'sk-your-secret-key',
});

const response = await client.chat.completions.create({
  model: 'my-fine-tuned-model',
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
});

console.log(response.choices[0].message);

Model Management

# config.toml - Register models for API
[api.models]

[api.models.my-chat-model]
path = "/models/chat-model-v1"
type = "chat"
context_length = 4096
default_temperature = 0.7

[api.models.my-embedding-model]
path = "/models/embedding-model-v1"
type = "embedding"
dimensions = 768

[api.models.my-completion-model]
path = "/models/completion-model-v1"
type = "completion"
context_length = 2048

Request Parameters

Parameter    Type    Description
-----------  ------  ------------------------------
model        string  Model ID to use
messages     array   Conversation messages
temperature  float   Sampling randomness (0-2)
max_tokens   int     Maximum tokens in the response
top_p        float   Nucleus sampling threshold
stream       bool    Enable streaming
stop         array   Stop sequences
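
A minimal sketch of assembling a chat request body from these parameters, omitting optional fields the caller did not set. The temperature range follows the table; the top_p bound and the server's actual validation behavior are assumptions:

```python
def build_chat_payload(model, messages, temperature=0.7, max_tokens=None,
                       top_p=None, stream=False, stop=None):
    """Assemble a /v1/chat/completions request body from the table's parameters."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": stream,
    }
    # Only include optional parameters the caller actually set
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if top_p is not None:
        payload["top_p"] = top_p
    if stop:
        payload["stop"] = stop
    return payload

payload = build_chat_payload(
    "my-fine-tuned-model",
    [{"role": "user", "content": "Hello!"}],
    max_tokens=150,
)
```

The resulting dict can be serialized with json.dumps and sent as the request body, exactly as in the curl examples above.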