OpenAI API Compatible
LLMBoost provides an OpenAI-compatible API, allowing you to migrate existing applications with minimal code changes. Simply point your OpenAI client at LLMBoost's endpoint and get superior performance without rewriting your application.
Why This Matters
Seamless Migration - Drop-in replacement for the OpenAI API
Existing Tools Work - Use any OpenAI-compatible library or tool
Cost Savings - Run models on your own infrastructure
Better Performance - Faster inference with LLMBoost optimizations
Supported Endpoints
LLMBoost supports the following OpenAI-compatible endpoints:
| Endpoint | Description | Status |
|---|---|---|
| `/v1/chat/completions` | Chat-style conversations | Fully Supported |
| `/v1/completions` | Single, continuous prompt | Fully Supported |
Both endpoints support:
- Synchronous and asynchronous requests
- Streaming and non-streaming responses
- Standard OpenAI request parameters and response formats
Quick Start
Start the Server
- Using LLMBoost Hub
- Manual Setup
```bash
# Deploy a model
lbh serve meta-llama/Llama-3.1-8B-Instruct --port 8011
```
The server will be available at http://localhost:8011 by default.
Run the command below inside the LLMBoost Docker container, which you can start by following the manual Docker setup.
```bash
# Inside the LLMBoost container
llmboost serve --model_name meta-llama/Llama-3.1-8B-Instruct
```
The server will be available at http://localhost:8011 by default.
Usage Examples
Chat Completions
The chat completions endpoint is ideal for conversational AI applications.
By default, LLMBoost listens on port 8011. If you have configured a different port, adjust the `base_url` accordingly.
- curl
- Python (Synchronous)
- Python (Async)
```bash
curl http://localhost:8011/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is deep learning?"
      }
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'
```
To use the OpenAI-compatible API in Python, install the openai package:
```bash
pip install openai
```
Then import the library and point it to your LLMBoost server as follows:
```python
from openai import OpenAI

# Point to the LLMBoost server
client = OpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-"  # Dummy key (not validated)
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"}
    ],
    max_tokens=512,
    temperature=0.7
)

print(response.choices[0].message.content)
```
To use the OpenAI-compatible API in Python, install the openai package:
```bash
pip install openai
```
Then import the library and point it to your LLMBoost server as follows:
```python
import asyncio

from openai import AsyncOpenAI

# Point to the LLMBoost server
client = AsyncOpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-"
)

async def main():
    response = await client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is deep learning?"}
        ],
        max_tokens=512,
        temperature=0.7
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
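Both endpoints also support streaming, where the reply arrives as a sequence of chunks instead of one response. Below is a minimal sketch (not the official client recipe) assuming a LLMBoost server is running on `localhost:8011`; the helper `join_deltas` and the wrapper `stream_chat` are illustrative names, not part of any API:

```python
def join_deltas(deltas):
    """Concatenate streamed content fragments into one reply.

    Some chunks (e.g. the final one) carry no content, so skip falsy values.
    """
    return "".join(d for d in deltas if d)

def stream_chat(prompt):
    # Live usage: requires `pip install openai` and a running LLMBoost server.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8011/v1", api_key="-")
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    # Each chunk carries a delta holding the next content fragment.
    return join_deltas(chunk.choices[0].delta.content for chunk in stream)

# print(stream_chat("What is deep learning?"))  # uncomment with a live server
```

In practice you would print each fragment as it arrives rather than joining at the end; joining here keeps the sketch compact.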
Completions
The completions endpoint is ideal for text continuation tasks.
- curl
- Python (Synchronous)
```bash
curl http://localhost:8011/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "prompt": "Once upon a time",
    "max_tokens": 128,
    "temperature": 0.8
  }'
```
To use the OpenAI-compatible API in Python, install the openai package:
```bash
pip install openai
```
Then import the library and point it to your LLMBoost server as follows:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-"
)

response = client.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    prompt="Once upon a time",
    max_tokens=128,
    temperature=0.8
)

print(response.choices[0].text)
```
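Asynchronous requests are useful for fanning out many completions at once. The sketch below (an illustration, assuming a LLMBoost server on `localhost:8011`; `complete_all` and `demo` are hypothetical names) issues several prompts concurrently with `asyncio.gather`:

```python
import asyncio

PROMPTS = [
    "Once upon a time",
    "In a galaxy far away",
    "It was a dark and stormy night",
]

async def complete_all(client, prompts):
    # Issue all requests concurrently; the server handles them in parallel.
    tasks = [
        client.completions.create(
            model="meta-llama/Llama-3.1-8B-Instruct",
            prompt=p,
            max_tokens=64,
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)

async def demo():
    # Live usage: requires `pip install openai` and a running LLMBoost server.
    from openai import AsyncOpenAI

    client = AsyncOpenAI(base_url="http://localhost:8011/v1", api_key="-")
    responses = await complete_all(client, PROMPTS)
    for prompt, resp in zip(PROMPTS, responses):
        print(prompt, "->", resp.choices[0].text)

# asyncio.run(demo())  # uncomment with a live server
```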
Response Format
Responses follow the standard OpenAI format:
Chat Completion Response
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Deep learning is a subset of machine learning..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 50,
    "total_tokens": 70
  }
}
```
Completion Response
```json
{
  "id": "cmpl-123",
  "object": "text_completion",
  "created": 1699999999,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "text": "in a land far away, there lived a brave knight...",
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 50,
    "total_tokens": 54
  }
}
```
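Since these payloads follow the OpenAI schema, the interesting fields sit in `choices[0]` and `usage`. A minimal sketch using the sample chat payload above as a plain dict:

```python
# The sample chat-completion payload above, as a Python dict.
sample = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1699999999,
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "Deep learning is a subset of machine learning...",
        },
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 20, "completion_tokens": 50, "total_tokens": 70},
}

# Pull out the reply text and the token accounting.
reply = sample["choices"][0]["message"]["content"]
usage = sample["usage"]

print(reply)
print(f"tokens used: {usage['total_tokens']} "
      f"({usage['prompt_tokens']} prompt + {usage['completion_tokens']} completion)")
```

The `usage` block is handy for tracking throughput, since `total_tokens` is always the sum of prompt and completion tokens.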
Migrating from OpenAI
Migrating from OpenAI to LLMBoost requires only two changes:
- Change the base URL to point to your LLMBoost server
- Update the model name to your deployed model
```python
# Before (OpenAI)
client = OpenAI(
    api_key="sk-..."
)

# After (LLMBoost)
client = OpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-"  # Not validated by LLMBoost
)
```
Everything else stays the same! Your existing code, error handling, and workflows continue to work.
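If you would rather not touch the client construction at all, the official `openai` Python client also reads its endpoint and key from environment variables at construction time, so `OpenAI()` with no arguments can be redirected to LLMBoost. The port below assumes the default setup from this page:

```shell
# The openai Python client picks these up automatically,
# so OpenAI() with no arguments will target LLMBoost instead.
export OPENAI_BASE_URL="http://localhost:8011/v1"
export OPENAI_API_KEY="-"   # dummy value; LLMBoost does not validate it
```

You still need to pass your deployed model name in each request.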
Next Steps
- Streaming - Learn about real-time token-by-token responses
- Single-Node Multi-GPU - Scale inference across GPUs
Questions? Contact contact@mangoboost.io