OpenAI API Compatible
LLMBoost provides an OpenAI-compatible API, allowing you to migrate existing applications with minimal code changes. Simply point your OpenAI client at LLMBoost's endpoint and get superior performance without rewriting your application.
Why This Matters
Seamless Migration - Drop-in replacement for the OpenAI API
Existing Tools Work - Use any OpenAI-compatible library or tool
Cost Savings - Run models on your own infrastructure
Better Performance - Faster inference with LLMBoost optimizations
Supported Endpoints
LLMBoost supports the following OpenAI-compatible endpoints:
| Endpoint | Description | Status |
|---|---|---|
| `/v1/chat/completions` | Chat-style conversations | Fully Supported |
| `/v1/completions` | Single, continuous prompt | Fully Supported |
Both endpoints support:
- Synchronous and asynchronous requests
- Streaming and non-streaming responses
- Standard OpenAI request parameters and response formats
Quick Start
Start the Server
- Using LLMBoost Hub
- Manual Setup
```bash
# Deploy a model
lbh serve meta-llama/Llama-3.1-8B-Instruct --port 8011
```
The server will be available at http://localhost:8011 by default.
Run the command below inside the LLMBoost Docker container, which you can start by following the manual Docker setup.
```bash
# Inside the LLMBoost container
llmboost serve --model_name meta-llama/Llama-3.1-8B-Instruct
```
The server will be available at http://localhost:8011 by default.
Usage Examples
Chat Completions
The chat completions endpoint is ideal for conversational AI applications.
By default, LLMBoost listens on port 8011. If you have configured a different port, adjust the `base_url` accordingly.
- curl
- Python (Synchronous)
- Python (Async)
```bash
curl http://localhost:8011/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is deep learning?"
      }
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'
```
To use the OpenAI-compatible API in Python, install the openai package:
```bash
pip install openai
```
Then import the library and point it to your LLMBoost server as follows:
```python
from openai import OpenAI

# Point to the LLMBoost server
client = OpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-"  # Dummy key (not validated)
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"}
    ],
    max_tokens=512,
    temperature=0.7
)

print(response.choices[0].message.content)
```
To use the OpenAI-compatible API in Python, install the openai package:
```bash
pip install openai
```
Then import the library and point it to your LLMBoost server as follows:
```python
import asyncio

from openai import AsyncOpenAI

# Point to the LLMBoost server
client = AsyncOpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-"
)

async def main():
    response = await client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is deep learning?"}
        ],
        max_tokens=512,
        temperature=0.7
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
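Both endpoints also support streaming, where the reply arrives as a sequence of chunks instead of one response. Below is a minimal sketch (not the official client recipe) assuming a LLMBoost server is running on `localhost:8011`; the helper `join_deltas` and the wrapper `stream_chat` are illustrative names, not part of any API:

```python
def join_deltas(deltas):
    """Concatenate streamed content fragments into one reply.

    Some chunks (e.g. the final one) carry no content, so skip falsy values.
    """
    return "".join(d for d in deltas if d)

def stream_chat(prompt):
    # Live usage: requires `pip install openai` and a running LLMBoost server.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8011/v1", api_key="-")
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    # Each chunk carries a delta holding the next content fragment.
    return join_deltas(chunk.choices[0].delta.content for chunk in stream)

# print(stream_chat("What is deep learning?"))  # uncomment with a live server
```

In practice you would print each fragment as it arrives rather than joining at the end; joining here keeps the sketch compact.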
Completions
The completions endpoint is ideal for text continuation tasks.
- curl
- Python (Synchronous)
```bash
curl http://localhost:8011/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "prompt": "Once upon a time",
    "max_tokens": 128,
    "temperature": 0.8
  }'
```
To use the OpenAI-compatible API in Python, install the openai package:
```bash
pip install openai
```
Then import the library and point it to your LLMBoost server as follows:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-"
)

response = client.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    prompt="Once upon a time",
    max_tokens=128,
    temperature=0.8
)

print(response.choices[0].text)
```
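Asynchronous requests are useful for fanning out many completions at once. The sketch below (an illustration, assuming a LLMBoost server on `localhost:8011`; `complete_all` and `demo` are hypothetical names) issues several prompts concurrently with `asyncio.gather`:

```python
import asyncio

PROMPTS = [
    "Once upon a time",
    "In a galaxy far away",
    "It was a dark and stormy night",
]

async def complete_all(client, prompts):
    # Issue all requests concurrently; the server handles them in parallel.
    tasks = [
        client.completions.create(
            model="meta-llama/Llama-3.1-8B-Instruct",
            prompt=p,
            max_tokens=64,
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)

async def demo():
    # Live usage: requires `pip install openai` and a running LLMBoost server.
    from openai import AsyncOpenAI

    client = AsyncOpenAI(base_url="http://localhost:8011/v1", api_key="-")
    responses = await complete_all(client, PROMPTS)
    for prompt, resp in zip(PROMPTS, responses):
        print(prompt, "->", resp.choices[0].text)

# asyncio.run(demo())  # uncomment with a live server
```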
Response Format
Responses follow the standard OpenAI format:
Chat Completion Response
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Deep learning is a subset of machine learning..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 50,
    "total_tokens": 70
  }
}
```
Completion Response
```json
{
  "id": "cmpl-123",
  "object": "text_completion",
  "created": 1699999999,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "text": "in a land far away, there lived a brave knight...",
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 50,
    "total_tokens": 54
  }
}
```
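Since these payloads follow the OpenAI schema, the interesting fields sit in `choices[0]` and `usage`. A minimal sketch using the sample chat payload above as a plain dict:

```python
# The sample chat-completion payload above, as a Python dict.
sample = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1699999999,
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "Deep learning is a subset of machine learning...",
        },
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 20, "completion_tokens": 50, "total_tokens": 70},
}

# Pull out the reply text and the token accounting.
reply = sample["choices"][0]["message"]["content"]
usage = sample["usage"]

print(reply)
print(f"tokens used: {usage['total_tokens']} "
      f"({usage['prompt_tokens']} prompt + {usage['completion_tokens']} completion)")
```

The `usage` block is handy for tracking throughput, since `total_tokens` is always the sum of prompt and completion tokens.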
Migrating from OpenAI
Migrating from OpenAI to LLMBoost requires only two changes:
- Change the base URL to point to your LLMBoost server
- Update the model name to your deployed model
```python
# Before (OpenAI)
client = OpenAI(
    api_key="sk-..."
)

# After (LLMBoost)
client = OpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-"  # Not validated by LLMBoost
)
```
Everything else stays the same! Your existing code, error handling, and workflows continue to work.
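If you would rather not touch the client construction at all, the official `openai` Python client also reads its endpoint and key from environment variables at construction time, so `OpenAI()` with no arguments can be redirected to LLMBoost. The port below assumes the default setup from this page:

```shell
# The openai Python client picks these up automatically,
# so OpenAI() with no arguments will target LLMBoost instead.
export OPENAI_BASE_URL="http://localhost:8011/v1"
export OPENAI_API_KEY="-"   # dummy value; LLMBoost does not validate it
```

You still need to pass your deployed model name in each request.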
Next Steps
- Streaming - Learn about real-time token-by-token responses
- Single-Node Multi-GPU - Scale inference across GPUs
Questions? Contact contact@mangoboost.io