
OpenAI API Compatible

LLMBoost is compatible with the OpenAI API, allowing you to migrate existing applications with minimal changes. Simply point your OpenAI client at LLMBoost's endpoint, update the model name, and keep the rest of your application as it is.

Why This Matters

  • Seamless Migration - Drop-in replacement for the OpenAI API
  • Existing Tools Work - Use any OpenAI-compatible library or tool
  • Cost Savings - Run models on your own infrastructure
  • Better Performance - Faster inference with LLMBoost optimizations


Supported Endpoints

LLMBoost supports the following OpenAI-compatible endpoints:

| Endpoint | Description | Status |
| --- | --- | --- |
| `/v1/chat/completions` | Chat-style conversations | Fully Supported |
| `/v1/completions` | Single, continuous prompt | Fully Supported |

Both endpoints support:

  • Synchronous and asynchronous requests
  • Streaming and non-streaming responses
  • All standard OpenAI request parameters

Quick Start

Start the Server

```shell
# Deploy a model
lbh serve meta-llama/Llama-3.1-8B-Instruct --port 8011
```

The server will be available at http://localhost:8011 by default.


Usage Examples

Chat Completions

The chat completions endpoint is ideal for conversational AI applications.

Default port

By default, LLMBoost listens on port 8011. If you have configured a different port, please adjust the base_url accordingly.

```shell
curl http://localhost:8011/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is deep learning?"
      }
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'
```
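The same request can be made from Python using only the standard library. This sketch mirrors the curl example above; the actual send step is commented out so the snippet also works without a live server (uncomment it once `lbh serve` is running):

```python
import json
import urllib.request

# Build the same chat request as the curl example above.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"},
    ],
    "max_tokens": 512,
    "temperature": 0.7,
}

# Supplying `data` makes this a POST request.
req = urllib.request.Request(
    "http://localhost:8011/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```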

Completions

The completions endpoint is ideal for text continuation tasks.

```shell
curl http://localhost:8011/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "prompt": "Once upon a time",
    "max_tokens": 128,
    "temperature": 0.8
  }'
```
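Both endpoints also accept the standard `stream` parameter. A streamed response arrives as OpenAI-style server-sent events: `data: {...}` lines carrying incremental deltas, terminated by a `data: [DONE]` sentinel. A minimal sketch of assembling the streamed text (the sample lines below are illustrative, not real server output):

```python
import json

# Illustrative SSE lines, shaped like a streaming chat completion response.
sse_lines = [
    'data: {"choices": [{"delta": {"content": "Deep"}}]}',
    'data: {"choices": [{"delta": {"content": " learning"}}]}',
    "data: [DONE]",
]

pieces = []
for line in sse_lines:
    if not line.startswith("data: "):
        continue  # SSE streams may interleave blank/comment lines
    chunk = line[len("data: "):]
    if chunk == "[DONE]":
        break  # end-of-stream sentinel
    delta = json.loads(chunk)["choices"][0]["delta"]
    pieces.append(delta.get("content", ""))

text = "".join(pieces)
```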

Response Format

Responses follow the standard OpenAI format:

Chat Completion Response

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Deep learning is a subset of machine learning..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 50,
    "total_tokens": 70
  }
}
```

Completion Response

```json
{
  "id": "cmpl-123",
  "object": "text_completion",
  "created": 1699999999,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "text": "in a land far away, there lived a brave knight...",
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 50,
    "total_tokens": 54
  }
}
```
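Because the responses follow the OpenAI format, the useful fields sit in the same places regardless of how you issue the request. For example, extracting the assistant's reply and the token count from the sample chat completion response above:

```python
import json

# The sample chat completion response from above, as a JSON string.
response_json = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "Deep learning is a subset of machine learning..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 50, "total_tokens": 70}
}
"""

body = json.loads(response_json)
reply = body["choices"][0]["message"]["content"]  # the generated answer
total = body["usage"]["total_tokens"]             # prompt + completion tokens
```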

Migrating from OpenAI

Migrating from OpenAI to LLMBoost requires only two changes:

  1. Change the base URL to point to your LLMBoost server
  2. Update the model name to your deployed model
```python
from openai import OpenAI

# Before (OpenAI)
client = OpenAI(
    api_key="sk-...",
)

# After (LLMBoost)
client = OpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-",  # Not validated by LLMBoost
)
```

Everything else stays the same! Your existing code, error handling, and workflows continue to work.


Questions? Contact contact@mangoboost.io