Quick Start Guide
Get LLMBoost up and running in under 5 minutes. This guide shows you how to deploy your first model and make inference requests using LLMBoost Hub (lbh), our recommended CLI tool for streamlined deployment.
Prerequisites
Before you begin, ensure you have:
- LLMBoost License (Contact contact@mangoboost.io to obtain a license key and Docker credentials)
- Python 3.11+ installed
- Docker 27.3.1+ installed and configured for passwordless (non-sudo) execution (see Docker's post-installation steps for Linux)
- AMD GPU with ROCm 6.3+
- HuggingFace account for model access
We recommend using LLMBoost Hub (lbh) for the easiest setup experience. Advanced users can also use the manual Docker approach.
Using LLMBoost Hub (lbh)
LLMBoost Hub is the easiest way to deploy and manage LLM inference with LLMBoost.
Step 0: (Optional) Create new Python 3.11+ Virtual Environment
We recommend using a virtual environment to avoid dependency conflicts and permission issues.
Using uv:

pip install --user uv
uv venv lbh-venv --seed --python 3.11  # Or higher
source lbh-venv/bin/activate
which python  # Verify python version
which pip  # Verify pip version (should be in same directory as python)

Using conda:

conda create -n lbh-env python=3.11 -y
conda activate lbh-env
which python  # Verify python version
which pip  # Verify pip version (should be in same directory as python)

Using venv:

python3.11 -m venv lbh-env
source lbh-env/bin/activate
which python  # Verify python version
which pip  # Verify pip version (should be in same directory as python)
Whichever approach you choose, activate the virtual environment in each new terminal session before running lbh commands, or set up automatic activation with a tool such as direnv or a shell-specific configuration.
To exit the virtual environment, simply run deactivate.
Step 1: Install LLMBoost Hub
Warning: Do not run lbh as the root user. Running lbh commands with sudo can lead to permission issues with Docker and Python packages. Always run lbh as a regular user.
pip install llmboost_hub
Troubleshooting Installation Issues
If you encounter issues during installation, try the following:
- Ensure you are in the correct (virtual) environment (which python).
- Ensure your pip is up to date by running pip install --upgrade pip.
- Ensure Python 3.11+ is being used, and that the matching pip is used (which python and which pip should point to the same directory).
- Try downloading from TestPyPI:
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple llmboost_hub --pre
- If you are behind a proxy, configure pip to use the proxy:
pip install llmboost_hub --proxy http://user:password@proxyserver:port
Verify installed version is 0.3.0 or higher:
lbh --version
Step 2: Authenticate
Authenticate with your LLMBoost license, HuggingFace, and Docker:
# Authenticate LLMBoost license (required only once)
lbh login
# Login to HuggingFace (or set HF_TOKEN environment variable)
hf auth login
# Login to Docker (provided by MangoBoost)
docker login -u <docker_username>
Step 3: Fetch List of Optimized Models
lbh fetch # required only once a week to sync model list
Step 4: Serve Your First Model
Deploy a model with a single command:
lbh serve meta-llama/Llama-3.1-8B-Instruct
This command automatically:
- Downloads the LLMBoost Docker image
- Downloads the model from HuggingFace
- Starts the inference server on port 8011
- Configures optimal settings for your hardware
The first run will take a few minutes to download the Docker image and model. Subsequent runs will be much faster.
To quickly restart the server in the future, use the -r|--restart flag:
lbh serve --restart meta-llama/Llama-3.1-8B-Instruct
This automatically restarts the existing container (without re-downloading assets). For more options, see the Command Reference.
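If you are scripting around lbh serve, you may want to wait until the server is up before sending requests. Below is a minimal Python sketch of a readiness poll; it assumes the server exposes the standard OpenAI-style GET /v1/models endpoint on the default port 8011 once it is ready (an assumption, not a documented LLMBoost guarantee).

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url="http://localhost:8011/v1",
                     timeout=300.0, interval=5.0):
    """Poll the server until it responds; True when ready, False on timeout.

    Assumes an OpenAI-style GET /models endpoint becomes available
    once the inference server is up.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not up yet; retry after a short pause
        time.sleep(interval)
    return False
```

You could call wait_until_ready() right after launching lbh serve in the background and only proceed to issue requests once it returns True.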
Step 5: Make Your First Request
Once the server is ready, test it with a simple request:
lbh test meta-llama/Llama-3.1-8B-Instruct --query "Explain LLMs in simple terms."
Or use curl:
curl http://localhost:8011/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain LLMs in simple terms."}
]
}'
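Because the endpoint is OpenAI API compatible, you can also call it from Python. The sketch below uses only the standard library and mirrors the curl request above (same model name, default port 8011); the helper names are illustrative, not part of LLMBoost.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8011/v1"  # LLMBoost's default port


def build_payload(user_message: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }


def chat(user_message: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("Explain LLMs in simple terms."))
```

Since the API follows the OpenAI schema, official OpenAI client libraries pointed at http://localhost:8011/v1 should work as well.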
Step 6: Stop Serving the Model
To stop serving the model, use the following command:
lbh stop meta-llama/Llama-3.1-8B-Instruct
Step 7: Explore More Commands
# View all running containers
lbh list
# Attach to a running container
lbh attach meta-llama/Llama-3.1-8B-Instruct
# Start the container with custom docker args
lbh run meta-llama/Llama-3.1-8B-Instruct -- --network host # custom docker args
# Get help
lbh --help
# or for any command
lbh <command> --help
Command Workflow
The typical workflow for deploying and managing models with LLMBoost Hub:
lbh login # Authenticate with LLMBoost license
lbh fetch [model] # Search for available models (optional filter)
lbh list [model] # Show local images and their status
lbh prep <Repo/Model-Name> # Download Docker image and model assets
lbh run <Repo/Model-Name> # Start container
lbh serve <Repo/Model-Name> # Start LLMBoost inference server
lbh test <Repo/Model-Name> # Send a test request
lbh stop <Repo/Model-Name> # Stop container
Complete Example
# Authenticate (one-time setup)
lbh login
# Search for Llama models
lbh fetch llama
# Check local status
lbh list llama
# Download the model and the required Docker image
lbh prep meta-llama/Llama-3.1-8B-Instruct
# Launch the Docker container (accepts custom docker args)
lbh run meta-llama/Llama-3.1-8B-Instruct
# Launch the inference server (if the model or container isn't available,
# this automatically runs the prep and run steps with default settings)
lbh serve meta-llama/Llama-3.1-8B-Instruct
# Test the deployment
lbh test meta-llama/Llama-3.1-8B-Instruct
# Stop when done
lbh stop meta-llama/Llama-3.1-8B-Instruct
# Restart serving
lbh serve --restart meta-llama/Llama-3.1-8B-Instruct
See the complete Command Reference for all available commands.
Manual Docker Setup
For advanced users who prefer direct Docker control or custom configurations.
Step 1: Set Environment Variables
export MODEL_PATH=<absolute_path_to_model_directory>
export LICENSE_FILE=<absolute_path_to_license_file>
export HF_TOKEN=<your_huggingface_token>
- MODEL_PATH: Absolute path to your local model directory
- LICENSE_FILE: Path to your LLMBoost license file (contact contact@mangoboost.io if needed)
- HF_TOKEN: Get from huggingface.co/settings/tokens
Step 2: Pull LLMBoost Docker Image
Contact the MangoBoost team for access to the Docker image, then pull it:
docker pull mangollm/mb-llmboost-rocm:1.7.0
Step 3: Run LLMBoost Container
docker run -it --rm \
--network host \
--group-add video \
--ipc host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--device=/dev/dri:/dev/dri \
--device=/dev/kfd:/dev/kfd \
-v $MODEL_PATH:/workspace/models \
-v $LICENSE_FILE:/workspace/llmboost_license.skm \
-w /workspace \
-e HF_TOKEN=$HF_TOKEN \
mangollm/mb-llmboost-rocm:1.7.0 \
bash
Step 4: Start Inference Server
Inside the container, start the inference server:
llmboost serve --model_name meta-llama/Llama-3.1-8B-Instruct
The server will start on port 8011 by default. Wait for the "Ready" message before making requests.
Step 5: Make Your First Request
From another terminal (on the host), test the server:
curl http://localhost:8011/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain LLMs in simple terms."}
]
}'
Next Steps
Now that you have LLMBoost running, explore its powerful features:
- OpenAI API Compatible - Use OpenAI client libraries with zero code changes
- SLO Aware Serving - Optimize for latency and throughput
- Multi-GPU Support - Scale inference across GPUs
- Multi-Node Deployment - Scale across Kubernetes clusters
- Streaming Responses - Real-time token-by-token output
- Vision Models - Deploy multimodal image-to-text models
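As an illustration of the streaming feature listed above: OpenAI-compatible servers typically stream completions as server-sent events ("data:" lines carrying delta chunks, terminated by "data: [DONE]"). The sketch below assumes LLMBoost follows that standard format; the function names are illustrative.

```python
import json
import urllib.request


def parse_sse_chunk(line: str):
    """Extract the token text from one OpenAI-style SSE line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None  # End-of-stream sentinel
    delta = json.loads(data)["choices"][0]["delta"]
    return delta.get("content")


def stream_chat(prompt, base_url="http://localhost:8011/v1"):
    """Send a streaming chat request and yield tokens as they arrive."""
    payload = {
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # HTTPResponse iterates line by line
            token = parse_sse_chunk(raw.decode("utf-8"))
            if token:
                yield token


# Example (requires a running server):
# for tok in stream_chat("Explain LLMs in simple terms."):
#     print(tok, end="", flush=True)
```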
Configuration & Commands
- Configuration Options - Customize your deployment
- Command Reference - Complete lbh CLI documentation