Quick Start Guide

Get LLMBoost up and running in under 5 minutes. This guide shows you how to deploy your first model and make inference requests using LLMBoost Hub (lbh), our recommended CLI tool for streamlined deployment.

Prerequisites

Before you begin, ensure you have:

  • LLMBoost License (Contact contact@mangoboost.io to obtain a license key and Docker credentials)
  • Python 3.11+ installed
  • Docker 27.3.1+ installed and configured for passwordless (non-sudo) execution; set this up by following this guide.
  • AMD GPU with ROCm 6.3+
  • HuggingFace account for model access

Quick-Start Diagram

Recommended Approach

We recommend using LLMBoost Hub (lbh) for the easiest setup experience. Advanced users can also use the manual Docker approach.

Using LLMBoost Hub (lbh)

LLMBoost Hub is the easiest way to deploy and manage LLM inference with LLMBoost.

Step 0: (Optional) Create a New Python 3.11+ Virtual Environment

We recommend using a virtual environment to avoid dependency conflicts and permission issues.

Recommended Approach

We recommend using uv or conda, but you can also use venv.

pip install --user uv 
uv venv lbh-venv --seed --python 3.11 # Or higher
source lbh-venv/bin/activate
which python # Verify python version
which pip # Verify pip version (should be in same directory as python)

All of these approaches require you to activate the virtual environment in each terminal session before running lbh commands; alternatively, set up automatic activation with a tool like direnv or your shell configuration. To exit the virtual environment, simply run deactivate.

Step 1: Install LLMBoost Hub

Do not run lbh as the root user

Running lbh commands with sudo can lead to permission issues with Docker and Python packages. Always run lbh as a regular user.

pip install llmboost_hub
Troubleshooting Installation Issues

If you encounter issues during installation, try the following:

  • Ensure you are in the correct (virtual) environment (which python).
  • Ensure that your pip is up to date by running pip install --upgrade pip.
  • Ensure Python 3.11+ is being used and that pip comes from the same directory (which python and which pip should point to the same directory).
  • Try downloading from TestPyPI:
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple llmboost_hub --pre
  • If you are behind a proxy, configure pip to use the proxy:
pip install llmboost_hub --proxy http://user:password@proxyserver:port

Verify that the installed version is 0.3.0 or higher:

lbh --version

Step 2: Authenticate

Authenticate with your LLMBoost license, HuggingFace, and Docker:

# Authenticate LLMBoost license (required only once)
lbh login

# Login to HuggingFace (or set HF_TOKEN environment variable)
hf auth login

# Login to Docker (provided by MangoBoost)
docker login -u <docker_username>

Step 3: Fetch List of Optimized Models

lbh fetch # required only once a week to sync model list 

Step 4: Serve Your First Model

Deploy a model with a single command:

lbh serve meta-llama/Llama-3.1-8B-Instruct

This command automatically:

  • Downloads the LLMBoost Docker image
  • Downloads the model from HuggingFace
  • Starts the inference server on port 8011
  • Configures optimal settings for your hardware
First Run

The first run will take a few minutes to download the Docker image and model. Subsequent runs will be much faster.
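Since the first run can take several minutes, it is handy to wait programmatically for the server to come up before sending requests. Below is a minimal sketch using only the Python standard library; the port 8011 comes from this guide, but treating any HTTP response from the endpoint as "ready" is an assumption of this sketch, not documented LLMBoost behavior:

```python
import time
import urllib.request
from urllib.error import HTTPError, URLError

def wait_for_server(url: str, timeout: float = 300, interval: float = 5) -> bool:
    """Poll `url` until the server responds, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=5)
            return True
        except HTTPError:
            # The server answered with an error status, so it is up.
            return True
        except (URLError, OSError):
            # Not listening yet; wait and retry.
            time.sleep(interval)
    raise TimeoutError(f"No response from {url} within {timeout}s")

# Example (port from this guide; the exact readiness endpoint is an assumption):
# wait_for_server("http://localhost:8011/v1/chat/completions")
```

The call is left commented out so the snippet does not block when no server is running.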

Quick Restart

To quickly restart the server in the future, use the -r|--restart flag:

lbh serve --restart meta-llama/Llama-3.1-8B-Instruct

This automatically restarts the existing container (without re-downloading assets). For more options, see the Command Reference.

Step 5: Make Your First Request

Once the server is ready, test it with a simple request:

lbh test meta-llama/Llama-3.1-8B-Instruct --query "Explain LLMs in simple terms."

Or use curl:

curl http://localhost:8011/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain LLMs in simple terms."}
]
}'
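The same request can also be made from Python. The sketch below builds the request body shown in the curl example and posts it with the standard library; the URL and model name are taken from this guide, and the commented-out response parsing assumes the standard OpenAI-compatible response shape:

```python
import json
import urllib.request

def chat_request(model: str, user_msg: str,
                 system_msg: str = "You are a helpful assistant.") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
    }).encode()
    return urllib.request.Request(
        "http://localhost:8011/v1/chat/completions",  # port from this guide
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = chat_request("meta-llama/Llama-3.1-8B-Instruct",
                   "Explain LLMs in simple terms.")
# Send it once the server is up (assumes the usual OpenAI response layout):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```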

Step 6: Stop Serving the Model

To stop serving the model, use the following command:

lbh stop meta-llama/Llama-3.1-8B-Instruct

Step 7: Explore More Commands

# View all running containers
lbh list

# Attach to a running container
lbh attach meta-llama/Llama-3.1-8B-Instruct

# Start the container with custom docker args
lbh run meta-llama/Llama-3.1-8B-Instruct -- --network host

# Get help
lbh --help
# or for any command
lbh <command> --help

Command Workflow

The typical workflow for deploying and managing models with LLMBoost Hub:

lbh login                   # Authenticate with LLMBoost license
lbh fetch [model]           # Search for available models (optional filter)
lbh list [model]            # Show local images and their status
lbh prep <Repo/Model-Name>  # Download Docker image and model assets
lbh run <Repo/Model-Name>   # Start container
lbh serve <Repo/Model-Name> # Start LLMBoost inference server
lbh test <Repo/Model-Name>  # Send a test request
lbh stop <Repo/Model-Name>  # Stop container

Complete Example

# Authenticate (one-time setup)
lbh login

# Search for Llama models
lbh fetch llama

# Check local status
lbh list llama

# Download the model and the required Docker image
lbh prep meta-llama/Llama-3.1-8B-Instruct

# Launch the Docker container (accepts custom docker args)
lbh run meta-llama/Llama-3.1-8B-Instruct

# Launch the inference server inside the container. If the model or container
# isn't available, serve automatically runs the prep and run steps with
# default settings before starting the server.
lbh serve meta-llama/Llama-3.1-8B-Instruct

# Test the deployment
lbh test meta-llama/Llama-3.1-8B-Instruct

# Stop when done
lbh stop meta-llama/Llama-3.1-8B-Instruct

# Restart serving
lbh serve --restart meta-llama/Llama-3.1-8B-Instruct

See the complete Command Reference for all available commands.


Next Steps

Now that you have LLMBoost running, explore its powerful features:

Configuration & Commands