LLMBoost Hub Advanced Usage

Learn how to combine LLMBoost Hub (lbh) with advanced Docker workflows for maximum flexibility and control.

Overview

While lbh serve provides a simple one-command deployment, you may need more control over container management, storage locations, or networking configurations. This guide shows how to use lbh run and lbh attach for advanced workflows while retaining the convenience of LLMBoost Hub.

Why Use Advanced LBH Workflows?

Use these patterns when you need:

  • Custom storage locations: Store models on shared filesystems like /lustre1/$USER/llm_models
  • Manual container control: Start containers with lbh run, then attach and configure manually
  • Integration with existing Docker setups: Combine LBH with your current containerization workflows
  • Advanced networking: Custom network configurations or multi-container setups
  • Development workflows: Iterative testing and debugging with direct container access

Advanced Workflow

Step 1: Start Container with lbh run

Instead of lbh serve, use lbh run to start a container without immediately launching the inference server:

lbh run meta-llama/Llama-3.1-8B-Instruct

This starts a container but doesn't run the inference server yet. The container name is automatically derived from the model name.

Step 2: Attach to Container

Use lbh attach to get a shell inside the running container:

lbh attach meta-llama/Llama-3.1-8B-Instruct

Now you're inside the container and can run advanced configuration steps.

Step 3: Run Custom Setup

Inside the container, you have full control:

# Set environment variables
export CUDA_VISIBLE_DEVICES=0,1

# Run the inference server with custom options
llmboost serve meta-llama/Llama-3.1-8B-Instruct \
--tensor-parallel-size 2 \
--max-model-len 8192 \
--gpu-memory-utilization 0.95
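Once the server is up, a quick request from outside the container confirms it is actually serving. The sketch below uses only the Python standard library; the port 8000 and the OpenAI-compatible /v1/chat/completions route are assumptions (they match the SDK example later in this guide).

```python
import json
import os
import urllib.request

def chat_payload(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-compatible chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def smoke_test(base_url: str, model: str) -> str:
    """POST a single prompt and return the assistant's reply text."""
    body = json.dumps(chat_payload(model, "Hello!")).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Only hit the network when a server is actually running (port is an assumption).
if os.environ.get("LBH_SMOKE_TEST"):
    print(smoke_test("http://localhost:8000", "meta-llama/Llama-3.1-8B-Instruct"))
```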

Configuration with Environment Variables

LLMBoost Hub supports several environment variables for customization:

Model Storage Location

Store all models in a custom directory (useful for shared storage):

export LBH_MODELS=/lustre1/$USER/llm_models

lbh run meta-llama/Llama-3.1-8B-Instruct

This uses /lustre1/$USER/llm_models as the model cache directory and mounts it inside the container.

License File Location

Specify a custom license file path:

export LBH_LICENSE_PATH=/shared/licenses/llmboost_license.skm

lbh run meta-llama/Llama-3.1-8B-Instruct

Hugging Face Token

Set your Hugging Face token for downloading gated models:

export HF_TOKEN=hf_xxxxxxxxxxxxx

lbh run meta-llama/Llama-3.1-70B-Instruct

Advanced Use Cases

1. Shared Model Storage Across Users

In multi-user environments, store models in a shared location to avoid duplication:

# Set shared model directory
export LBH_MODELS=/shared/llm_models

# User 1 downloads model (first time)
lbh run meta-llama/Llama-3.1-8B-Instruct

# User 2 reuses the cached model
lbh run meta-llama/Llama-3.1-8B-Instruct
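Scripts that need to locate the shared cache can mirror the same precedence in a few lines. This is an illustrative sketch, not LBH's actual implementation: the ~/.llmboost_hub fallback follows the environment variable reference in this guide, while the models subdirectory name is an assumption.

```python
import os
from pathlib import Path

def resolve_models_dir(env: dict) -> Path:
    """Mirror the LBH_MODELS precedence: an explicit LBH_MODELS wins,
    otherwise fall back to a models directory under LBH_HOME
    (default ~/.llmboost_hub). The fallback layout is an assumption."""
    if "LBH_MODELS" in env:
        return Path(env["LBH_MODELS"])
    home = Path(env.get("LBH_HOME", os.path.expanduser("~/.llmboost_hub")))
    return home / "models"
```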

2. Development and Debugging Workflow

For iterative development, use lbh run + lbh attach to test different configurations:

# Start container without serving
lbh run Qwen/Qwen2.5-7B-Instruct

# Attach and test different configurations
lbh attach Qwen/Qwen2.5-7B-Instruct

# Inside container - test configuration 1
llmboost serve Qwen/Qwen2.5-7B-Instruct --max-model-len 4096
# Exit (Ctrl+C)

# Test configuration 2
llmboost serve Qwen/Qwen2.5-7B-Instruct --max-model-len 8192 --gpu-memory-utilization 0.9

3. GPU Isolation and Selection

Restrict container access to specific GPUs using AMD's GPU Isolation Techniques.

Method 1: Using HIP_VISIBLE_DEVICES (Recommended)

# Use only GPU 0 and 2
lbh run meta-llama/Llama-3.1-8B-Instruct -- \
-e HIP_VISIBLE_DEVICES=0,2

# Use single GPU
lbh run meta-llama/Llama-3.1-8B-Instruct -- \
-e HIP_VISIBLE_DEVICES=1

Method 2: Device-Level Isolation

For finer control, specify exact device nodes as documented in Restricting GPU Access:

# Restrict to specific render devices
lbh run meta-llama/Llama-3.1-8B-Instruct -- \
-e HIP_VISIBLE_DEVICES=0,1 \
--device /dev/dri/renderD128 \
--device /dev/dri/renderD129

# Check device mapping first
ls -l /dev/dri/

For NVIDIA GPUs:

# Use specific NVIDIA GPUs
lbh run meta-llama/Llama-3.1-8B-Instruct -- \
-e CUDA_VISIBLE_DEVICES=0,2

GPU Selection

Use rocm-smi (AMD) or nvidia-smi (NVIDIA) to identify GPU indices before restricting access. HIP_VISIBLE_DEVICES is the preferred method for AMD GPUs because it works across ROCm versions. Additionally, make sure the allocated GPUs have sufficient memory for the model being served.
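A device list is easy to mistype, and an out-of-range index silently hides GPUs from the container. The hedged helper below validates a HIP_VISIBLE_DEVICES/CUDA_VISIBLE_DEVICES-style string before you pass it to lbh run; the function name and behavior are illustrative, not part of LBH.

```python
def parse_visible_devices(value: str, available: int) -> list:
    """Parse a comma-separated GPU index list (the format used by
    HIP_VISIBLE_DEVICES and CUDA_VISIBLE_DEVICES) and check each
    index against the GPU count reported by rocm-smi / nvidia-smi."""
    indices = [int(tok) for tok in value.split(",") if tok.strip()]
    for idx in indices:
        if not 0 <= idx < available:
            raise ValueError(f"GPU index {idx} out of range (0..{available - 1})")
    if len(set(indices)) != len(indices):
        raise ValueError("duplicate GPU index in device list")
    return indices
```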

4. Custom Volume Mounts

Mount additional volumes for datasets or outputs:

lbh run meta-llama/Llama-3.1-8B-Instruct -- \
-v /data/datasets:/workspace/datasets \
-v /data/outputs:/workspace/outputs
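Inside the container, the mounts above appear at /workspace/datasets and /workspace/outputs. The sketch below shows the shape of a batch job over such mounts; the placeholder output line would be replaced by a real call to the serving endpoint, and all names here are illustrative.

```python
from pathlib import Path

def batch_prompts(dataset_dir: str, output_dir: str) -> int:
    """Read one prompt per .txt file from the mounted dataset volume
    and write a placeholder result with the same filename into the
    output volume. Returns the number of prompts processed."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for prompt_file in sorted(Path(dataset_dir).glob("*.txt")):
        prompt = prompt_file.read_text().strip()
        # Placeholder: replace with a request to the inference server.
        (out / prompt_file.name).write_text(f"PROMPT: {prompt}\n")
        count += 1
    return count
```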

5. Multi-GPU Configuration Testing

5.1 Test different data parallelism configurations with specific GPUs:

# Start container with first four GPUs
export LBH_MODELS=/lustre1/$USER/llm_models
lbh run meta-llama/Llama-3.1-70B-Instruct -- -e HIP_VISIBLE_DEVICES=0,1,2,3

# Attach and test DP=2
lbh attach meta-llama/Llama-3.1-70B-Instruct
llmboost serve meta-llama/Llama-3.1-70B-Instruct --data-parallel-size 2
# Exit (Ctrl+C) and restart

# Test DP=4
llmboost serve meta-llama/Llama-3.1-70B-Instruct --data-parallel-size 4

5.2 Test different tensor parallelism configurations:

# Start container with all GPUs (default behavior)
export LBH_MODELS=/lustre1/$USER/llm_models
lbh run meta-llama/Llama-3.1-70B-Instruct

# Attach and test TP=2
lbh attach meta-llama/Llama-3.1-70B-Instruct
llmboost serve meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 2
# Exit (Ctrl+C) and restart

# Test TP=4
llmboost serve meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 4
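When sweeping tensor-parallel sizes, a back-of-the-envelope memory check helps decide which configurations are worth trying. This weights-only estimate assumes fp16 parameters sharded evenly across TP ranks and ignores KV cache, activations, and runtime overhead, so treat it as a lower bound, not a guarantee.

```python
def weights_per_gpu_gib(num_params_b: float, tp: int, bytes_per_param: int = 2) -> float:
    """Rough per-GPU weight footprint in GiB for a tensor-parallel
    deployment: parameters are assumed to shard evenly across TP ranks.
    Ignores KV cache, activations, and runtime overhead."""
    total_bytes = num_params_b * 1e9 * bytes_per_param
    return total_bytes / tp / (1024 ** 3)

# e.g. a 70B model in fp16 across 4 GPUs needs roughly 32.6 GiB per GPU
# for weights alone:
# weights_per_gpu_gib(70, 4)
```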

6. Combining with Python SDK

Start container and use the Python SDK for programmatic control:

# Start container
lbh run meta-llama/Llama-3.1-8B-Instruct

# Attach and start server with custom port
lbh attach meta-llama/Llama-3.1-8B-Instruct
llmboost serve meta-llama/Llama-3.1-8B-Instruct --port 8000

Then from your Python application:

from openai import OpenAI

client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="dummy"
)

response = client.chat.completions.create(
model="meta-llama/Llama-3.1-8B-Instruct",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
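Large models can take minutes to load, so the request above fails if issued too early. Below is a hedged readiness helper that polls until the server answers, assuming it exposes the standard OpenAI-compatible /v1/models route; the probe argument exists only to make the function testable and is not part of any LBH API.

```python
import time
import urllib.error
import urllib.request

def wait_for_server(base_url: str, timeout_s: float = 120.0, probe=None) -> bool:
    """Poll the OpenAI-compatible /v1/models route until the server
    responds or the timeout expires. Returns True once reachable."""
    def default_probe():
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=2):
                return True
        except (urllib.error.URLError, OSError):
            return False
    probe = probe or default_probe
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(1.0)
    return False
```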

Environment Variable Reference

Variable            Description                            Example
LBH_MODELS          Directory for model cache              /lustre1/$USER/llm_models
LBH_LICENSE_PATH    Path to license file                   /shared/licenses/llmboost.skm
HF_TOKEN            Hugging Face token for gated models    hf_xxxxxxxxxxxxx
LBH_WORKSPACE       User workspace directory               $LBH_HOME/workspace
LBH_HOME            Base directory for LLMBoost Hub data   ~/.llmboost_hub
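Putting the variables together, a shell profile snippet might pin every location explicitly (all paths here are illustrative):

```shell
# Illustrative LLMBoost Hub environment setup (adjust paths for your site)
export LBH_HOME=~/.llmboost_hub
export LBH_MODELS=/lustre1/$USER/llm_models
export LBH_LICENSE_PATH=/shared/licenses/llmboost.skm
export HF_TOKEN=hf_xxxxxxxxxxxxx   # avoid committing tokens to shared shell profiles
```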

LBH Command Reference

Container Management

# Start container without serving
lbh run <Repo/Model-Name>

# Attach to running container
lbh attach <Repo/Model-Name>

# List local models and status
lbh list [model]

# Stop container
lbh stop <Repo/Model-Name>

Common Options for lbh run

-i, --image <image>       # Override Docker image
-m, --model_path <path>   # Override model assets path
-r, --restart             # Restart if already running
-- <DOCKER_FLAGS>         # Pass additional Docker flags after --

Examples:

# Basic usage
lbh run meta-llama/Llama-3.1-8B-Instruct

# With custom Docker flags
lbh run meta-llama/Llama-3.1-8B-Instruct -- --memory=32g

# Mount local model directory
lbh run meta-llama/Llama-3.1-8B-Instruct -m /path/to/model

Best Practices

  1. Use Shared Storage: Set LBH_MODELS to a shared directory in multi-user environments
  2. Test Before Production: Use lbh run + lbh attach to test configurations
  3. Document Your Setup: Keep notes on environment variables and custom configurations
  4. Monitor Resources: Use lbh list to check container status regularly
  5. Clean Up: Stop unused containers with lbh stop to free resources

Troubleshooting

Container Won't Start

Check existing containers and restart if needed:

lbh list
lbh run meta-llama/Llama-3.1-8B-Instruct --restart

Model Download Issues

Verify your Hugging Face token:

export HF_TOKEN=hf_xxxxxxxxxxxxx
lbh run meta-llama/Llama-3.1-8B-Instruct

GPU Not Visible

Check GPU availability:

nvidia-smi  # or rocm-smi for AMD
lbh list # Verify GPU detection


Need Help? Contact contact@mangoboost.io for assistance with advanced configurations.