Command Reference

Complete reference for all LLMBoost Hub (lbh) commands.

Quick Reference

  • lbh login: Authenticate with license
  • lbh fetch [model]: Find available models
  • lbh list: Show local models and status
  • lbh prep <model>: Download image and model
  • lbh run <model>: Start container
  • lbh serve <model>: Start inference server
  • lbh test <model>: Send test request
  • lbh attach <model>: Open shell in container
  • lbh stop <model>: Stop container
  • lbh status: Show model status
  • lbh tune <model>: Run autotuner

Getting Help

# Show all available commands
lbh -h

# Get help for a specific command
lbh [COMMAND] -h

# Enable verbose output for troubleshooting
lbh -v [COMMAND]

Core Commands

lbh login

Authenticate with your LLMBoost license.

lbh login

Behavior:

  • Reads from $LBH_LICENSE_PATH if set, otherwise prompts for a token
  • Validates the license online
  • Saves the license file to $LBH_HOME
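For non-interactive setups (CI jobs, remote machines), the environment-variable path described above avoids the token prompt. A minimal sketch; the license file path is an example, not a required location:

```shell
# Point lbh at a license file so `lbh login` skips the interactive prompt
# (path below is illustrative)
export LBH_LICENSE_PATH="$HOME/licenses/llmboost.lic"

# Validates the license online and saves it under $LBH_HOME
lbh login
```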

lbh fetch

Search for available models supported by LLMBoost.

lbh fetch [model]

Arguments:

  • [model]: Optional model name pattern (supports regex-style matching). If omitted, lists all available models.

Examples:

# Search for Llama models
lbh fetch llama

# Search for specific model
lbh fetch Llama-3.1-8B

# List all available models
lbh fetch

Behavior:

  • Fetches the latest supported models from the LLMBoost registry
  • Filters results to match your available GPU hardware

lbh list

List local models and their status.

lbh list [model]

Arguments:

  • [model]: Optional model name to filter results

Status Indicators:

  • pending: Model not prepared; Docker image or model assets missing
  • stopped: Model prepared but container not running
  • running: Container running but idle
  • initializing: Container running and starting LLMBoost server
  • serving: LLMBoost server ready to accept requests
  • tuning: Autotuner running

lbh prep

Download Docker image and model assets.

lbh prep <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace (e.g., meta-llama/Llama-3.1-8B-Instruct)

Options:

  • --only-verify: Check digests and sizes without downloading
  • --fresh: Remove existing image and re-download model assets from HuggingFace

Examples:

# Prepare a model
lbh prep meta-llama/Llama-3.1-8B-Instruct

# Verify existing downloads
lbh prep meta-llama/Llama-3.1-8B-Instruct --only-verify

# Force fresh download
lbh prep meta-llama/Llama-3.1-8B-Instruct --fresh

lbh run

Start a container for the specified model.

lbh run <Repo/Model-Name> [OPTIONS] -- [DOCKER_FLAGS...]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • --image <image>: Override the Docker image
  • --model_path <path>: Override the model assets path
  • --restart: Restart container if already running

Docker Flags: Any flags placed after the -- separator are passed directly to Docker.

Examples:

# Basic run
lbh run meta-llama/Llama-3.1-8B-Instruct

# Run with custom memory limit
lbh run meta-llama/Llama-3.1-8B-Instruct -- --memory=32g

# Run with custom network
lbh run meta-llama/Llama-3.1-8B-Instruct -- --network=my-network

# Restart existing container
lbh run meta-llama/Llama-3.1-8B-Instruct --restart

Behavior:

  • Automatically mounts $LBH_HOME and $LBH_WORKSPACE
  • Injects HF_TOKEN if available
  • AMD GPUs: Maps /dev/dri and /dev/kfd
  • NVIDIA GPUs: Uses --gpus all
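Putting the behavior above together, a typical invocation for a gated HuggingFace model might look like the following sketch; the token value and resource flags are placeholders, not requirements:

```shell
# HF_TOKEN, if set, is injected into the container automatically
export HF_TOKEN="hf_..."   # placeholder; substitute your own HuggingFace token

# Extra Docker flags go after the -- separator
lbh run meta-llama/Llama-3.1-8B-Instruct -- --memory=32g --shm-size=16g
```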

lbh serve

Start the LLMBoost inference server inside a container.

lbh serve <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • --host <host>: Server host address (default: 0.0.0.0)
  • --port <port>: Server port (default: 8080)
  • --detached: Don't wait for server to be ready
  • --force: Skip GPU utilization checks

Examples:

# Start server with defaults
lbh serve meta-llama/Llama-3.1-8B-Instruct

# Custom port
lbh serve meta-llama/Llama-3.1-8B-Instruct --port 8011

# Detached mode (don't wait)
lbh serve meta-llama/Llama-3.1-8B-Instruct --detached

# Force serve (skip GPU checks)
lbh serve meta-llama/Llama-3.1-8B-Instruct --force

Behavior:

  • Waits until server is ready (unless --detached)
  • Automatically runs lbh prep and lbh run if needed
  • The server is then reachable at http://<host>:<port>
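Once the server is ready, you can probe it directly with curl. The sketch below assumes an OpenAI-compatible /v1/chat/completions endpoint, which may not match every LLMBoost version; lbh test is the supported way to verify the server:

```shell
# Sketch: query the running server directly
# (assumes an OpenAI-compatible endpoint; prefer `lbh test` if unsure)
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```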

lbh test

Send a test request to the inference server.

lbh test <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • --query <text>: Custom test query (default: predefined test question)
  • -t <n>: Number of test iterations (default: 1)
  • --host <host>: Server host (default: 127.0.0.1)
  • --port <port>: Server port (default: 8080)

Examples:

# Basic test
lbh test meta-llama/Llama-3.1-8B-Instruct

# Custom query
lbh test meta-llama/Llama-3.1-8B-Instruct --query "What is AI?"

# Multiple iterations
lbh test meta-llama/Llama-3.1-8B-Instruct -t 5

# Custom port
lbh test meta-llama/Llama-3.1-8B-Instruct --port 8011

lbh attach

Open a shell inside the running container.

lbh attach <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • -c <container>: Specify container name or ID

Example:

lbh attach meta-llama/Llama-3.1-8B-Instruct

lbh stop

Stop the running container.

lbh stop <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • -c <container>: Specify container name or ID

Example:

lbh stop meta-llama/Llama-3.1-8B-Instruct

lbh status

Show the status of models.

lbh status [model]

Arguments:

  • [model]: Optional model name to filter results

Example:

# Status of all models
lbh status

# Status of specific model
lbh status meta-llama/Llama-3.1-8B-Instruct

Advanced Commands

lbh tune

Run the autotuner to optimize performance.

lbh tune <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • --metrics <metric>: Optimization metric (default: throughput)
  • --detached: Run tuner in background
  • --image <image>: Override Docker image

Examples:

# Run autotuner
lbh tune meta-llama/Llama-3.1-8B-Instruct

# Run in background
lbh tune meta-llama/Llama-3.1-8B-Instruct --detached

Behavior:

  • Stores results to $LBH_HOME/inference.db
  • Automatically loads optimized settings on next lbh serve
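Because tuning results stored in $LBH_HOME/inference.db are picked up automatically, a common workflow is to tune once and then serve:

```shell
# Tune once; results are written to $LBH_HOME/inference.db
lbh tune meta-llama/Llama-3.1-8B-Instruct

# Subsequent serves load the optimized settings automatically
lbh serve meta-llama/Llama-3.1-8B-Instruct

# Confirm the tuned server responds
lbh test meta-llama/Llama-3.1-8B-Instruct
```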

lbh completions

Set up shell completions for easier command usage.

# For current shell session
eval "$(lbh completions)"

# Persist for virtual environment
lbh completions --venv

# Persist for shell profile
lbh completions --profile

Cluster Commands (Multi-Node Deployments)

  • lbh cluster install [--kubeconfig PATH] [--docker-username USER] [--docker-pat TOKEN] [--docker-email EMAIL] [-- EXTRA_HELM_ARGS]

    • Install LLMBoost Helm chart and Kubernetes infrastructure for multi-node deployments.
    • Displays access credentials for management and monitoring UIs after installation.
    • Requires running Kubernetes cluster and helm installed.
    • Docker authentication options:
      • --docker-username, --docker-pat, --docker-email: Provide credentials directly (all three required together)
      • Alternatively, run docker login and credentials will be read from ~/.docker/config.json
      • If neither provided, cluster will be installed without Docker registry secret
  • lbh cluster deploy [-f CONFIG_FILE] [--kubeconfig PATH]

    • Deploy models across cluster nodes based on configuration file.
    • Generates and applies Kubernetes CRD manifests.
    • Config template: $LBH_HOME/utils/template_cluster_config.jsonc
  • lbh cluster status [--kubeconfig PATH] [--show-secrets]

    • Show status of all model deployments and management services.
    • Displays summary statistics: Models: <ready>/<total> and Mgmt.: <ready>/<total>
    • Shows model deployment table with pod status, restarts, and error messages.
    • Service URLs for management UI and monitoring (Grafana).
    • Use --show-secrets to display access credentials (masked).
    • Use -v --show-secrets for full unmasked credentials.
  • lbh cluster logs [--models|--management] [--pod POD_NAME] [--tail TAIL_ARGS...] [--grep GREP_ARGS...] [--kubeconfig PATH]

    • View logs from model deployment or management pods.
    • --models: Show logs from model deployment pods.
    • --management: Show logs from management/monitoring pods (displays as table).
    • --pod POD_NAME: Filter to specific pod by name.
    • --tail TAIL_ARGS: Show last N lines from workspace logs (default: 10).
    • --grep GREP_ARGS: Filter logs by pattern (uses awk for pattern matching).
    • Defaults to showing both model and management logs if no filter specified.
  • lbh cluster remove <MODEL_NAME> [--all] [--kubeconfig PATH]

    • Remove specific model deployments from the cluster.
    • Deletes LLMBoostDeployment custom resources by name.
    • --all: Remove all model deployments (requires confirmation unless used with --force).
    • Example: lbh cluster remove facebook/opt-125m or lbh cluster remove --all
  • lbh cluster uninstall [--kubeconfig PATH] [--force]

    • Uninstall LLMBoost cluster resources.
    • Prompts for confirmation unless --force is used.
    • Does not automatically delete the namespace.
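Taken together, the cluster subcommands above form a simple lifecycle: install the infrastructure, deploy from a config file, inspect, and tear down. A sketch; the config filename is a placeholder:

```shell
# Install the Helm chart and Kubernetes infrastructure
lbh cluster install --kubeconfig ~/.kube/config

# Deploy models from a config file (see the template under
# $LBH_HOME/utils/); the filename here is a placeholder
lbh cluster deploy -f my_cluster_config.jsonc

# Inspect deployments, then tear down when done
lbh cluster status
lbh cluster uninstall --force
```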

Next: Explore LLMBoost Features to see what you can build.