Command Reference

Complete reference for all LLMBoost Hub (lbh) commands.

Quick Reference

  • lbh login: Authenticate with license
  • lbh fetch [model]: Find available models
  • lbh list: Show local models and status
  • lbh prep <model>: Download image and model
  • lbh run <model>: Start container
  • lbh serve <model>: Start inference server
  • lbh test <model>: Send test request
  • lbh attach <model>: Open shell in container
  • lbh stop <model>: Stop container
  • lbh status: Show model status
  • lbh tune <model>: Run autotuner

Getting Help

# Show all available commands
lbh -h

# Get help for a specific command
lbh [COMMAND] -h

# Enable verbose output for troubleshooting
lbh -v [COMMAND]

Core Commands

lbh login

Authenticate with your LLMBoost license.

lbh login

Behavior:

  • Reads from $LBH_LICENSE_PATH if set, otherwise prompts for a token
  • Validates the license online
  • Saves the license file to $LBH_HOME
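For non-interactive setups (CI jobs, remote machines), the environment-variable path described above avoids the token prompt. A minimal sketch; the license file path is an example, not a required location:

```shell
# Point lbh at a license file so `lbh login` skips the interactive prompt
# (path below is illustrative)
export LBH_LICENSE_PATH="$HOME/licenses/llmboost.lic"

# Validates the license online and saves it under $LBH_HOME
lbh login
```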

lbh fetch

Search for available models supported by LLMBoost.

lbh fetch [model]

Arguments:

  • [model]: Optional model name pattern (supports regex-style matching). If omitted, lists all available models.

Examples:

# Search for Llama models
lbh fetch llama

# Search for specific model
lbh fetch Llama-3.1-8B

# List all available models
lbh fetch

Behavior:

  • Fetches the latest supported models from the LLMBoost registry
  • Filters results to match your available GPU hardware

lbh list

List local models and their status.

lbh list [model]

Arguments:

  • [model]: Optional model name to filter results

Status Indicators:

  • pending: Model not prepared; Docker image or model assets missing
  • stopped: Model prepared but container not running
  • running: Container running but idle
  • initializing: Container running and starting LLMBoost server
  • serving: LLMBoost server ready to accept requests
  • tuning: Autotuner running

lbh prep

Download Docker image and model assets.

lbh prep <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace (e.g., meta-llama/Llama-3.1-8B-Instruct)

Options:

  • --only-verify: Check digests and sizes without downloading
  • --fresh: Remove existing image and re-download model assets from HuggingFace

Examples:

# Prepare a model
lbh prep meta-llama/Llama-3.1-8B-Instruct

# Verify existing downloads
lbh prep meta-llama/Llama-3.1-8B-Instruct --only-verify

# Force fresh download
lbh prep meta-llama/Llama-3.1-8B-Instruct --fresh

lbh run

Start a container for the specified model.

lbh run <Repo/Model-Name> [OPTIONS] -- [DOCKER_FLAGS...]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • --image <image>: Override the Docker image
  • --model_path <path>: Override the model assets path
  • --restart: Restart container if already running

Docker Flags: Any flags placed after the -- separator are passed directly to Docker.

Examples:

# Basic run
lbh run meta-llama/Llama-3.1-8B-Instruct

# Run with custom memory limit
lbh run meta-llama/Llama-3.1-8B-Instruct -- --memory=32g

# Run with custom network
lbh run meta-llama/Llama-3.1-8B-Instruct -- --network=my-network

# Restart existing container
lbh run meta-llama/Llama-3.1-8B-Instruct --restart

Behavior:

  • Automatically mounts $LBH_HOME and $LBH_WORKSPACE
  • Injects HF_TOKEN if available
  • AMD GPUs: Maps /dev/dri and /dev/kfd
  • NVIDIA GPUs: Uses --gpus all
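Putting the behavior above together, a typical invocation for a gated HuggingFace model might look like the following sketch; the token value and resource flags are placeholders, not requirements:

```shell
# HF_TOKEN, if set, is injected into the container automatically
export HF_TOKEN="hf_..."   # placeholder; substitute your own HuggingFace token

# Extra Docker flags go after the -- separator
lbh run meta-llama/Llama-3.1-8B-Instruct -- --memory=32g --shm-size=16g
```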

lbh serve

Start the LLMBoost inference server inside a container.

lbh serve <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • --host <host>: Server host address (default: 0.0.0.0)
  • --port <port>: Server port (default: 8080)
  • --detached: Don't wait for server to be ready
  • --force: Skip GPU utilization checks

Examples:

# Start server with defaults
lbh serve meta-llama/Llama-3.1-8B-Instruct

# Custom port
lbh serve meta-llama/Llama-3.1-8B-Instruct --port 8011

# Detached mode (don't wait)
lbh serve meta-llama/Llama-3.1-8B-Instruct --detached

# Force serve (skip GPU checks)
lbh serve meta-llama/Llama-3.1-8B-Instruct --force

Behavior:

  • Waits until server is ready (unless --detached)
  • Automatically runs lbh prep and lbh run if needed
  • The server is then reachable at http://<host>:<port>
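Once the server is ready, you can probe it directly with curl. The sketch below assumes an OpenAI-compatible /v1/chat/completions endpoint, which may not match every LLMBoost version; lbh test is the supported way to verify the server:

```shell
# Sketch: query the running server directly
# (assumes an OpenAI-compatible endpoint; prefer `lbh test` if unsure)
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```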

lbh test

Send a test request to the inference server.

lbh test <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • --query <text>: Custom test query (default: predefined test question)
  • -t <n>: Number of test iterations (default: 1)
  • --host <host>: Server host (default: 127.0.0.1)
  • --port <port>: Server port (default: 8080)

Examples:

# Basic test
lbh test meta-llama/Llama-3.1-8B-Instruct

# Custom query
lbh test meta-llama/Llama-3.1-8B-Instruct --query "What is AI?"

# Multiple iterations
lbh test meta-llama/Llama-3.1-8B-Instruct -t 5

# Custom port
lbh test meta-llama/Llama-3.1-8B-Instruct --port 8011

lbh attach

Open a shell inside the running container.

lbh attach <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • -c <container>: Specify container name or ID

Example:

lbh attach meta-llama/Llama-3.1-8B-Instruct

lbh stop

Stop the running container.

lbh stop <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • -c <container>: Specify container name or ID

Example:

lbh stop meta-llama/Llama-3.1-8B-Instruct

lbh status

Show the status of models.

lbh status [model]

Arguments:

  • [model]: Optional model name to filter results

Example:

# Status of all models
lbh status

# Status of specific model
lbh status meta-llama/Llama-3.1-8B-Instruct

Advanced Commands

lbh tune

Run the autotuner to optimize performance.

lbh tune <Repo/Model-Name> [OPTIONS]

Arguments:

  • <Repo/Model-Name>: Full model name from HuggingFace

Options:

  • --metrics <metric>: Optimization metric (default: throughput)
  • --detached: Run tuner in background
  • --image <image>: Override Docker image

Examples:

# Run autotuner
lbh tune meta-llama/Llama-3.1-8B-Instruct

# Run in background
lbh tune meta-llama/Llama-3.1-8B-Instruct --detached

Behavior:

  • Stores results to $LBH_HOME/inference.db
  • Automatically loads optimized settings on next lbh serve
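Because tuning results stored in $LBH_HOME/inference.db are picked up automatically, a common workflow is to tune once and then serve:

```shell
# Tune once; results are written to $LBH_HOME/inference.db
lbh tune meta-llama/Llama-3.1-8B-Instruct

# Subsequent serves load the optimized settings automatically
lbh serve meta-llama/Llama-3.1-8B-Instruct

# Confirm the tuned server responds
lbh test meta-llama/Llama-3.1-8B-Instruct
```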

lbh completions

Set up shell completions for easier command usage.

# For current shell session
eval "$(lbh completions)"

# Persist for virtual environment
lbh completions --venv

# Persist for shell profile
lbh completions --profile

Cluster Commands (Multi-Node Deployments)

  • lbh cluster install [--kubeconfig PATH] [--docker-username USER] [--docker-pat TOKEN] [--docker-email EMAIL] [-- EXTRA_HELM_ARGS]

    • Install LLMBoost Helm chart and Kubernetes infrastructure for multi-node deployments.
    • Displays access credentials for management and monitoring UIs after installation.
    • Requires running Kubernetes cluster and helm installed.
    • Docker authentication options:
      • --docker-username, --docker-pat, --docker-email: Provide credentials directly (all three required together)
      • Alternatively, run docker login and credentials will be read from ~/.docker/config.json
      • If neither provided, cluster will be installed without Docker registry secret
  • lbh cluster deploy [-f CONFIG_FILE] [--kubeconfig PATH]

    • Deploy models across cluster nodes based on configuration file.
    • Generates and applies Kubernetes CRD manifests.
    • Config template: $LBH_HOME/utils/template_cluster_config.jsonc
  • lbh cluster status [--kubeconfig PATH] [--show-secrets]

    • Show status of all model deployments and management services.
    • Displays summary statistics: Models: <ready>/<total> and Mgmt.: <ready>/<total>
    • Shows model deployment table with pod status, restarts, and error messages.
    • Service URLs for management UI and monitoring (Grafana).
    • Use --show-secrets to display access credentials (masked).
    • Use -v --show-secrets for full unmasked credentials.
  • lbh cluster logs [--models|--management] [--pod POD_NAME] [--tail TAIL_ARGS...] [--grep GREP_ARGS...] [--kubeconfig PATH]

    • View logs from model deployment or management pods.
    • --models: Show logs from model deployment pods.
    • --management: Show logs from management/monitoring pods (displays as table).
    • --pod POD_NAME: Filter to specific pod by name.
    • --tail TAIL_ARGS: Show last N lines from workspace logs (default: 10).
    • --grep GREP_ARGS: Filter logs by pattern (uses awk for pattern matching).
    • Defaults to showing both model and management logs if no filter specified.
  • lbh cluster remove <MODEL_NAME> [--all] [--kubeconfig PATH]

    • Remove specific model deployments from the cluster.
    • Deletes LLMBoostDeployment custom resources by name.
    • --all: Remove all model deployments (requires confirmation unless used with --force).
    • Example: lbh cluster remove facebook/opt-125m or lbh cluster remove --all
  • lbh cluster uninstall [--kubeconfig PATH] [--force]

    • Uninstall LLMBoost cluster resources.
    • Prompts for confirmation unless --force is used.
    • Does not automatically delete the namespace.
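Taken together, the cluster subcommands above form a simple lifecycle: install the infrastructure, deploy from a config file, inspect, and tear down. A sketch; the config filename is a placeholder:

```shell
# Install the Helm chart and Kubernetes infrastructure
lbh cluster install --kubeconfig ~/.kube/config

# Deploy models from a config file (see the template under
# $LBH_HOME/utils/); the filename here is a placeholder
lbh cluster deploy -f my_cluster_config.jsonc

# Inspect deployments, then tear down when done
lbh cluster status
lbh cluster uninstall --force
```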

Next: Explore LLMBoost Features to see what you can build.