Nvidia GPU metrics

Show hardware metrics for Nvidia GPUs

| Metric name | Type | Description |
| --- | --- | --- |
| DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS | counter | Number of remapped rows for correctable errors. |
| DCGM_FI_DEV_DEC_UTIL | gauge | Decoder utilization (in %). |
| DCGM_FI_DEV_ENC_UTIL | gauge | Encoder utilization (in %). |
| DCGM_FI_DEV_FB_FREE | gauge | Framebuffer memory free (in MiB). |
| DCGM_FI_DEV_FB_USED | gauge | Framebuffer memory used (in MiB). |
| DCGM_FI_DEV_GPU_TEMP | gauge | GPU temperature (in °C). |
| DCGM_FI_DEV_GPU_UTIL | gauge | GPU utilization (in %). |
| DCGM_FI_DEV_MEM_CLOCK | gauge | Memory clock frequency (in MHz). |
| DCGM_FI_DEV_MEM_COPY_UTIL | gauge | Memory transfer utilization (in %). |
| DCGM_FI_DEV_MEMORY_TEMP | gauge | Memory temperature (in °C). |
| DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL | counter | Total number of NVLink bandwidth counters for all lanes. |
| DCGM_FI_DEV_PCIE_REPLAY_COUNTER | counter | Total number of PCIe retries. |
| DCGM_FI_DEV_POWER_USAGE | gauge | Power draw (in W). |
| DCGM_FI_DEV_ROW_REMAP_FAILURE | gauge | Whether remapping of rows has failed. |
| DCGM_FI_DEV_SM_CLOCK | gauge | SM clock frequency (in MHz). |
| DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION | counter | Total energy consumption since boot (in mJ). |
| DCGM_FI_DEV_UNCORRECTABLE_REMAPPED_ROWS | counter | Number of remapped rows for uncorrectable errors. |
| DCGM_FI_DEV_VGPU_LICENSE_STATUS | gauge | vGPU license status. |
| DCGM_FI_DEV_XID_ERRORS | gauge | Value of the last XID error encountered. |
| DCGM_FI_PROF_DRAM_ACTIVE | gauge | Ratio of cycles the device memory interface is active sending or receiving data. |
| DCGM_FI_PROF_GR_ENGINE_ACTIVE | gauge | Ratio of time the graphics engine is active. |
| DCGM_FI_PROF_PCIE_RX_BYTES | counter | Number of bytes of active PCIe RX data, including both header and payload. |
| DCGM_FI_PROF_PCIE_TX_BYTES | counter | Number of bytes of active PCIe TX data, including both header and payload. |
| DCGM_FI_PROF_PIPE_TENSOR_ACTIVE | gauge | Ratio of cycles the tensor (HMMA) pipe is active. |
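
Counters and gauges are read differently: a gauge is meaningful as a single sample, while a counter is only useful as a difference between two samples. The sketch below illustrates this with two of the metrics listed above. It is a minimal example, not part of any DCGM tooling: the function names are hypothetical, the sample values are made up, and it assumes you have already collected the raw readings yourself. The counter-reset handling (treating a drop as a restart from zero) and the use of used + free as the framebuffer total (which ignores any memory reserved by the driver) are simplifying assumptions.

```python
def average_power_watts(energy_mj_start, energy_mj_end, elapsed_seconds):
    """Average power from two samples of DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION.

    The metric is a counter in millijoules, so only the difference between
    two samples is meaningful. Hypothetical helper, for illustration only.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta_mj = energy_mj_end - energy_mj_start
    if delta_mj < 0:
        # Assumption: a decreasing counter means it was reset (e.g. a reboot),
        # so count only the energy accumulated since the reset.
        delta_mj = energy_mj_end
    # mJ -> J, then J / s = W
    return delta_mj / 1000.0 / elapsed_seconds


def framebuffer_used_percent(fb_used_mib, fb_free_mib):
    """Share of framebuffer in use, from the DCGM_FI_DEV_FB_USED and
    DCGM_FI_DEV_FB_FREE gauges (both in MiB).

    Assumption: used + free approximates the total; reserved memory is ignored.
    """
    total_mib = fb_used_mib + fb_free_mib
    if total_mib <= 0:
        return 0.0
    return 100.0 * fb_used_mib / total_mib


if __name__ == "__main__":
    # Made-up samples taken 60 seconds apart: 300,000 mJ consumed.
    print(average_power_watts(1_000_000, 1_300_000, 60))  # 5.0
    # Made-up gauge readings: 4096 MiB used, 12288 MiB free.
    print(framebuffer_used_percent(4096, 12288))  # 25.0
```

For an instantaneous reading, the DCGM_FI_DEV_POWER_USAGE gauge already reports power draw in watts; deriving power from the energy counter is mainly useful for averages over longer intervals.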