NVBit Tutorial Tools
This directory contains example instrumentation tools built with NVBit (NVIDIA Binary Instrumentation Tool). Each tool demonstrates different capabilities and serves as a starting point for developing your own instrument
This directory contains example instrumentation tools built with NVBit (NVIDIA Binary Instrumentation Tool). Each tool demonstrates different capabilities and serves as a starting point for developing your own instrumentation tools.
Overview of Tools
- instr_count - Basic instruction counting tool
- opcode_hist - Histogram of executed instructions by opcode
- record_reg_vals - Records register values during execution
- instr_count_bb - Instruction counting at basic block level
- instr_count_cuda_graph - Instruction counting for CUDA graph execution
- mem_printf2 - Memory access printing tool
- mem_trace - Memory trace collection tool
- mov_replace - MOV instruction replacement example
Common Structure
Each tool typically consists of host-side code (.cu file with the same name as the tool) that sets up instrumentation, manages callbacks, and processes results; device-side code (usually inject_funcs.cu) that contains functions injected into the GPU code; a Makefile that compiles the tool into a shared library (.so); and a README.md with documentation for the specific tool.
Running the Tools
Or alternatively:
CUDA_INJECTION64_PATH=/path/to/tool.so ./your_cuda_applicationMost tools support configuration through environment variables. Check each tool's README for specific options.
Test Application
The repository includes a simple vector addition test application in /test-apps/vectoradd/ that can be used to test these tools:
# Compile the test application
cd ../test-apps/vectoradd
make
# Run with a tool
LD_PRELOAD=../../tools/instr_count/instr_count.so ./vectoraddTool-Specific Information
-
instr_count counts dynamic instructions executed by each kernel by instrumenting every instruction and incrementing a counter for each execution.
-
opcode_hist builds a histogram of instruction opcodes, showing which types of instructions are most common in your application.
-
record_reg_vals records register values during kernel execution, allowing analysis of data flow.
-
instr_count_bb is similar to instr_count but instruments at basic block level for better performance.
-
instr_count_cuda_graph extends instruction counting to handle CUDA graphs (a more advanced CUDA feature).
-
mem_printf2 prints memory access information during execution.
-
mem_trace captures detailed memory access traces for analysis, using a channel-based approach to send data from GPU to CPU.
-
mov_replace demonstrates how to replace specific instructions (MOV in this case) with custom implementations.
Developing Your Own Tools
To create a new tool:
- Copy an existing tool directory that's closest to your needs
- Modify the host and device code
- Update the Makefile if needed
- Run
maketo build your tool
The key NVBit APIs are documented in ../core/nvbit.h.
For more detailed information on each tool, refer to their individual README files and source code.
Continue exploring
Back to index
Ecosystem & Other Projects
Explore the eunomia-bpf ecosystem with additional tools for eBPF benchmarking, AI monitoring agents, and compatibility libraries.
Previous
NVBit Tutorial: Opcode Histogram
Github repo: <https://github.com/eunomia-bpf/nvbit-tutorial
Next
NVBit Tutorial: Register Value Recording
Github repo: <https://github.com/eunomia-bpf/nvbit-tutorial
- Last updated
- Oct 16, 2025
- First published
- Oct 16, 2025
- Contributors
- yunwei37
Was this page helpful?