Learn how to use the YBPU Model Compiler, from model preparation to deployment:

1. Train your PyTorch model and export it as TorchScript
2. Use this website to compile it for your target device
3. Download the compiled library (libybpu_model.a + ybpu_model.h)
4. Use it in your C++ project on the target device

Note: each compiled package runs on a single machine only; it cannot be deployed to multiple machines.
Your model must be saved in TorchScript format for conversion:
```python
import torch

# Your trained model
model = YourModel()
model.load_state_dict(torch.load('weights.pth'))
model.eval()

# Create example input (must match your model's expected input)
example_input = torch.randn(1, 3, 224, 224)

# Export as TorchScript using trace
traced_model = torch.jit.trace(model, example_input)
traced_model.save('model.pt')
print("Model saved successfully!")
```
Use torch.jit.trace() for models with fixed control flow, or torch.jit.script() for models with data-dependent control flow (if/else branches, loops).
**Classification** — ResNet, VGG, MobileNet, EfficientNet, ViT, ...
Output: `[batch, num_classes]`

**Detection** — YOLOv5, YOLOv8, YOLO11, SSD, Faster-RCNN, ...
Output: `[batch, boxes, 5+classes]`

**Segmentation** — UNet, DeepLab, FCN, Mask-RCNN, ...
Output: `[batch, classes, H, W]`
1. Choose your target: Linux x86_64 (PC/Server), Raspberry Pi 4/5 64-bit, or other ARM boards
2. Drag & drop or click to select your .pt or .onnx file
3. The system automatically detects the input shape, model type, and normalization
4. Download your compiled library with the embedded model
Linux x86_64: If you select Linux x86_64 (PC/Server), the compilation server must have YBPU built for the host and OpenCV installed. See Docs → Supported Platforms for build prerequisites.
After downloading, extract the package. You'll get 3 files:
```
ybpu_model_rpi4-64_fp32/
├── libybpu_model.a   # Static library (with embedded model)
├── ybpu_model.h      # C++ header file
└── model.lic         # License file (required at runtime)
```
Copy these files to your project:
```
my_project/
├── CMakeLists.txt
├── main.cpp
├── lib/
│   └── libybpu_model.a
├── include/
│   └── ybpu_model.h
└── model.lic         # Must be in the executable's directory!
```
```cmake
cmake_minimum_required(VERSION 3.10)
project(my_inference_app)
set(CMAKE_CXX_STANDARD 11)

# Find OpenCV (required dependency)
find_package(OpenCV REQUIRED)

# Include header directories
include_directories(${CMAKE_SOURCE_DIR}/include)
include_directories(${OpenCV_INCLUDE_DIRS})

# Your application
add_executable(my_app main.cpp)

# Link with YBPU model library + OpenCV + pthread
target_link_libraries(my_app
    ${CMAKE_SOURCE_DIR}/lib/libybpu_model.a
    ${OpenCV_LIBS}
    pthread
)
```
Important: when linking manually with g++, libybpu_model.a must come at the end of the link line; otherwise you get `undefined reference to YbpuModel::...` errors. There are two ways to do this:
Method 1 — Use -L and -lybpu_model (library last):
```shell
g++ -O2 -fopenmp -o my_app main.cpp \
    -I./include \
    $(pkg-config --cflags --libs opencv4) \
    -lpthread -lgomp \
    -L./lib -lybpu_model
```
Method 2 — Use full path to the .a file (recommended, e.g. on Raspberry Pi):
```shell
g++ -O2 -fopenmp -o my_app main.cpp \
    -I./include \
    $(pkg-config --cflags --libs opencv4) \
    -lpthread -lgomp \
    ./lib/libybpu_model.a
```
```cpp
#include "ybpu_model.h"
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>
#include <algorithm>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <image_path>" << std::endl;
        return 1;
    }

    // ========================================
    // Step 1: Create Model Instance
    // ========================================
    // Model is EMBEDDED in the library - no external files needed!
    YbpuModel model;
    if (!model.is_loaded()) {
        std::cerr << "Failed to load model: " << model.get_last_error() << std::endl;
        return 1;
    }

    // Print model info
    std::cout << "Model loaded!" << std::endl;
    std::cout << " Input: " << model.get_input_width() << "x"
              << model.get_input_height() << "x"
              << model.get_input_channels() << std::endl;

    // ========================================
    // Step 2: Load Image with OpenCV
    // ========================================
    cv::Mat image = cv::imread(argv[1]);
    if (image.empty()) {
        std::cerr << "Failed to load image" << std::endl;
        return 1;
    }
    std::cout << "Image: " << image.cols << "x" << image.rows << std::endl;

    // ========================================
    // Step 3: Run Inference
    // ========================================
    // Preprocessing (resize, normalize) is AUTOMATIC!
    std::vector<float> output = model.inference(image);
    std::cout << "Output size: " << output.size() << std::endl;

    // ========================================
    // Step 4: Process Results
    // ========================================
    // For Classification: find the top prediction
    // (heuristic: a small flat output suggests a class-score vector)
    if (!output.empty() && output.size() < 10000) {
        auto max_it = std::max_element(output.begin(), output.end());
        int class_id = std::distance(output.begin(), max_it);
        float confidence = *max_it;
        std::cout << "Predicted class: " << class_id << std::endl;
        std::cout << "Confidence: " << confidence << std::endl;
    }

    // For Detection: parse bounding boxes
    // Output format: [x, y, w, h, confidence, class_scores...]
    // For Segmentation: output is a pixel-wise class map
    return 0;
}
```
```shell
# On your target device (e.g., Raspberry Pi)

# 1. Install OpenCV
sudo apt update
sudo apt install libopencv-dev

# 2. Build your project
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make

# 3. Copy license file to executable directory (IMPORTANT!)
cp ../model.lic .

# 4. Run inference
./my_app test_image.jpg
```
Every compiled model includes a time-limited license whose duration you specify during compilation (1-365 days).
The license is a SHA-signed JSON file containing the expiration date and model verification data. The library checks the license before loading the model; if the license has expired, the model will not load.
Place model.lic in the same directory as your executable.
The library binds to the first machine it runs on. After the first run, it cannot be copied to another computer; if you try, you will see: This library is bound to another machine and cannot run on this device. To use the same model on another device, compile and download a separate package for that device.
```cpp
#include "ybpu_model.h"
#include <iostream>

int main() {
    // Check license BEFORE creating model
    if (!YbpuModel::is_license_valid()) {
        std::cerr << "License expired on: "
                  << YbpuModel::get_license_expire_date() << std::endl;
        std::cerr << "Please renew your license." << std::endl;
        return 1;
    }

    // Show remaining days
    int days_left = YbpuModel::get_license_days_remaining();
    std::cout << "License valid for " << days_left << " more days" << std::endl;

    // If license expires within 7 days, show warning
    if (days_left <= 7) {
        std::cout << "WARNING: License expiring soon!" << std::endl;
    }

    // Now safe to create model
    YbpuModel model;
    if (!model.is_loaded()) {
        std::cerr << "Error: " << model.get_last_error() << std::endl;
        return 1;
    }

    // ... rest of inference code
    return 0;
}
```
When your license expires, renew it by downloading a new model.lic file (the library stays the same).

When you first run your model, the library automatically benchmarks your hardware to find the optimal configuration. On x86 and other supported platforms, it automatically uses the best available processor features for your machine.
On the very first inference, the library runs a short benchmark and caches the winning configuration at ~/.ybpu_cache/<model_hash>.cfg. On subsequent runs, the cached optimal configuration is loaded instantly, with no benchmark delay.

Note that auto-tune does not change the thread count: your configured thread count (set via set_num_threads()) is always used.
```cpp
#include "ybpu_model.h"
#include <iostream>

int main() {
    YbpuModel model;

    // Check if auto-tune completed
    if (model.is_auto_tuned()) {
        std::cout << "Auto-tune complete!" << std::endl;
        std::cout << "Config: " << model.get_auto_tune_config() << std::endl;
    }

    // Optional: Force re-benchmark (e.g., after hardware upgrade)
    // model.force_auto_tune();

    // Get cache file location
    std::cout << "Cache at: " << YbpuModel::get_cache_path() << std::endl;
    return 0;
}
```
To re-run the benchmark, either call force_auto_tune() or delete the cache file under ~/.ybpu_cache/.
| Method | Return Type | Description |
|---|---|---|
| `YbpuModel()` | - | Constructor - loads embedded model automatically |
| `is_loaded()` | `bool` | Check if model loaded successfully |
| `get_input_width()` | `int` | Get expected input image width |
| `get_input_height()` | `int` | Get expected input image height |
| `get_input_channels()` | `int` | Get expected input channels (usually 3) |
| `inference(cv::Mat)` | `vector<float>` | Run inference on BGR image, returns raw output |
| `get_num_threads()` | `int` | Get current thread count for inference |
| `set_num_threads(int)` | `void` | Set thread count (1-16) for inference |
| `get_last_error()` | `string` | Get last error message if any |
| **Auto-Tune API** | | |
| `is_auto_tuned()` | `bool` | Check if auto-tune benchmark completed |
| `force_auto_tune()` | `void` | Delete cache and re-run benchmark (5-10 s) |
| `get_auto_tune_config()` | `string` | Get current optimal config as string |
| `get_cache_path()` | `string` | Get path to auto-tune cache file (static) |
| Static Method | Return Type | Description |
|---|---|---|
| `YbpuModel::is_license_valid()` | `bool` | Returns true if license is valid and not expired |
| `YbpuModel::get_license_days_remaining()` | `int` | Days until license expires (negative if expired) |
| `YbpuModel::get_license_expire_date()` | `string` | Expiration date in YYYY-MM-DD format |
Solution: Ensure your model is saved as TorchScript:
```python
traced = torch.jit.trace(model.eval(), example_input)
```
Solution: Install OpenCV and link correctly:
```shell
sudo apt install libopencv-dev
```
Solution: Build with Release mode:
```shell
cmake -DCMAKE_BUILD_TYPE=Release ..
```
Solution: Check whether your training used a different normalization. The compiler auto-detects common presets (ImageNet, YOLO, etc.).
Solution: Your model.lic has expired. Re-upload your model to the compiler and download a new package with fresh license.
Solution: Make sure model.lic is in the same directory as your executable:
```shell
cp /path/to/model.lic .
```
Solution: The library was first run on a different machine and cannot run on this one. Download a new package from the compiler for this device; do not copy the existing package from another computer.
Solution: Reduce thread count at runtime for lower power usage:
```cpp
model.set_num_threads(2); // Use fewer threads
```
Solution: Use the set_num_threads() method to dynamically adjust based on conditions:
```cpp
// Check current threads
int threads = model.get_num_threads();

// Increase for batch processing
model.set_num_threads(4);

// Decrease for power saving
model.set_num_threads(1);
```
This is normal! Auto-tune is benchmarking your hardware on first run. Subsequent inferences will be fast.
Cache location: ~/.ybpu_cache/<model_hash>.cfg
Solution: Delete the cache file or call force_auto_tune():
```cpp
// Method 1: In code
model.force_auto_tune();
```

```shell
# Method 2: Delete the cache file
rm ~/.ybpu_cache/*.cfg
```
Solution: Check your thread count setting. Auto-tune optimizes algorithms but doesn't change thread count. Try:
```cpp
// Try increasing threads to match CPU cores
model.set_num_threads(4);

// View current auto-tune config
std::cout << model.get_auto_tune_config() << std::endl;
```