Complete Usage Guide

Learn how to use YBPU Model Compiler from model preparation to deployment

End-to-End Workflow

  1. Prepare Model

    Train your PyTorch model and export it as TorchScript

  2. Upload & Compile

    Use this website to compile for your target device

  3. Download Package

    Get the compiled library (libybpu_model.a + ybpu_model.h). Single machine only: the package cannot be deployed to multiple machines.

  4. Integrate & Deploy

    Use it in your C++ project on the target device

1 Model Preparation

Saving PyTorch Model as TorchScript

Your model must be saved in TorchScript format for conversion:

Python - Export Model
import torch

# Your trained model
model = YourModel()
model.load_state_dict(torch.load('weights.pth'))
model.eval()

# Create example input (must match your model's expected input)
example_input = torch.randn(1, 3, 224, 224)

# Export as TorchScript using trace
traced_model = torch.jit.trace(model, example_input)
traced_model.save('model.pt')

print("Model saved successfully!")
Tip: Use torch.jit.trace() for models with fixed control flow, or torch.jit.script() for models with dynamic control flow (if/else, loops).
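For the dynamic case mentioned in the tip, a minimal sketch of torch.jit.script() might look like this (DynamicModel is a toy model invented for illustration, not part of the compiler):

```python
import torch
import torch.nn as nn

class DynamicModel(nn.Module):
    """Toy model with data-dependent control flow (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x):
        # Data-dependent branch: trace() would freeze one path,
        # script() preserves the if/else in the exported graph.
        if x.sum() > 0:
            return self.fc(x)
        else:
            return -self.fc(x)

model = DynamicModel().eval()
scripted = torch.jit.script(model)  # compiles control flow; no example input needed
scripted.save('model.pt')
```

Unlike trace(), script() does not need an example input, because it compiles the Python source of forward() rather than recording one execution.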

Supported Model Types

Classification

ResNet, VGG, MobileNet, EfficientNet, ViT...

Output: [batch, num_classes]

Detection

YOLOv5, YOLOv8, YOLO11, SSD, Faster-RCNN...

Output: [batch, boxes, 5+classes]

Segmentation

UNet, DeepLab, FCN, Mask-RCNN...

Output: [batch, classes, H, W]

2 Upload & Compile

Compilation Process

  1. Select Target Platform

    Choose your target: Linux x86_64 (PC/Server), Raspberry Pi 4/5 64-bit, or other ARM boards

  2. Select Precision
    • FP32: Highest accuracy, largest size
    • FP16: Good balance of accuracy and size
    • INT8: Smallest size, fastest, slight accuracy loss
  3. Set License Duration & Thread Count
    • License Duration: How many days the library will be valid (1-365)
    • Thread Count: Number of CPU threads for inference (1-16, recommended: 4)
  4. Upload Model

    Drag & drop or click to select your .pt or .onnx file

  5. Auto Analysis

    System automatically detects input shape, model type, and normalization

  6. Download Package

    Get your compiled library with embedded model

Linux x86_64: If you select Linux x86_64 (PC/Server), the compilation server must have YBPU built for host and OpenCV installed. See Docs → Supported Platforms for build prerequisites.

3 C++ Integration

Project Setup

After downloading, extract the package. You'll get 3 files:

ybpu_model_rpi4-64_fp32/
├── libybpu_model.a    # Static library (with embedded model)
├── ybpu_model.h       # C++ header file
└── model.lic          # License file (required at runtime)

Copy these files to your project:

my_project/
├── CMakeLists.txt
├── main.cpp
├── lib/
│   └── libybpu_model.a
├── include/
│   └── ybpu_model.h
└── model.lic          # Must be in executable directory!

CMakeLists.txt

cmake_minimum_required(VERSION 3.10)
project(my_inference_app)

set(CMAKE_CXX_STANDARD 11)

# Find OpenCV (required dependency)
find_package(OpenCV REQUIRED)

# Include header directories
include_directories(${CMAKE_SOURCE_DIR}/include)
include_directories(${OpenCV_INCLUDE_DIRS})

# Your application
add_executable(my_app main.cpp)

# Link with YBPU model library + OpenCV + pthread
target_link_libraries(my_app 
    ${CMAKE_SOURCE_DIR}/lib/libybpu_model.a
    ${OpenCV_LIBS}
    pthread
)

Alternative: Direct g++ Compilation

Important: You must put libybpu_model.a at the end of the link line; otherwise you will get undefined reference to YbpuModel::... errors. There are two ways to do this:

Method 1 — Use -L and -lybpu_model (library last):

g++ -O2 -fopenmp -o my_app main.cpp \
    -I./include \
    $(pkg-config --cflags --libs opencv4) \
    -lpthread -lgomp \
    -L./lib -lybpu_model

Method 2 — Use full path to the .a file (recommended, e.g. on Raspberry Pi):

g++ -O2 -fopenmp -o my_app main.cpp \
    -I./include \
    $(pkg-config --cflags --libs opencv4) \
    -lpthread -lgomp \
    ./lib/libybpu_model.a

Complete Inference Example

main.cpp - Full Example
#include "ybpu_model.h"
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>
#include <algorithm>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <image_path>" << std::endl;
        return 1;
    }

    // ========================================
    // Step 1: Create Model Instance
    // ========================================
    // Model weights are EMBEDDED in the library - only model.lic is needed at runtime
    YbpuModel model;
    
    if (!model.is_loaded()) {
        std::cerr << "Failed to load model: " << model.get_last_error() << std::endl;
        return 1;
    }
    
    // Print model info
    std::cout << "Model loaded!" << std::endl;
    std::cout << "  Input: " << model.get_input_width() << "x" 
              << model.get_input_height() << "x"
              << model.get_input_channels() << std::endl;

    // ========================================
    // Step 2: Load Image with OpenCV
    // ========================================
    cv::Mat image = cv::imread(argv[1]);
    if (image.empty()) {
        std::cerr << "Failed to load image" << std::endl;
        return 1;
    }
    
    std::cout << "Image: " << image.cols << "x" << image.rows << std::endl;

    // ========================================
    // Step 3: Run Inference
    // ========================================
    // Preprocessing (resize, normalize) is AUTOMATIC!
    std::vector<float> output = model.inference(image);
    
    std::cout << "Output size: " << output.size() << std::endl;

    // ========================================
    // Step 4: Process Results
    // ========================================
    
    // For Classification: Find top prediction
    if (output.size() > 0 && output.size() < 10000) {
        auto max_it = std::max_element(output.begin(), output.end());
        int class_id = std::distance(output.begin(), max_it);
        float confidence = *max_it;
        
        std::cout << "Predicted class: " << class_id << std::endl;
        std::cout << "Confidence: " << confidence << std::endl;
    }
    
    // For Detection: Parse bounding boxes
    // Output format: [x, y, w, h, confidence, class_scores...]
    
    // For Segmentation: Output is pixel-wise class map
    
    return 0;
}

Build & Run

Terminal Commands
# On your target device (e.g., Raspberry Pi)

# 1. Install OpenCV
sudo apt update
sudo apt install libopencv-dev

# 2. Build your project
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make

# 3. Copy license file to executable directory (IMPORTANT!)
cp ../model.lic .

# 4. Run inference
./my_app test_image.jpg

4 License System

Understanding the License

Every compiled model includes a time-limited license that you specify during compilation (1-365 days).

License File (model.lic)

A SHA-signed JSON file containing the expiration date and model verification data.

Runtime Verification

The library checks the license before loading the model. Expired license = model won't load.

Location

Place model.lic in the same directory as your executable.

Machine Binding

The library binds to the first machine it runs on. After the first run, it cannot be copied to another computer; if you try, you will see: This library is bound to another machine and cannot run on this device. To use the same model on another device, compile and download a separate package for that device.

Checking License Status in Code

#include "ybpu_model.h"
#include <iostream>

int main() {
    // Check license BEFORE creating model
    if (!YbpuModel::is_license_valid()) {
        std::cerr << "License expired on: " 
                  << YbpuModel::get_license_expire_date() << std::endl;
        std::cerr << "Please renew your license." << std::endl;
        return 1;
    }
    
    // Show remaining days
    int days_left = YbpuModel::get_license_days_remaining();
    std::cout << "License valid for " << days_left << " more days" << std::endl;
    
    // If license expires within 7 days, show warning
    if (days_left <= 7) {
        std::cout << "WARNING: License expiring soon!" << std::endl;
    }
    
    // Now safe to create model
    YbpuModel model;
    
    if (!model.is_loaded()) {
        std::cerr << "Error: " << model.get_last_error() << std::endl;
        return 1;
    }
    
    // ... rest of inference code
    return 0;
}

License Renewal

When your license expires:

  1. Re-upload your model to the compiler website
  2. Select your desired new license duration
  3. Download the new package
  4. Replace only the model.lic file (the library stays the same)
Note: For enterprise licensing with longer durations or volume licenses, contact help@deep-et.com

4.5 Auto-Tune Feature

Automatic Performance Optimization

When you first run your model, the library benchmarks your hardware to find the optimal configuration. On x86 and other supported platforms, it automatically selects the best configuration for your machine's processor.

First Run Behavior

On the very first inference, the library will:

  1. Run ~100 benchmark iterations with different optimization settings
  2. Test combinations of Winograd, SGEMM, packing, BF16, and other options
  3. Save the best configuration to ~/.ybpu_cache/<model_hash>.cfg

The whole process takes about 5-10 seconds and runs only once.

Subsequent Runs

On subsequent runs, the cached optimal configuration is loaded instantly - no benchmark delay.

Note: Thread count is NOT part of auto-tune. The thread count you specified during compilation (or set via set_num_threads()) is always used.

Using Auto-Tune API

#include "ybpu_model.h"
#include <iostream>

int main() {
    YbpuModel model;
    
    // Check if auto-tune completed
    if (model.is_auto_tuned()) {
        std::cout << "Auto-tune complete!" << std::endl;
        std::cout << "Config: " << model.get_auto_tune_config() << std::endl;
    }
    
    // Optional: Force re-benchmark (e.g., after hardware upgrade)
    // model.force_auto_tune();
    
    // Get cache file location
    std::cout << "Cache at: " << YbpuModel::get_cache_path() << std::endl;
    
    return 0;
}

When to Re-run Auto-Tune

  • Hardware change: After upgrading CPU or changing device
  • OS update: Major kernel updates may affect performance
  • Different target: When deploying to a new device type

To re-run auto-tune, either call force_auto_tune() or delete the cache file under ~/.ybpu_cache/.

5 API Reference

YbpuModel Class

Method                  Return Type     Description
YbpuModel()             -               Constructor; loads the embedded model automatically
is_loaded()             bool            Check if the model loaded successfully
get_input_width()       int             Expected input image width
get_input_height()      int             Expected input image height
get_input_channels()    int             Expected input channels (usually 3)
inference(cv::Mat)      vector<float>   Run inference on a BGR image; returns raw output
get_num_threads()       int             Current thread count for inference
set_num_threads(int)    void            Set thread count (1-16) for inference
get_last_error()        string          Last error message, if any

Auto-Tune API

is_auto_tuned()         bool            Check if the auto-tune benchmark has completed
force_auto_tune()       void            Delete the cache and re-run the benchmark (5-10 s)
get_auto_tune_config()  string          Current optimal configuration as a string
get_cache_path()        string          Path to the auto-tune cache file (static)

Input Requirements

  • Format: OpenCV cv::Mat (BGR, 8-bit unsigned)
  • Size: Any size - auto-resized to model input
  • Preprocessing: Automatic (resize, normalize)

License Static Methods

Static Method                            Return Type  Description
YbpuModel::is_license_valid()            bool         Returns true if the license is valid and not expired
YbpuModel::get_license_days_remaining()  int          Days until the license expires (negative if expired)
YbpuModel::get_license_expire_date()     string       Expiration date in YYYY-MM-DD format

6 Troubleshooting

Model conversion fails

Solution: Ensure your model is saved as TorchScript:

traced = torch.jit.trace(model.eval(), example_input)
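If the export itself succeeds but conversion still fails, it can help to confirm the saved .pt file loads and runs standalone before re-uploading. A self-contained sketch (the Conv2d model here is a stand-in so the snippet runs on its own; in practice you would load the model.pt you exported in Section 1, with your own input shape):

```python
import torch

# Stand-in model so this snippet is self-contained; replace with your own export
model = torch.nn.Conv2d(3, 8, 3).eval()
torch.jit.trace(model, torch.randn(1, 3, 224, 224)).save('model.pt')

# Verify the TorchScript file loads and runs on its own
m = torch.jit.load('model.pt')
m.eval()
with torch.no_grad():
    out = m(torch.randn(1, 3, 224, 224))
print(tuple(out.shape))  # should match your model's expected output shape
```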

Undefined reference to cv::imread

Solution: Install OpenCV and link correctly:

sudo apt install libopencv-dev

Inference is slow

Solution: Build with Release mode:

cmake -DCMAKE_BUILD_TYPE=Release ..

Wrong output values

Solution: Check whether your training pipeline used a different normalization. The compiler auto-detects common presets (ImageNet, YOLO, etc.).
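If you suspect a normalization mismatch, compare the mean/std your training transforms actually used against the common presets. A quick illustrative check (the ImageNet values are the standard published ones; the "training" dict is a placeholder for your own torchvision.transforms.Normalize arguments):

```python
# Common normalization presets (standard, widely published values)
IMAGENET = {"mean": [0.485, 0.456, 0.406], "std": [0.229, 0.224, 0.225]}
YOLO_STYLE = {"mean": [0.0, 0.0, 0.0], "std": [1.0, 1.0, 1.0]}  # plain /255 scaling

# Placeholder: the mean/std your training pipeline actually used,
# e.g. the arguments of your transforms.Normalize(...) call
training = {"mean": [0.485, 0.456, 0.406], "std": [0.229, 0.224, 0.225]}

for name, preset in [("ImageNet", IMAGENET), ("YOLO-style", YOLO_STYLE)]:
    print(f"{name}: {'match' if preset == training else 'mismatch'}")
```

If your values match none of the presets, the auto-detected preprocessing will not reproduce your training inputs, which typically shows up as plausible-looking but wrong output values.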

License expired error

Solution: Your model.lic has expired. Re-upload your model to the compiler and download a new package with a fresh license.

License file not found

Solution: Make sure model.lic is in the same directory as your executable:

cp /path/to/model.lic .

This library is bound to another machine

Solution: The library was first run on a different machine and cannot run on this one. Download a new package from the compiler for this device; do not copy the existing package from another computer.

Device overheating or high power consumption

Solution: Reduce thread count at runtime for lower power usage:

model.set_num_threads(2); // Use fewer threads

Want to adjust thread count at runtime

Solution: Use the set_num_threads() method to dynamically adjust based on conditions:

// Check current threads
int threads = model.get_num_threads();

// Increase for batch processing
model.set_num_threads(4);

// Decrease for power saving
model.set_num_threads(1);

First inference is slow (5-10 seconds)

This is normal! Auto-tune is benchmarking your hardware on first run. Subsequent inferences will be fast.

Cache location: ~/.ybpu_cache/<model_hash>.cfg

Want to re-run auto-tune benchmark

Solution: Delete the cache file or call force_auto_tune():

// Method 1: In code
model.force_auto_tune();

# Method 2: Delete the cache file (shell)
rm ~/.ybpu_cache/*.cfg

Performance is still slow after auto-tune

Solution: Check your thread count setting. Auto-tune optimizes algorithms but doesn't change thread count. Try:

// Try increasing threads to match CPU cores
model.set_num_threads(4);

// View current auto-tune config
std::cout << model.get_auto_tune_config() << std::endl;