Overview

YBPU Model Compiler is a web-based service that converts PyTorch and ONNX models into optimized, deployable libraries for embedded devices and x86 PC/Server.

Embedded Model

Model data is compiled directly into the library. No external files needed at runtime.

Auto Configuration

Input shape, normalization parameters, and model type are automatically detected.

Cross-Compilation

Cross-compiled for ARM targets (Raspberry Pi 3, 4, 5, generic ARM) and built natively for Linux x86_64 (PC/Server).

Quick Start Guide

Follow these simple steps to compile your model:

1. Create an Account

Register with your email address and verify your account.

2. Select Target Platform

Choose your target device (e.g., Linux x86_64, Raspberry Pi 4 64-bit).

3. Choose Precision

Select FP32 for accuracy, FP16 for balance, or INT8 for speed.

4. Upload Your Model

Upload a TorchScript (.pt) or ONNX (.onnx) model file.

5. Download Package

Wait for compilation, then download your ready-to-use library.

Single machine only. The downloaded library runs on one device only and cannot be used for multi-machine deployment.

Important: Your PyTorch model must be saved in TorchScript format:

import torch
model = YourModel()
model.eval()
traced = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
traced.save("model.pt")

Supported Platforms

The compiler supports ARM-based embedded targets and Linux x86_64 (PC/Server):

Platform                   Chip               Architecture   Recommended For
Linux x86_64 (PC/Server)   Intel/AMD 64-bit   x86_64         Desktop, server, x86 dev machines
Raspberry Pi 5 (64-bit)    Cortex-A76         ARMv8-A        Newest Pi boards
Raspberry Pi 4 (64-bit)    Cortex-A72         ARMv8-A        Standard 64-bit Pi OS
Raspberry Pi 4 (32-bit)    Cortex-A72         ARMv7-A        Legacy 32-bit OS
Raspberry Pi 3 (64-bit)    Cortex-A53         ARMv8-A        Older devices
Raspberry Pi 3 (32-bit)    Cortex-A53         ARMv7-A        Legacy support
Generic ARM64 Linux        ARMv8              ARMv8-A        Other 64-bit ARM boards
Generic ARM32 Linux        ARMv7              ARMv7-A        Other 32-bit ARM boards
Linux x86_64 build prerequisites (server)

To compile for Linux x86_64, the server must have YBPU built for the host and OpenCV installed. Build YBPU once:

cd ybpu
mkdir -p build-host-gcc-linux && cd build-host-gcc-linux
cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/host.gcc.toolchain.cmake ..
make -j4

Install OpenCV:

sudo apt install libopencv-dev

Or ensure pkg-config opencv4 (or opencv) works.

Supported Models

Compatible with Most Open-Source Models

Deep-ET YBPU Compiler supports most mainstream open-source neural network models. Simply upload your PyTorch or ONNX model, and our intelligent system will automatically handle preprocessing, conversion, and optimization.

Accepted File Formats

.pt / .pth (PyTorch), .onnx (ONNX)

10 Model Categories

We provide optimized support for the following categories with automatic preprocessing and postprocessing:

1. Image Classification

e.g., YOLO11-cls, ResNet, MobileNet...

Image → Class labels

2. Object Detection

e.g., YOLO11, YOLOv8...

Image → Bounding boxes

3. Instance Segmentation

e.g., YOLO11-seg, YOLOv8-seg...

Image → Masks

4. Rotated Detection (OBB)

e.g., YOLO11-obb, YOLOv8-obb...

Image → Rotated boxes

5. Pose Estimation

e.g., YOLO11-pose, YOLOv8-pose...

Image → Keypoints

6. Face Detection

e.g., SCRFD, RetinaFace, ArcFace...

Image → Faces

7. Crowd Counting

e.g., P2PNet...

Image → Count

8. Video Matting

e.g., RVM...

Image → Alpha

9. OCR

e.g., PaddleOCR...

Image → Text

10. Speech Recognition & Synthesis

e.g., Whisper (ASR), Piper (TTS)...

Audio ↔ Text

Smart Model Recognition

Our compiler automatically detects your model architecture and applies optimal settings. For custom or unknown models, AI-assisted analysis ensures proper configuration.

File Size Limit: Maximum upload size is 500 MB.

Input Formats

The compiled library automatically adapts to different input formats. Just pass your data - preprocessing is handled automatically!

Image Input

For vision models, just pass an OpenCV Mat

cv::Mat image = cv::imread("photo.jpg");
auto results = model.detect(image);

Audio Input

For speech recognition models

std::vector<float> audio = load_wav("speech.wav");
auto result = model.transcribe(audio);
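
The load_wav helper above is a placeholder, not part of the package. As an illustration, here is a minimal Python sketch of the same conversion: decoding mono 16-bit PCM WAV samples and scaling them to floats in [-1.0, 1.0), the layout speech models typically expect. The function name and the assumption of 16-bit mono input are mine, not the library's.

```python
import struct
import wave

def wav_to_floats(path):
    """Read a mono 16-bit PCM WAV file and scale samples to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expects 16-bit PCM"
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw)  # little-endian int16
    return [s / 32768.0 for s in samples]
```

A C++ equivalent would read the samples into a std::vector<float> the same way before calling transcribe().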

Text Input

For TTS models

auto result = model.synthesize("Hello!");
// result.audio contains waveform

Automatic Preprocessing: The library handles all preprocessing internally - color conversion, normalization, resizing, padding, and more. Just pass your raw data!
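
For intuition, normalization is typically per-channel (value - mean) * scale, which is also the form the set_normalize(mean, scale) override in the FAQ suggests. A small illustrative Python sketch (the function name and example values are mine; actual means and scales are model-specific):

```python
def normalize_pixel(bgr, mean, scale):
    """Per-channel (value - mean) * scale, applied to one BGR pixel."""
    return [(v - m) * s for v, m, s in zip(bgr, mean, scale)]

# Illustrative: scale a BGR pixel to [0, 1] with zero mean and 1/255 scale
example = normalize_pixel([0, 128, 255], [0.0, 0.0, 0.0], [1 / 255.0] * 3)
```

The library applies this (plus resizing and color conversion) internally, so you never do it by hand.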

Precision Options

Choose the right precision level for your use case:

FP32

Single Precision Float

  • Highest accuracy
  • Largest file size
  • Best for development/testing

Accuracy: 100%
Size: 100%

FP16

Half Precision Float

  • Near-FP32 accuracy for most models
  • 50% smaller file size
  • Good balance of speed and size

Accuracy: ~99%
Size: 50%

INT8

8-bit Integer

  • Reduced accuracy
  • 75% smaller file size
  • Fastest inference

Accuracy: ~85%
Size: 25%
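
The INT8 trade-off comes from mapping float weights onto 8-bit integers. As a hedged illustration, here is a Python sketch of generic affine per-tensor quantization; YBPU's actual calibration scheme is not documented here, so treat this as the standard textbook technique, not the compiler's implementation:

```python
def quantize_int8(values):
    """Affine quantization: map the observed float range onto [-128, 127]."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0          # avoid zero scale for constant tensors
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate floats; error is bounded by the scale step."""
    return [(x - zero_point) * scale for x in q]
```

Each weight shrinks from 4 bytes to 1 (hence "Size: 25%"), at the cost of rounding error per weight.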

Thread Count

Choose the number of CPU threads for inference based on your target device:

Raspberry Pi 4/5: use 4 threads for best performance.

Other ARM devices: use 2-4 threads, matching your CPU core count.

Tip: More threads = faster inference but higher power usage. For battery devices, use fewer threads.
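
The guidance above can be sketched as a small heuristic. This Python snippet is illustrative only (the function and its cap parameter are mine, not part of the YBPU API, which exposes set_num_threads in C++):

```python
import os

def pick_thread_count(battery_powered=False, cap=4):
    """Match the CPU core count, capped at 4 as suggested for a Pi 4/5;
    halve it on battery to trade speed for power draw."""
    cores = os.cpu_count() or 1
    threads = max(1, min(cores, cap))
    return max(1, threads // 2) if battery_powered else threads
```

In C++ you would pass the result to model.set_num_threads(n).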

Auto-Tune

The library automatically optimizes for your hardware on first run - no configuration needed!

1. First Run

Benchmarks ~100 configurations (5-10 seconds)

2. Cache

Saves optimal settings automatically

3. Run Fast

Uses cached settings instantly

Zero Configuration: Just run your model - the library handles all optimizations automatically!

On supported platforms (including Linux x86_64), the library automatically selects the best compute backend for your machine on first run. No manual configuration required.

Automatic Analysis

The compiler intelligently analyzes your model and handles all configuration automatically:

Input Shape
Model Type
Normalization
Preprocessing
Postprocessing

Just Upload: No manual configuration needed - our AI-powered analysis handles everything automatically!

Output Package Structure

After compilation, you'll receive a .tar.gz package containing 3 files:

ybpu_model_rpi4-64_fp32/
├── libybpu_model.a    # Static library (with embedded model)
├── ybpu_model.h       # C++ header file
└── model.lic          # License file (SHA-signed)

Key Features

  • Embedded Model: Model weights are compiled into the library - no external files needed
  • License Protection: Time-limited license based on your selected duration (1-365 days)
  • Simple Integration: Just link the .a file, include the .h header, and keep the .lic file

How to Use in Your Project

Project Structure

my_project/
├── CMakeLists.txt
├── main.cpp
├── lib/
│   └── libybpu_model.a    # Copy from downloaded package
├── include/
│   └── ybpu_model.h       # Copy from downloaded package
└── model.lic              # Copy to executable directory

CMakeLists.txt

cmake_minimum_required(VERSION 3.10)
project(my_inference_app)

set(CMAKE_CXX_STANDARD 11)

# Find OpenCV (required)
find_package(OpenCV REQUIRED)

# Include directories
include_directories(${CMAKE_SOURCE_DIR}/include)
include_directories(${OpenCV_INCLUDE_DIRS})

# Your application
add_executable(my_app main.cpp)

# Link with YBPU model library and OpenCV
target_link_libraries(my_app 
    ${CMAKE_SOURCE_DIR}/lib/libybpu_model.a
    ${OpenCV_LIBS}
    pthread
)

Build Commands

# On your target device (e.g., Raspberry Pi)

# 1. Install OpenCV if not already installed
sudo apt update
sudo apt install libopencv-dev

# 2. Create build directory and compile
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make

# 3. Copy license file to executable directory
cp ../model.lic .

# 4. Run your application
./my_app test_image.jpg

Direct g++ Compilation

Put libybpu_model.a at the end of the link command to avoid "undefined reference to YbpuModel::..." errors.

Method 1 — -L and -lybpu_model (library last):

g++ -O2 -o my_app main.cpp \
    -I./include \
    $(pkg-config --cflags --libs opencv4) \
    -lpthread \
    -L./lib -lybpu_model

Method 2 — Full path to .a file (recommended on Raspberry Pi):

g++ -O2 -o my_app main.cpp \
    -I./include \
    $(pkg-config --cflags --libs opencv4) \
    -lpthread \
    ./lib/libybpu_model.a

Copy model.lic to the same directory as the executable before running.

License System

The compiled library includes a time-limited license protection system.

How It Works

  • When you compile a model, you specify a license duration (1-365 days)
  • The compiler generates a SHA-signed license file (model.lic)
  • The library checks the license at runtime before loading the model
  • If the license has expired, the model will not load

Machine Binding (No Copy to Other Machines)

The library is bound to the first machine it runs on. On first run, it generates a binding (using the machine's MAC address and a SHA key) and stores it in a hidden file next to your executable. If you copy the application (including the library) to another machine, the library will not run there and will report an error.

  • First run: Binding is created automatically; no action needed.
  • Same machine: Runs normally on subsequent launches.
  • Different machine: Loading fails with "This library is bound to another machine and cannot run on this device." Use a new download for the other machine.
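
For intuition, a binding of this shape can be sketched in a few lines of Python. This is purely illustrative: the SECRET_KEY placeholder, function name, and exact input format are assumptions, since the library's real key and hashing details are not published.

```python
import hashlib
import uuid

SECRET_KEY = "example-key"   # stand-in; the real embedded key is not published

def machine_fingerprint():
    """Hash this machine's MAC address together with a secret key,
    in the spirit of the binding scheme described above."""
    mac = uuid.getnode()     # MAC address as a 48-bit integer
    return hashlib.sha256(f"{mac}:{SECRET_KEY}".encode()).hexdigest()
```

On first run the library would store such a fingerprint in a hidden file; on later runs it recomputes and compares, and a mismatch means a different machine.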

License File (model.lic)

The license file contains:

{
  "model_name": "your_model",
  "model_hash": "sha256...",
  "created_at": "2026-02-28",
  "expire_at": "2026-03-30",
  "valid_days": 30,
  "license_id": "ybpu-xxxx-xxxx",
  "signature": "sha256..."
}
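
The expiry check is simple date arithmetic on the expire_at field. A Python sketch of what get_license_days_remaining() conceptually computes (the function here is mine, for illustration only):

```python
import json
from datetime import date

def days_remaining(license_text, today=None):
    """Days until expire_at; negative once the license has expired."""
    lic = json.loads(license_text)
    expire = date.fromisoformat(lic["expire_at"])
    return (expire - (today or date.today())).days
```

The signature field would additionally be verified against the rest of the file before trusting these dates.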

Using License in C++

#include "ybpu_model.h"
#include <iostream>

int main() {
    // Check license status before loading
    if (!YbpuModel::is_license_valid()) {
        std::cerr << "License expired on: " 
                  << YbpuModel::get_license_expire_date() << std::endl;
        return 1;
    }
    
    std::cout << "License valid for " 
              << YbpuModel::get_license_days_remaining() 
              << " more days" << std::endl;
    
    // Create model (will fail if license expired)
    YbpuModel model;
    
    if (!model.is_loaded()) {
        // Error message will indicate license issue if applicable
        std::cerr << "Error: " << model.get_last_error() << std::endl;
        return 1;
    }
    
    // ... rest of your code
    return 0;
}

License API Reference

Static Method Description
YbpuModel::is_license_valid() Returns true if license is still valid
YbpuModel::get_license_days_remaining() Returns number of days until expiration
YbpuModel::get_license_expire_date() Returns expiration date string (YYYY-MM-DD)

Important: The model.lic file must be in the same directory as your executable, or point the YBPU_LICENSE_PATH environment variable at its location.

License Renewal

To renew an expired license:

  1. Re-upload your model to the compiler
  2. Select your desired license duration
  3. Download the new package with fresh model.lic
  4. Replace only the model.lic file (library remains the same)

Contact help@deep-et.com for enterprise licensing options.

C++ API Usage

Basic Usage

#include "ybpu_model.h"
#include <opencv2/opencv.hpp>

int main() {
    // Create model instance - model is already embedded!
    YbpuModel model;
    
    if (!model.is_loaded()) {
        std::cerr << "Failed to load model: " << model.get_last_error() << std::endl;
        return 1;
    }
    
    // Read image with OpenCV (BGR format)
    cv::Mat image = cv::imread("test.jpg");
    
    // Run inference - preprocessing is automatic!
    std::vector<float> output = model.inference(image);
    
    // Process output based on model type
    // Classification: find max probability
    auto max_it = std::max_element(output.begin(), output.end());
    int class_id = std::distance(output.begin(), max_it);
    float confidence = *max_it;
    
    std::cout << "Predicted class: " << class_id << std::endl;
    std::cout << "Confidence: " << confidence << std::endl;
    
    return 0;
}
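
The example treats the raw maximum score as the "confidence". If the model's final layer does not already apply softmax, converting the scores to probabilities first gives a more meaningful number. A language-agnostic sketch in Python (assumption: the output vector holds raw class scores):

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top1(scores):
    """Return (class_id, probability) for the highest-scoring class."""
    probs = softmax(scores)
    cls = max(range(len(probs)), key=probs.__getitem__)
    return cls, probs[cls]
```

The same two steps translate directly to C++ with std::exp and std::max_element over the vector returned by inference().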

API Reference

Method Description
YbpuModel() Constructor - loads the embedded model automatically
YbpuModel(param, bin) Load external model files (optional)
bool is_loaded() Check if model loaded successfully
int get_input_width() Get expected input width
int get_input_height() Get expected input height
int get_input_channels() Get expected input channels
vector<float> inference(cv::Mat) Run inference on an image
int get_num_threads() Get current thread count for inference
void set_num_threads(int) Set thread count for inference (1-16)
string get_last_error() Get last error message

Building Your Project

# On your target device (e.g., Raspberry Pi)

# 1. Extract the package
tar -xzf ybpu_model_rpi4-64_fp32.tar.gz
cd ybpu_model_rpi4-64_fp32

# 2. Set up your project with the lib and header files
mkdir -p my_project/{lib,include}
cp libybpu_model.a my_project/lib/
cp ybpu_model.h my_project/include/
cp model.lic my_project/

# 3. Build with CMake (see CMakeLists.txt example above)
cd my_project
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make

# 4. Don't forget the license file!
cp ../model.lic .

# 5. Run your application
./my_app test_image.jpg

CMake Integration

cmake_minimum_required(VERSION 3.10)
project(my_app)

set(CMAKE_CXX_STANDARD 11)
find_package(OpenCV REQUIRED)
find_package(OpenMP REQUIRED)

# Add YBPU model library
add_subdirectory(path/to/ybpu_model_package ybpu_model)

add_executable(my_app main.cpp)
target_link_libraries(my_app ybpu_model ${OpenCV_LIBS} OpenMP::OpenMP_CXX pthread)

Frequently Asked Questions

Q: Do I need to specify model files when using the library?

A: No! The model is embedded in the library. Just use YbpuModel model; - no file paths needed.

Q: What image format should I use?

A: Use OpenCV's default BGR format. The library handles all preprocessing (resize, color conversion, normalization) automatically.

Q: How do I save my PyTorch model correctly?

A: Use TorchScript tracing:

model.eval()
traced = torch.jit.trace(model, example_input)
traced.save("model.pt")

Q: Build fails with "OpenCV not found"

A: Install OpenCV on your target device:

sudo apt install libopencv-dev

Q: Can I use custom normalization?

A: Yes, you can override normalization with:

model.set_normalize({mean_b, mean_g, mean_r}, {scale_b, scale_g, scale_r});

Q: Why is inference slow?

A: Make sure to build with optimizations:

cmake -DCMAKE_BUILD_TYPE=Release ..

Q: What's the maximum model size?

A: Maximum upload size is 500 MB. For larger models, consider using FP16 or INT8 quantization.
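
The precision options scale file size predictably, since weight storage is roughly parameters times bytes per weight. A quick back-of-the-envelope helper (illustrative; real files add some overhead for graph structure and metadata):

```python
def model_size_mb(num_params, bits_per_weight):
    """Approximate weight storage: parameters x bytes per weight, in MiB."""
    return num_params * (bits_per_weight / 8.0) / (1024 ** 2)
```

So a 100M-parameter FP32 model (~381 MB) fits under the 500 MB limit, and FP16 or INT8 halves or quarters that.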