Snapdragon Neural Processing Engine SDK Reference Guide (6)

4.4.5 Compile UDO package

Introduction

This section provides information on compiling UDO packages for all supported runtimes in SNPE.

As mentioned in UDO overview, a collection of registration and implementation libraries is collectively referred to as a UDO package. Users have full control over building these libraries for their desired runtime with a compatible toolchain. Alternatively, the SNPE SDK provides tools and utilities to easily create and compile UDOs. For more information on tools for creating UDO packages, see Creating UDO Packages. This section describes UDO package compilation based on the directory structure provided by the package generator.

Implement user-defined operations

Fundamentally, a UDO needs to be developed using the set of APIs defined in the header files located in $SNPE_ROOT/share/SnpeUdo/include. Each runtime may impose additional requirements and provide options for customizing the implementation to suit the runtime. Details of the UDO API can be found in the API documentation for the C++ API.
This section assumes that the UDO package was generated using the UDO Package Generator tool described in Creating UDO Packages, which generates a partial implementation skeleton from the user-configured UDO specification.

Set targets for package compilation

The UDO package generator tool creates a makefile to compile a package for a specific combination of runtime and target platform. The makefile is intended to provide a simple interface for compiling on platforms that use make natively or that require ndk-build. Using the provided makefile also allows each library to be compiled for various targets.

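The general form of each make target is the runtime followed by the platform, joined by an underscore (for example, cpu_x86 or cpu_android). A target that names only the runtime builds for every applicable platform. For example, run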

make cpu

This compiles the CPU libraries for both the x86 and Android platforms. For applicable platforms, the PLATFORM make variable can be used to select a specific platform ABI, similar to APP_ABI in ndk-build. By default, PLATFORM is set to both arm64-v8a and armeabi-v7a. A comprehensive table of available targets is listed below.
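
To make the target naming described above concrete, the following shell sketch assembles a make command line from a runtime, an optional platform, and an optional ABI. The helper name udo_make_cmd is hypothetical and not part of the SDK, and passing PLATFORM on the command line for a non-DSP target is an assumption based on the description above. The sketch only prints the command; it does not run make.

```shell
# Hypothetical helper (not part of the SNPE SDK): print the make command
# for a runtime, an optional platform, and an optional ABI.
udo_make_cmd() {
  runtime="$1"      # e.g. cpu, gpu, dsp
  platform="$2"     # e.g. x86, android; empty means "all platforms"
  abi="$3"          # e.g. arm64-v8a; empty means the makefile default

  # A runtime-only target covers every applicable platform.
  if [ -n "$platform" ]; then
    target="${runtime}_${platform}"
  else
    target="$runtime"
  fi

  # PLATFORM is only appended when an ABI was requested (assumed usage).
  if [ -n "$abi" ]; then
    printf 'make %s PLATFORM=%s\n' "$target" "$abi"
  else
    printf 'make %s\n' "$target"
  fi
}

udo_make_cmd cpu "" ""                # prints: make cpu
udo_make_cmd cpu android arm64-v8a    # prints: make cpu_android PLATFORM=arm64-v8a
```

Printing the command first makes it easy to review what would be built before invoking make from the package root.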

NOTE: Using the provided makefile is optional; the libraries can also be built without it.

Note: For all examples below, the artifacts shown are for the arm64-v8a target.

Implementing UDOs for the CPU

A CPU UDO implementation library based on the core UDO API is required to run the UDO package on CPU runtime. The UDO package generator tool will create a skeleton with blank constructs in the required format, but the core logic for creating and executing operations needs to be filled in by the user. This can be done by completing the implementation of the finalize(), execute() and free() functions in the .cpp file generated by the UDO package generator tool.

Note: SNPE provides the tensor data for all inputs and outputs of a UDO not directly as tensorData within SnpeUdo_TensorParam_t, but as opaque pointers. UDO implementations are expected to use methods in the CustomOp operation object that SNPE provides at execution time to obtain a handle to the raw tensor pointer. See SnpeUdo_CpuInfrastructure_t for more details on the data structure. The CPU runtime operates only on floating-point activation tensors. CPU UDO implementations should therefore receive and produce only floating-point tensors, with the data_type field in the configuration file set to FLOAT_32. All other data types are ignored. See Defining UDOs for details.

The SNPE model quantization tool, snpe-dlc-quantize, requires the UDO package to be compiled for the CPU on the host, where the tool runs. Model quantization with snpe-dlc-quantize is necessary to run UDO layers with at least one non-floating-point input on the DSP.

Compile UDOs for the CPU on the host

The steps to compile the CPU UDO implementation library on the host x86 platform are as follows:

Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

Run the following make command to compile the UDO package:

make cpu_x86

The expected artifacts after compiling for the host CPU are:

UDO CPU implementation library: /libs/x86-64_linux_clang/libUdoImplCpu.so
UDO package registration library: /libs/x86-64_linux_clang/libUdoReg.so

NOTE: This command must be run from the package root.
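
After the build, it can be useful to confirm that both libraries were actually produced. The following is a minimal sketch under stated assumptions: check_udo_artifacts is a hypothetical helper, not an SDK tool, and its first argument stands in for wherever the generated package root lives.

```shell
# Hypothetical helper: verify that the expected libraries exist under
# <package_root>/libs/<target_dir>. Returns non-zero if any are missing.
check_udo_artifacts() {
  package_root="$1"
  target_dir="$2"
  shift 2
  missing=0
  for lib in "$@"; do
    path="$package_root/libs/$target_dir/$lib"
    if [ -f "$path" ]; then
      echo "found:   $path"
    else
      echo "missing: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the host CPU build from the package root.
check_udo_artifacts . x86-64_linux_clang libUdoImplCpu.so libUdoReg.so \
  || echo "host CPU build is incomplete"
```

The same helper can be pointed at another target directory, such as arm64-v8a, by changing the second argument and the library names.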

Compile UDOs for the CPU on the device

The steps to compile the CPU UDO implementation library on the Android platform are as follows:

Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

$NDK_BUILD must be set for the Android NDK build toolchain.

export NDK_BUILD=<absolute_path_to_android_ndk_directory>

Run the following make command to compile the UDO package:

make cpu_android

Note: Libraries built with the NDK require the shared C++ standard library at run time. Make sure libc++_shared.so is present on the device's LD_LIBRARY_PATH.

The expected artifacts after compiling for Android CPU are:

UDO CPU implementation library: /libs/arm64-v8a/libUdoImplCpu.so
UDO package registration library: /libs/arm64-v8a/libUdoReg.so
A copy of the shared standard C++ library: /libs/arm64-v8a/libc++_shared.so
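
Because the Android build depends on two environment variables, a small preflight check can fail early with a clear message. This is only a sketch: require_env is a hypothetical helper and not part of the SDK.

```shell
# Hypothetical helper: succeed only if every named variable is set
# and non-empty. Reports each missing variable on stderr.
require_env() {
  status=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      status=1
    fi
  done
  return "$status"
}

# Example: guard the Android CPU build.
if require_env SNPE_UDO_ROOT NDK_BUILD; then
  echo "environment OK; run: make cpu_android"
fi
```

The variable names passed to require_env should be fixed strings, as here, since the helper relies on eval for the indirect lookup.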

Implementing UDOs for the GPU

Similar to the CPU runtime, a GPU UDO implementation library based on the core UDO API is required to run UDO packages on the GPU runtime. The UDO package generator tool creates a skeleton with blank constructs in the required format, but the user must fill in the core logic for creating and executing operations. This is done by completing the implementations of the setKernelInfo() and Operation() functions and adding the GPU kernel implementation in the .cpp file generated by the UDO package generator tool.

SNPE GPU UDO supports 16-bit floating-point activations in the network. Users should expect the input and output OpenCL buffer memories of SNPE GPU UDOs to use the 16-bit floating-point (OpenCL half) data format as the storage type. For increased precision, users can implement the kernel's internal math operations with 32-bit floating-point data and convert from or to half precision when reading input buffers or writing output buffers in the UDO kernel.

SNPE GPU allows users to optionally cache the OpenCL programs associated with multiple GPU UDO instances of a similar type. The caching API is exposed through SnpeUdo_GpuInfrastructure_t. Program caching reduces the JIT compilation time of OpenCL programs during subsequent calls in the network.

Note: SNPE provides the tensor data for all inputs and outputs of a UDO not directly inside tensorData but as opaque pointers. The UDO implementation is expected to convert each pointer to SnpeUdo_GpuTensorData_t, which holds the tensor's OpenCL memory pointer. See SnpeUdo_GpuTensorData_t for details. The per-op-factory infrastructure object that SNPE provides when creating a UDO op factory supplies an OpenCL context and an OpenCL command queue. See SnpeUdo_GpuOpFactoryInfrastructure_t for more details on the data structure.

Compile UDOs for the GPU on the device

The steps to compile the GPU UDO implementation library on the Android platform are as follows:

Set the environment variable $SNPE_UDO_ROOT.

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

$NDK_BUILD must be set for the Android NDK build toolchain.

export NDK_BUILD=<absolute_path_to_android_ndk_directory>

$CL_LIBRARY_PATH must be set to the location of the libOpenCL.so library.

export CL_LIBRARY_PATH=<absolute_path_to_OpenCL_library>

The OpenCL shared library is not distributed as part of the SNPE SDK.

Run the following make command to compile the UDO package:

make gpu_android

Note: The shared OpenCL library is target-specific. It must be available at CL_LIBRARY_PATH.

The expected artifacts after compiling for Android GPU are:

UDO GPU implementation library: /libs/arm64-v8a/libUdoImplGpu.so
UDO package registration library: /libs/arm64-v8a/libUdoReg.so
A copy of the shared standard C++ library: /libs/arm64-v8a/libc++_shared.so
A copy of the shared OpenCL library: /libs/arm64-v8a/libOpenCL.so
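
Since the OpenCL library is not shipped with the SDK, it may help to confirm it is in place before starting the GPU build. The sketch below assumes that CL_LIBRARY_PATH names the directory containing libOpenCL.so, which the text above does not state explicitly; has_opencl_lib is a hypothetical helper.

```shell
# Hypothetical helper: check that libOpenCL.so is present in a directory.
# Assumption: CL_LIBRARY_PATH is the directory that contains the library.
has_opencl_lib() {
  dir="$1"
  [ -n "$dir" ] && [ -f "$dir/libOpenCL.so" ]
}

# Example: guard the Android GPU build.
if has_opencl_lib "${CL_LIBRARY_PATH:-}"; then
  echo "libOpenCL.so found; run: make gpu_android"
else
  echo "libOpenCL.so not found in CL_LIBRARY_PATH" >&2
fi
```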

Implementing UDOs for DSP V65 and V66

SNPE uses QNN to run UDO layers on the DSP. Therefore, a DSP implementation library based on the QNN SDK API is required to run the UDO package on the DSP runtime. The UDO package generator tool creates a template .cpp file, and the user needs to implement the execution logic in the _executeOp() function of that template file.

SNPE UDO supports multithreading of operations using worker threads, Hexagon Vector Extensions (HVX) code, and VTCM.

The DSP runtime propagates only unsigned 8-bit activation tensors between network layers, but it can dequantize the data to floating point if needed. Users developing DSP kernels can therefore expect UINT_8 or FLOAT_32 activation tensors going into and out of the operation, and can set the data_type field in the configuration file to either of these two settings. See Defining UDOs for details.

Compile UDOs on device for DSP V65 and V66

This SNPE release supports building UDO DSP implementation libraries using Hexagon-SDK 3.5.x.

Set the environment variable $SNPE_UDO_ROOT

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

The Hexagon-SDK must be installed and set up. For details, follow the setup instructions on the page $HEXAGON_SDK_ROOT/docs/readme.html, where $HEXAGON_SDK_ROOT is the Hexagon-SDK installation location. Make sure $HEXAGON_SDK_ROOT is set to use the Hexagon-SDK build toolchain. Also set $HEXAGON_TOOLS_ROOT and $SDK_SETUP_ENV.

export HEXAGON_SDK_ROOT=<hexagon sdk installation path>
export HEXAGON_TOOLS_ROOT=$HEXAGON_SDK_ROOT/tools/HEXAGON_Tools/8.3.07
export ANDROID_NDK_ROOT=<Android NDK installation path>
export SDK_SETUP_ENV=Done

$NDK_BUILD must be set for the Android NDK build toolchain.

export NDK_BUILD=<absolute_path_to_android_ndk_directory>

The target architecture can also be specified when compiling the package. If no target architecture is provided, both arm64-v8a and armeabi-v7a are built.

export UDO_APP_ABI=<target_architecture>

Run the following make command to compile the UDO DSP implementation library:

make dsp PLATFORM=$UDO_APP_ABI

Note: For DSP, PLATFORM determines only the ABI of the registration library.

The expected artifacts after compiling for DSP are:

UDO DSP implementation library: /libs/dsp_/libUdoImplDsp.so
UDO package registration library: /libs/$UDO_APP_ABI/libUdoReg.so

NOTE: This command must be run from the package root.
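
The tools path in the exports above is derived from the SDK root and a tools version. The following sketch shows one way to compute it instead of hard-coding it. The helper hexagon_tools_root is hypothetical, the location /opt/hexagon-sdk is a placeholder, and the default version simply mirrors the export shown above; it should be adjusted to the tools version present in your Hexagon-SDK installation.

```shell
# Hypothetical helper: derive the Hexagon tools directory from the SDK root.
# $1 is the SDK root; $2 is the tools version (defaults to the value used
# in the export shown in this section).
hexagon_tools_root() {
  sdk_root="$1"
  version="${2:-8.3.07}"
  # Strip one trailing slash from the root so the path has no "//".
  printf '%s/tools/HEXAGON_Tools/%s\n' "${sdk_root%/}" "$version"
}

# Example with a placeholder SDK location.
hexagon_tools_root /opt/hexagon-sdk
# prints: /opt/hexagon-sdk/tools/HEXAGON_Tools/8.3.07
```

The result can be exported directly, for example with export HEXAGON_TOOLS_ROOT="$(hexagon_tools_root "$HEXAGON_SDK_ROOT")".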

Implementing UDOs for DSP V68 or later

SNPE uses QNN to run UDO layers on DSP V68 or later. Therefore, a DSP implementation library based on the QNN SDK API is required to run the UDO package on the DSP runtime. The UDO package generator tool creates the template file ImplLibDsp.cpp, and the user needs to implement the execution logic in the Impl() function of that template file.

SNPE UDO supports Hexagon Vector Extensions (HVX) code and cost-based scheduling.

The DSP runtime propagates unsigned 8-bit or unsigned 16-bit activation tensors between network layers, but it can dequantize the data to floating point if needed. Users developing DSP kernels can therefore expect UINT_8, UINT_16, or FLOAT_32 activation tensors going into and out of the operation, and can set the data_type field in the configuration file to any of these three settings. See QNN SDK for details.

Compile UDOs for DSP V68 or later on the device

This SNPE release supports building UDO DSP implementation libraries using Hexagon-SDK 4.x and the QNN SDK.

Set the environment variable $SNPE_UDO_ROOT

export SNPE_UDO_ROOT=<absolute_path_to_SnpeUdo_headers_directory>

Hexagon-SDK 4.0 or later must be installed and set up. For more information about the Hexagon-SDK, follow the setup instructions on the page $HEXAGON_SDK4_ROOT/docs/readme.html, where $HEXAGON_SDK4_ROOT is the location where the Hexagon-SDK is installed. Make sure $HEXAGON_SDK4_ROOT is set to use the Hexagon-SDK build toolchain. Additionally, set $HEXAGON_TOOLS_ROOT and $SDK_SETUP_ENV. An extracted QNN-SDK is also needed to build the library (no QNN-SDK setup is required). For more information about the QNN-SDK, see the QNN documentation at $QNN_SDK_ROOT/docs/index.html, where $QNN_SDK_ROOT is the location where the QNN-SDK is installed. Set $QNN_SDK_ROOT to the unpacked QNN-SDK location.

export HEXAGON_SDK_ROOT=<hexagon sdk installation path>
export HEXAGON_SDK4_ROOT=<hexagon sdk 4.x installation path>
export HEXAGON_TOOLS_ROOT=$HEXAGON_SDK_ROOT/tools/HEXAGON_Tools/8.3.07/
export QNN_SDK_ROOT=<QNN sdk installation path>
export ANDROID_NDK_ROOT=<Android NDK installation path>
export SDK_SETUP_ENV=Done

$NDK_BUILD must be set for the Android NDK build toolchain.

export NDK_BUILD=<absolute_path_to_android_ndk_directory>

The target architecture can also be specified when compiling the package. If no target architecture is provided, both arm64-v8a and armeabi-v7a are built.

export UDO_APP_ABI=<target_architecture>

Run the following make command to compile the UDO DSP implementation library:

make dsp PLATFORM=$UDO_APP_ABI

Run the following make command to generate the library for offline cache generation:

make dsp_x86 X86_CXX=<path_to_x86_64_clang>

Run the following make command to generate a library that can be used on the Android ARM architecture:

make dsp_aarch64

The expected artifacts after compiling for DSP are:

UDO DSP implementation library: /libs/dsp_v68/libUdoImplDsp.so
UDO package registration library: /libs/$UDO_APP_ABI/libUdoReg.so

The expected artifact after compiling for offline cache generation is:

UDO DSP implementation library: /libs/x86-64_linux_clang/libUdoImplDsp.so

The expected artifact after compiling for the Android ARM architecture is:

UDO DSP implementation library: /libs/$UDO_APP_ABI/libUdoImplDsp_AltPrep.so

NOTE: These commands must be run from the package root.
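
The three builds for DSP V68 or later can be listed together before running them. The sketch below only prints the commands in order; dsp_v68_build_plan is a hypothetical helper, /usr/bin/clang++ is a placeholder compiler path, and the uppercase PLATFORM variable follows the make variable named earlier in this document.

```shell
# Hypothetical helper: print the three V68 build commands in order.
# $1 is the registration-library ABI (may be empty for the default);
# $2 is the path to an x86_64 clang compiler for the offline-cache library.
dsp_v68_build_plan() {
  abi="$1"
  x86_cxx="$2"
  if [ -n "$abi" ]; then
    echo "make dsp PLATFORM=$abi"
  else
    echo "make dsp"
  fi
  echo "make dsp_x86 X86_CXX=$x86_cxx"
  echo "make dsp_aarch64"
}

# Example with a placeholder compiler path.
dsp_v68_build_plan arm64-v8a /usr/bin/clang++
```

Reviewing the printed plan first is a cheap way to catch a wrong ABI or compiler path before starting the builds from the package root.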

Table of make targets