[Using the STM32 DSP library] Adding CMSIS-DSP with Keil5 + STM32CubeMX: the library (pack) method and the manual source method

Usage of STM32-DSP library

1. CMSIS-DSP
- 1.1 Introduction to the DSP library
- 1.2 Supported function categories
- 1.3 Macro definitions
2. Steps
- 2.1 Configuring the base project in STM32CubeMX
- 2.2 Using the prebuilt Lib library (recommended)
- 2.3 Manually adding the DSP source files (lets you use the latest official library with full functionality)
3. Testing the DSP speed-up with MFCC

While verifying FFT-based MFCC feature extraction for speech recognition, I found the many online tutorials on bringing the DSP library into a project frustrating, so I hope this write-up helps others. The library is integrated here in two ways: with the prebuilt lib and by manually porting the src files. The test platform is a Cortex-M4, and both methods were verified on real hardware. (Compared with the Cortex-M3, the Cortex-M4 adds a floating-point unit and a DSP instruction set, which suits applications that run complex algorithms.)

1. CMSIS-DSP

1.1 Introduction to the DSP library

CMSIS overview
The Common Microcontroller Software Interface Standard (CMSIS) simplifies microcontroller software development by providing consistent and efficient interfaces. It promotes code reuse, portability, and interoperability, allowing developers to focus on application-level logic rather than low-level hardware details.
CMSIS provides interfaces to processors and peripherals, real-time operating systems, and middleware components. It includes a delivery mechanism (CMSIS-Pack) for devices, boards, and software, and supports combining software components from multiple vendors. The content on keil.arm.com is extracted directly from the CMSIS packs.
CMSIS was defined in close collaboration with various chip and software vendors and provides a common approach to interfacing with peripherals, real-time operating systems, and middleware components. It is designed to enable interoperability of software components from multiple vendors.
For a full description of the DSP library, see the official website.
Its user manual describes the CMSIS-DSP software library as a common set of signal processing functions for devices based on Cortex-M and Cortex-A processors.

1.2 Supported function categories

The library is divided into a number of function groups, each covering a specific category:

- Basic math functions
- Fast math functions
- Complex math functions
- Filtering functions
- Matrix functions
- Transform functions (FFT, DCT)
- Controller functions (PID, motor control)
- Statistics functions
- Support functions
- Interpolation functions
- Support vector machine functions (SVM)
- Bayesian classifier functions
- Distance functions
- Quaternion functions
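To make the categories a little more concrete, here is a plain-C reference for what one basic math routine, arm_dot_prod_f32, computes. This is a sketch, not the library code. ref_dot_prod_f32 is a hypothetical name that only mirrors the argument order of the real function (pSrcA, pSrcB, blockSize, result). On a Cortex-M4, the library version produces the same mathematical result (up to floating-point rounding), but it may unroll the loop and it runs on the FPU.

```c
#include <stdint.h>

typedef float float32_t; /* same typedef CMSIS-DSP uses for 32-bit floats */

/* Hypothetical plain-C reference; mirrors the argument order of
 * CMSIS-DSP's arm_dot_prod_f32(pSrcA, pSrcB, blockSize, result). */
void ref_dot_prod_f32(const float32_t *pSrcA, const float32_t *pSrcB,
                      uint32_t blockSize, float32_t *result)
{
    float32_t sum = 0.0f;
    for (uint32_t n = 0; n < blockSize; n++)
    {
        sum += pSrcA[n] * pSrcB[n]; /* multiply-accumulate one sample pair */
    }
    *result = sum;
}
```

With a = {1, 2, 3} and b = {4, 5, 6}, the result is 1·4 + 2·5 + 3·6 = 32.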

1.3 Macro definitions

The library generally provides separate functions for 8-bit integers, 16-bit integers, 32-bit integers, and 32-bit floating-point values. Each library build is configured through its own set of preprocessor macros:
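The integer variants work on fixed-point "q" types. For example, a q15 value is an int16_t whose raw value x stands for x / 2^15. The sketch below copies the library's typedefs (found in arm_math.h or arm_math_types.h, depending on the version). It then converts back to float using the same scaling that arm_q7_to_float, arm_q15_to_float and arm_q31_to_float apply. The helper names themselves are hypothetical.

```c
#include <stdint.h>

/* Local copies of the CMSIS-DSP fixed-point and float typedefs */
typedef int8_t  q7_t;   /* Q0.7:  raw / 2^7  */
typedef int16_t q15_t;  /* Q0.15: raw / 2^15 */
typedef int32_t q31_t;  /* Q0.31: raw / 2^31 */
typedef float   float32_t;

/* Hypothetical helpers: scale raw fixed-point values to the range [-1, 1) */
float32_t q7_to_f32(q7_t x)   { return (float32_t)x / 128.0f; }
float32_t q15_to_f32(q15_t x) { return (float32_t)x / 32768.0f; }
float32_t q31_to_f32(q31_t x) { return (float32_t)x / 2147483648.0f; }
```

So 16384 in q15 is 0.5, and -32768 is -1.0, the most negative representable value.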

ARM_MATH_BIG_ENDIAN:
Define the macro ARM_MATH_BIG_ENDIAN to build the library for big-endian targets. By default, the library is built for little-endian targets.

ARM_MATH_MATRIX_CHECK:
Define the macro ARM_MATH_MATRIX_CHECK to check the input and output sizes of matrices.
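When ARM_MATH_MATRIX_CHECK is defined, functions such as arm_mat_mult_f32 check the dimensions before doing any work and return ARM_MATH_SIZE_MISMATCH if they do not fit. The sketch below reproduces that check for C = A × B in plain C. The struct mirrors the fields of arm_matrix_instance_f32 (numRows, numCols, pData). The names mat_f32 and mat_mult_checked, and the status enum with its values, are hypothetical.

```c
#include <stdint.h>

/* Hypothetical mirror of arm_matrix_instance_f32 (row-major data) */
typedef struct
{
    uint16_t numRows;
    uint16_t numCols;
    float *pData;
} mat_f32;

/* Hypothetical status codes (not the library's values) */
typedef enum { MAT_OK = 0, MAT_SIZE_MISMATCH = -1 } mat_status;

mat_status mat_mult_checked(const mat_f32 *a, const mat_f32 *b, mat_f32 *c)
{
    /* Dimension test done when ARM_MATH_MATRIX_CHECK is defined */
    if (a->numCols != b->numRows || a->numRows != c->numRows ||
        b->numCols != c->numCols)
        return MAT_SIZE_MISMATCH;

    for (uint16_t i = 0; i < a->numRows; i++)
    {
        for (uint16_t j = 0; j < b->numCols; j++)
        {
            float sum = 0.0f;
            for (uint16_t k = 0; k < a->numCols; k++)
                sum += a->pData[i * a->numCols + k] * b->pData[k * b->numCols + j];
            c->pData[i * c->numCols + j] = sum;
        }
    }
    return MAT_OK;
}
```

Without the macro, the library skips this test. A wrongly sized destination then silently produces garbage or writes out of bounds, so the check is worth enabling during development.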

ARM_MATH_ROUNDING:
Define the macro ARM_MATH_ROUNDING to enable rounding in the support functions.
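One place where ARM_MATH_ROUNDING changes behaviour is float-to-fixed conversion, such as arm_float_to_q15. Without the macro, the scaled value is truncated toward zero. With it, 0.5 is added (or subtracted for negative values) before truncation, and the result is then saturated. The portable sketch below reproduces both paths. float_to_q15_sketch is a hypothetical name, and its use_rounding flag stands in for the compile-time macro.

```c
#include <stdint.h>

typedef int16_t q15_t;

/* Saturate to the q15 range, like the __SSAT(x, 16) intrinsic */
static q15_t sat_q15(int32_t v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (q15_t)v;
}

/* Hypothetical: use_rounding plays the role of ARM_MATH_ROUNDING.
 * Assumes |x| < 32768 so the int32_t cast is well defined. */
q15_t float_to_q15_sketch(float x, int use_rounding)
{
    float in = x * 32768.0f;                /* scale to Q15 */
    if (use_rounding)
    {
        in += (in > 0.0f) ? 0.5f : -0.5f;   /* round half away from zero */
    }
    return sat_q15((int32_t)in);            /* C cast truncates toward zero */
}
```

For an input that scales to 1.75, truncation gives 1 and rounding gives 2. The rounding path costs one extra add per sample.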

ARM_MATH_LOOPUNROLL:
Define the macro ARM_MATH_LOOPUNROLL to enable manual loop unrolling in the DSP functions.
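"Manual loop unrolling" refers to a pattern used throughout the CMSIS-DSP sources. The main loop processes four samples per iteration, and a short tail loop then handles the remaining blockSize % 4 samples. Below is a plain-C sketch of that pattern for an element-wise add. add_f32_unrolled is a hypothetical name; it uses the same argument order as the library's arm_add_f32.

```c
#include <stdint.h>

/* Hypothetical sketch of the ARM_MATH_LOOPUNROLL pattern: pDst = pSrcA + pSrcB */
void add_f32_unrolled(const float *pSrcA, const float *pSrcB,
                      float *pDst, uint32_t blockSize)
{
    uint32_t blkCnt = blockSize >> 2U;      /* number of 4-sample groups */

    while (blkCnt > 0U)
    {
        /* Four independent adds per iteration: fewer loop branches */
        pDst[0] = pSrcA[0] + pSrcB[0];
        pDst[1] = pSrcA[1] + pSrcB[1];
        pDst[2] = pSrcA[2] + pSrcB[2];
        pDst[3] = pSrcA[3] + pSrcB[3];
        pSrcA += 4;
        pSrcB += 4;
        pDst += 4;
        blkCnt--;
    }

    blkCnt = blockSize % 4U;                /* remaining 0..3 samples */
    while (blkCnt > 0U)
    {
        *pDst++ = *pSrcA++ + *pSrcB++;
        blkCnt--;
    }
}
```

Unrolling trades code size for fewer branches per sample. That is why it is a build option rather than always on.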

ARM_MATH_NEON:
Define the macro ARM_MATH_NEON to enable the Neon versions of the DSP functions. Neon is not enabled by default even when it is available, because performance depends on the compiler and target architecture.

ARM_MATH_NEON_EXPERIMENTAL:
Define the macro ARM_MATH_NEON_EXPERIMENTAL to enable experimental Neon versions of some DSP functions. These experimental versions currently perform no better than the scalar versions.

ARM_MATH_HELIUM:
Implies the flags ARM_MATH_MVEF, ARM_MATH_MVEI and ARM_MATH_MVE_FLOAT16.

ARM_MATH_HELIUM_EXPERIMENTAL:
Only taken into account if ARM_MATH_MVEF, ARM_MATH_MVEI or ARM_MATH_MVE_FLOAT16 is defined. Enables some vector versions that may perform worse than the scalar versions, depending on the core/compiler configuration.

ARM_MATH_MVEF:
Selects the Helium versions of the f32 algorithms. Implies ARM_MATH_FLOAT16 and ARM_MATH_MVEI.

ARM_MATH_MVEI:
Select the Helium version of the int and fixed point algorithms.

ARM_MATH_MVE_FLOAT16:
MVE Float16 implementation of certain algorithms (requires MVE extensions).

DISABLEFLOAT16:
Disables the float16 algorithms when __fp16 is not supported by a specific compiler/core configuration. This only applies to the scalar versions; when the vector architecture supports f16, it cannot be disabled.

ARM_MATH_AUTOVECTORIZE:
With Helium or Neon, disables the vectorized code written with C intrinsics and uses pure C instead; vectorization is then left to the compiler.

The latest version of the library is available from the official GitHub repository.

2. Steps

2.1 Configuring the base project in STM32CubeMX

The project targets an STM32F407IGT6.

In STM32CubeMX, simply configure the clock and generate the Keil project. I will skip those details and focus on the configuration in Keil.

2.2 Using the prebuilt Lib library (recommended)

Keil ships CMSIS-DSP as part of its CMSIS pack, so the library can be added with one click (Manage Run-Time Environment > CMSIS > DSP). There is no need to add include paths or anything similar.

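Add the preprocessor macros under Options for Target > C/C++ > Preprocessor Symbols > Define, choosing them according to the features you need (see 1.3), then click OK. The leading comma separates them from the entries already in the field (a CubeMX-generated project typically has USE_HAL_DRIVER,STM32F407xx there):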

,ARM_MATH_CM4,__CC_ARM,ARM_MATH_MATRIX_CHECK,ARM_MATH_ROUNDING,__TARGET_FPU_VFP,__FPU_PRESENT=1

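Compiling then produces the warning below. It is caused by __FPU_PRESENT being defined twice: once in the global define list and once in the device header. The global definition cannot simply be dropped, because the CMSIS-DSP sources do not see the definition from the device header, and in my tests the build did not work without it.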

warning: #47-D: incompatible redefinition of macro "__FPU_PRESENT"

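Commenting out one of the two definitions makes the warning go away. (The compiler most likely reports the redefinition as incompatible because the two definitions are not token-for-token identical, for example 1 versus 1U.)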

The project then compiled successfully. (Along the way, I once ran into a case where changes to Keil's global macro list did not take effect, so compilation kept failing. In the end I could only fix it by editing the setting directly in Keil's configuration table. Normally this should not happen.)

2.3 Manually adding the DSP source files (lets you use the latest official library with full functionality)

Add the same preprocessor macros as in 2.2:

,ARM_MATH_CM4,__CC_ARM,ARM_MATH_MATRIX_CHECK,ARM_MATH_ROUNDING,__TARGET_FPU_VFP,__FPU_PRESENT=1

Add the DSP source files.

Add sources according to the APIs you actually use. For the basic functions, add all .c files in the three folders shown in the figure below.

Note: do not leave out arm_bitreversal2.S. It provides the bit-reversal routines that the complex FFT functions call.
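For context, an in-place radix-2 FFT leaves its output in bit-reversed index order, so the data has to be permuted back. arm_bitreversal2.S supplies the optimized, table-driven routine that does this for the CFFT functions. If the file is left out, the problem would most likely show up as an undefined symbol at link time. The portable sketch below shows only the underlying permutation, on a plain float array. bit_reverse_permute is a hypothetical name, and the library itself works on interleaved complex data with precomputed tables.

```c
#include <stdint.h>

/* Reverse the lowest 'bits' bits of v, e.g. bits = 3: 1 (001) -> 4 (100) */
static uint32_t reverse_bits(uint32_t v, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned b = 0; b < bits; b++)
    {
        r = (r << 1) | (v & 1U);
        v >>= 1;
    }
    return r;
}

/* Hypothetical sketch: in-place bit-reversal permutation of n = 2^bits values */
void bit_reverse_permute(float *x, unsigned bits)
{
    uint32_t n = 1U << bits;
    for (uint32_t i = 0; i < n; i++)
    {
        uint32_t j = reverse_bits(i, bits);
        if (j > i)              /* swap each pair exactly once */
        {
            float t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }
}
```

For 8 points, the indices 0..7 end up in the order 0, 4, 2, 6, 1, 5, 3, 7.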

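If the assembly file fails to build with the error below, add the --cpreproc option to the assembler settings (Options for Target > Asm > Misc Controls). This option tells armasm to first pass the file through the C preprocessor (armclang, or armcc under Arm Compiler 5). armasm then assembles the preprocessed code into machine code. The .S file appears to contain C preprocessor directives that armasm cannot handle on its own.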

error: A1163E: Unknown opcode defined, expecting opcode or Macro

Add the DSP header file path

Linking then fails with the error below. The arm_dct4_* functions are not used here, so I simply removed their source files from the project rather than pulling in more dependencies (arm_dct4_f32 references arm_cmplx_mult_cmplx_f32 from the complex math functions).

Undefined symbol arm_cmplx_mult_cmplx_f32 (referred from arm_dct4_f32.o).

With that, the project compiles successfully.

3. Testing the DSP speed-up with MFCC

// MFCC
// Do the first MFCC with half old data (256) and half new data (256),
// then do the second MFCC with all new data (512).
// Start timing. Ticks / 100 gives ms, i.e. TIM2 is assumed to tick at 100 kHz.
uint32_t startTicks = __HAL_TIM_GetCounter(&htim2);
// Take the MFCC buffer
osMutexAcquire(mfcc_bufHandle, osWaitForever);
for (int i = 0; i < 2; i++)
{
    mfcc_compute(mfcc, &audio_buffer_16bit[i * AUDIO_FRAME_LEN / 2], mfcc_features_f);

    // Quantise using the same scale as the training data (in Keras), by 2^n.
    quantize_data(mfcc_features_f, mfcc_features[mfcc_feat_index], MFCC_COEFFS, 3);

    // Debug only: print the MFCC data on the console
    if (is_print_mfcc)
    {
        for (int j = 0; j < MFCC_COEFFS; j++)
            printf("%d ", mfcc_features[mfcc_feat_index][j]);
        printf("\n");
    }

    mfcc_feat_index++;
    if (mfcc_feat_index >= MFCC_LEN)
        mfcc_feat_index = 0;
}
osMutexRelease(mfcc_bufHandle);
// Unsigned subtraction stays correct across one counter wrap-around
// (assuming the 32-bit TIM2 auto-reload value is 0xFFFFFFFF)
uint32_t elapsedTicks = __HAL_TIM_GetCounter(&htim2) - startTicks;
printf("mfcc time %0.2f ms\r\n", elapsedTicks / 100.0);
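The loop calls mfcc_compute twice on windows whose start offsets are AUDIO_FRAME_LEN/2 samples apart, so consecutive frames overlap by 50%. The sketch below makes the comments in the code concrete. It assumes AUDIO_FRAME_LEN is 512 and that audio_buffer_16bit holds 256 old samples followed by 512 new ones, which is what the comments describe; the real buffer layout is not shown. It counts how many old and new samples each window covers. count_old_new and OLD_SAMPLES are hypothetical.

```c
#define AUDIO_FRAME_LEN 512                /* assumed frame length */
#define OLD_SAMPLES (AUDIO_FRAME_LEN / 2)  /* assumed: 256 old samples first */

/* Hypothetical helper: for window i (0 or 1), count the samples
 * taken from the old and new parts of the buffer. */
void count_old_new(int i, int *old_count, int *new_count)
{
    int start = i * AUDIO_FRAME_LEN / 2;   /* same offset as in the loop */
    int end = start + AUDIO_FRAME_LEN;     /* one past the last sample */

    *old_count = 0;
    *new_count = 0;
    for (int n = start; n < end; n++)
    {
        if (n < OLD_SAMPLES)
            (*old_count)++;
        else
            (*new_count)++;
    }
}
```

Window 0 sees 256 old and 256 new samples, and window 1 sees 512 new samples, which matches the 256/256 and 512 split in the code comments.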

// Without the DSP library
mfcc time 2.66 ms

// With the DSP library: MFCC processing time reduced by 56.0% (about 2.3x faster)
mfcc time 1.17 ms
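The 56.0% figure is the reduction in processing time: (2.66 - 1.17) / 2.66 ≈ 0.560. Expressed as a speed-up, the same measurement gives 2.66 / 1.17 ≈ 2.27×. The two small helpers below make both views explicit; their names are hypothetical.

```c
/* Percentage of processing time saved: (before - after) / before * 100 */
double time_reduction_pct(double before_ms, double after_ms)
{
    return (before_ms - after_ms) / before_ms * 100.0;
}

/* Speed-up factor: how many times faster the new version runs */
double speedup(double before_ms, double after_ms)
{
    return before_ms / after_ms;
}
```

Quoting the speed-up factor avoids the ambiguity of "56% faster", which could be misread as 1.56×.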