STM32F407-Discovery’s hardware FPU

Table of Contents

1. Introduction to FPU of STM32F407

2. Configuration

3. Comparative testing

3.1 Example 1

3.2 Example 2

4. Summary


This article is based on the STM32 HAL library version: STM32Cube_FW_F4_V1.27.0

1. Introduction to FPU of STM32F407

FPU stands for Floating Point Unit, the hardware block that executes floating-point operations. On a fixed-point CPU (one without an FPU), every floating-point operation defined by the IEEE 754 standard has to be emulated in software with a long sequence of integer instructions, which is slow and can make real-time requirements hard to meet. On a chip with an FPU, a floating-point operation usually takes only a few instructions, so it runs much faster.

The STM32F4 has a 32-bit single-precision hardware FPU, which accelerates calculations on float data (double arithmetic is still done in software). It supports the floating-point instruction set, and for floating-point workloads it can be dozens or even hundreds of times faster than a Cortex-M0 or Cortex-M3, neither of which has an FPU.

The STM32F4 enables or disables the FPU through the Coprocessor Access Control Register (CPACR). Once the FPU is enabled and the code is compiled for it, floating-point operations are executed by the hardware.

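The register is described as follows: the CP10 field occupies bits 21:20 and the CP11 field occupies bits 23:22. Setting all four bits (20 to 23) to 1 grants full access to both and enables FPU hardware acceleration.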
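As a rough illustration of the bit arithmetic, the sketch below computes a CPACR value with CP10 and CP11 set to full access. It works on a plain integer so that it compiles on a host PC; the helper name cpacr_with_fpu_enabled and the two mask macros are made up for this example. On the target, the same mask would be OR-ed into SCB->CPACR.

```c
#include <stdint.h>

/* Each coprocessor field in CPACR is 2 bits wide; 0b11 means full access. */
#define CPACR_CP10_FULL (3UL << (10 * 2)) /* bits 21:20 */
#define CPACR_CP11_FULL (3UL << (11 * 2)) /* bits 23:22 */

/* Hypothetical helper: returns the given CPACR value with the FPU bits set. */
uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | (uint32_t)(CPACR_CP10_FULL | CPACR_CP11_FULL);
}
```

Starting from 0, the result is 0x00F00000, which is the SCB->CPACR value printed in section 2 when the FPU is enabled.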

2. Configuration

In system_stm32f4xx.c, whether the FPU is enabled depends on two macros:

__FPU_PRESENT: whether the chip has an FPU

__FPU_USED: whether the FPU is enabled

__FPU_PRESENT is defined in the corresponding header file, here it is: stm32f407xx.h

Whether to enable it depends on the application, and it only needs to be configured in the MDK target options:

Floating Point Hardware: selecting Single Precision causes __FPU_USED to be defined as 1; selecting Not Used leaves __FPU_USED at 0.

A simple test verifies this: build with each of the two options and print the value of __FPU_USED from the code.

main.c

int main(void)
{
    /* STM32F4xx HAL library initialization:
       - Configure the Flash prefetch, instruction and Data caches
       - Configure the Systick to generate an interrupt each 1 msec
       - Set NVIC Group Priority to 4
       - Global MSP (MCU Support Package) initialization
     */
    HAL_Init();

    /* Configure LED6 (LED3, LED4 and LED5 are not used) */
    // BSP_LED_Init(LED3);
    // BSP_LED_Init(LED4);
    // BSP_LED_Init(LED5);
    BSP_LED_Init(LED6);

    /* Configure the system clock to 168 MHz */
    SystemClock_Config();

    /* Serial port 2 initialization: only the TX function is used */
    if (uart2_init(9600))
    {
        Error_Handler();
    }

    printf("__CC_ARM:%d\n", __CC_ARM);
    printf("__FPU_PRESENT:%d\n", __FPU_PRESENT);
    printf("__FPU_USED:%d\n", __FPU_USED);
    printf("SCB->CPACR:0x%x\n", (unsigned int)SCB->CPACR);

    while (1) {
        ;
    }

    return 0;
}

The two options produce different output:

[16:15:54.205] __CC_ARM:1 //Floating Point Hardware: Single Precision

[16:15:54.221] __FPU_PRESENT:1

[16:15:54.237] __FPU_USED:1

[16:15:54.253] SCB->CPACR:0xf00000

[16:15:54.205] __CC_ARM:1 //Floating Point Hardware: Not used

[16:15:54.221] __FPU_PRESENT:1

[16:15:54.237] __FPU_USED:0

[16:15:54.253] SCB->CPACR:0x000000

Note: After changing this option, the entire project must be rebuilt.

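In core_cm4.h, the value of __FPU_USED is derived separately for each compiler:

The macro __CC_ARM is predefined by the Arm compiler (armcc) used by MDK, so it is always defined in this project.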
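The logic can be summarized roughly as follows: when the compiler targets a hardware FPU and the device header reports that one is present, __FPU_USED becomes 1; otherwise it is 0. The sketch below mimics that decision with made-up DEMO_ macros so that it compiles on any host. DEMO_TARGET_FPU stands in for the target macro the compiler defines when Single Precision is selected (under armcc this is __TARGET_FPU_VFP). It is an illustration, not the literal header text.

```c
/* Stand-ins for the real macros (assumptions for illustration only). */
#define DEMO_FPU_PRESENT 1 /* plays the role of __FPU_PRESENT from the device header */
#define DEMO_TARGET_FPU    /* plays the role of the compiler's FPU target macro */

#if defined(DEMO_TARGET_FPU)
  #if (DEMO_FPU_PRESENT == 1)
    #define DEMO_FPU_USED 1
  #else
    #define DEMO_FPU_USED 0 /* compiler targets an FPU the device does not have */
  #endif
#else
  #define DEMO_FPU_USED 0   /* "Not Used" selected: software floating point */
#endif

/* Returns the derived setting so it can be printed or checked. */
int demo_fpu_used(void)
{
    return DEMO_FPU_USED;
}
```

Removing the DEMO_TARGET_FPU definition corresponds to selecting Not Used, and demo_fpu_used() then returns 0.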

3. Comparative testing

3.1 Example 1

This example compares the time taken by 20,000 iterations of mixed floating-point operations.

main.c

/* Includes ------------------------------------------------------------------*/
#include "main.h"

/* Private functions ---------------------------------------------------------*/

float re_sub;
float a = 0.14f;
float b = 0.26f;

void test_fpu_tmp1(void)
{
    long i, j;
    float re_nul;

    /* 10000 x 2 = 20,000 iterations */
    for (i = 0; i < 10000; i++) {
        for (j = 0; j < 2; j++) {
            re_nul = a * b;
            re_sub = re_sub + re_nul;
            a = a + 0.1f;
            b = b + 0.1f;
        }
    }

    //printf("re_sub:%f\n", re_sub); // Printing is time-consuming
}

int main(void)
{
    /* STM32F4xx HAL library initialization:
       - Configure the Flash prefetch, instruction and Data caches
       - Configure the Systick to generate an interrupt each 1 msec
       - Set NVIC Group Priority to 4
       - Global MSP (MCU Support Package) initialization
     */
    HAL_Init();

    /* Configure LED6 */
    BSP_LED_Init(LED6);

    /* Configure the system clock to 168 MHz */
    SystemClock_Config();

    /* Serial port 2 initialization: only the TX function is used */
    if (uart2_init(9600))
    {
        Error_Handler();
    }

#if 1
    BSP_LED_Off(LED6);
    HAL_Delay(200);

    /* LED6 is on only while the test function runs */
    BSP_LED_On(LED6);
    //HAL_Delay(1000);
    test_fpu_tmp1();

    BSP_LED_Off(LED6);
    printf("re_sub:%f\n", re_sub);
#endif

    printf("__CC_ARM:%d\n", __CC_ARM);
    printf("__FPU_PRESENT:%d\n", __FPU_PRESENT);
    printf("__FPU_USED:%d\n", __FPU_USED);
    printf("SCB->CPACR:0x%x\n", (unsigned int)SCB->CPACR);

    while (1) {
        ;
    }

    return 0;
}

With the FPU enabled: about 5 ms

Without the FPU: about 26 ms
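To put these numbers in perspective, they can be converted to average CPU cycles per loop iteration at the 168 MHz system clock. The sketch below is plain host-side arithmetic; the function names are made up for this example, and the inputs are the approximate timings above, so the results are rough estimates that also include loop overhead and memory accesses.

```c
/* Average CPU cycles spent per loop iteration.
 * total_ms:   measured run time in milliseconds
 * iterations: number of inner-loop iterations
 * clock_hz:   core clock frequency in Hz */
double cycles_per_iteration(double total_ms, long iterations, double clock_hz)
{
    return (total_ms / 1000.0) * clock_hz / (double)iterations;
}

/* Ratio of the software floating-point time to the hardware FPU time. */
double speedup(double soft_ms, double fpu_ms)
{
    return soft_ms / fpu_ms;
}
```

With 20,000 iterations at 168 MHz, 5 ms works out to about 42 cycles per iteration and 26 ms to about 218 cycles, a speedup of roughly 5.2 times.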

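With the FPU enabled, the disassembly window during debugging shows instructions with the .F32 suffix, which indicates that FPU hardware acceleration is in use.

For comparison, without hardware FPU acceleration the same operations appear as calls to software floating-point routines (for example __aeabi_fmul).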

Note: (1) The addition, subtraction, multiplication and division instructions of the STM32F4 FPU are:

VADD.F32

VSUB.F32

VMUL.F32

VDIV.F32

(2) Watch out for data overflow. If a value overflows, the program will behave unexpectedly.

(3) Pay attention to optimization issues, such as the following test program:

float a = 0.14f;
float b = 0.26f;
float re_nul;

void test_fpu_tmp1(void)
{
    long i, j;

    for (i = 0; i < 10000; i++) {
        for (j = 0; j < 2; j++) {
            re_nul = a * b;
            re_nul = re_nul + re_nul;
        }
    }

    //printf("re_nul:%f\n", re_nul); // Printing is time-consuming
}

Because re_nul gets the same value on every iteration, the loop may not actually execute that many times (single-stepping in the debugger can confirm this). This is the work of the compiler's optimizer, which is free to remove redundant computations; the CPU itself does not skip them. Likewise, a variable that is never used elsewhere in the program may be optimized away together with the code that computes it. Also note that printing is slow, because the serial port sends data bit by bit at a low baud rate, so the measured interval must not include any print statements. A test program should therefore use the result somewhere else and change the operands on every iteration, as follows:

float re_sub;
float a = 0.14f;
float b = 0.26f;

void test_fpu_tmp1(void)
{
    long i, j;
    float re_nul;

    for (i = 0; i < 10000; i++) {
        for (j = 0; j < 2; j++) {
            re_nul = a * b;
            re_sub = re_sub + re_nul;
            a = a + 0.1f;
            b = b + 0.1f;
        }
    }

    //printf("re_sub:%f\n", re_sub); // Printing is time-consuming
}
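The function above keeps its result alive by accumulating into a global that is printed later. Another common way to stop a benchmark loop from being optimized away, shown here as an alternative sketch and not the method used in this article, is to store the result in a volatile variable. The names bench_sink and bench_mul_acc are hypothetical.

```c
/* A write to a volatile object counts as observable behavior,
 * so the compiler must perform the final store. */
volatile float bench_sink;

/* Hypothetical benchmark body: accumulates a*b while changing the operands
 * on every iteration, then publishes the result through the volatile sink. */
void bench_mul_acc(long iterations)
{
    float a = 0.14f;
    float b = 0.26f;
    float acc = 0.0f;
    long i;

    for (i = 0; i < iterations; i++) {
        acc = acc + a * b;
        a = a + 0.1f;
        b = b + 0.1f;
    }
    bench_sink = acc; /* the store the compiler may not remove */
}
```

The compiler still has to produce the correct final value, so it is worth checking the disassembly to confirm that the loop is really there.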

.................................................................

//Print the value of variable re_sub in main
printf("re_sub:%f\n", re_sub);

3.2 Example 2

If the amount of computation is too small, the difference may not be visible. This example uses a 128-point complex FFT.

Two new files are added to the user src directory: complex.c and fft.c

complex.c:

#include "complex.h"

void complex_add(complex a, complex b, complex *c)
{
    c->real = a.real + b.real;
    c->img = a.img + b.img;
}

void complex_sub(complex a, complex b, complex *c)
{
    c->real = a.real - b.real;
    c->img = a.img - b.img;
}

void complex_mul(complex a, complex b, complex *c)
{
    c->real = a.real * b.real - a.img * b.img;
    c->img = a.real * b.img + a.img * b.real;
}

void complex_div(complex a, complex b, complex *c)
{
    c->real = (a.real * b.real + a.img * b.img) / (b.real * b.real + b.img * b.img);
    c->img = (a.img * b.real - a.real * b.img) / (b.real * b.real + b.img * b.img);
}
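As a quick check of how these helpers return their result through the pointer argument, the sketch below multiplies (1 + 2i) by (3 + 4i). It repeats the complex type and complex_mul from above so that it compiles on its own; demo_mul is a hypothetical helper added only for this example.

```c
typedef struct {
    float real;
    float img;
} complex;

/* Same formula as complex_mul above:
 * (a + bi)(c + di) = (ac - bd) + (ad + bc)i */
void complex_mul(complex a, complex b, complex *c)
{
    c->real = a.real * b.real - a.img * b.img;
    c->img = a.real * b.img + a.img * b.real;
}

/* Hypothetical helper: multiplies (1 + 2i) by (3 + 4i). */
complex demo_mul(void)
{
    complex a = { 1.0f, 2.0f };
    complex b = { 3.0f, 4.0f };
    complex result;

    complex_mul(a, b, &result); /* result = -5 + 10i */
    return result;
}
```

Each call performs four multiplications and two additions or subtractions on float values, which is the kind of work the FPU executes directly.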

fft.c:

#include <stdio.h>
#include <math.h>
#include "fft.h"
#include "complex.h"

complex Wn, Temp, Res;

int L, B, P, K;

// Reorder x[] into bit-reversed order.
// The length of the input sequence must be an integer power of 2.
void Reader_Sort(complex *x, int len)
{
    complex temp;
    int cur_rev = 0;  // Bit-reversed index; starting from 0, derive the reversed index of each natural index in turn
    int k = len / 2;  // Weight of the highest bit
    int j;

    for (j = 1; j <= len - 1; j++) {
        if (cur_rev < k) {
            // The highest bit of the current reversed index is 0:
            // set it to 1 (add the weight) to get the next reversed index
            cur_rev = cur_rev + k;
        } else {
            // The highest bit of the current reversed index is 1
            while (cur_rev >= k) {
                // Clear the current bit (subtract its weight)
                cur_rev = cur_rev - k;
                // Move to the next lower bit; if the loop continues, that bit is also 1 and is cleared in turn
                k = k / 2;
            }
            // The current bit is 0: set it to 1 (add its weight) to get the next reversed index
            cur_rev = cur_rev + k;
            // Restore the weight of the highest bit
            k = len / 2;
        }
        //printf("j=%d, cur_rev=%d\n", j, cur_rev);
        // Swap x[j] and x[cur_rev] (only once per pair)
        if (j < cur_rev) {
            // Swap the real parts
            temp.real = x[j].real;
            x[j].real = x[cur_rev].real;
            x[cur_rev].real = temp.real;

            // Swap the imaginary parts
            temp.img = x[j].img;
            x[j].img = x[cur_rev].img;
            x[cur_rev].img = temp.img;
        }
    }
}
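Reader_Sort derives each reversed index incrementally from the previous one. To see which mapping it produces, the reversed index can also be computed directly for a single value. The sketch below is a host-side illustration; bit_reverse is a hypothetical helper and is not part of fft.c.

```c
/* Reverses the lowest `bits` bits of `index`.
 * For a 128-point FFT, bits = 7. */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned reversed = 0;
    unsigned i;

    for (i = 0; i < bits; i++) {
        /* Append the lowest remaining bit of index to reversed */
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}
```

For 128 points, index 1 (0000001) maps to 64 (1000000), index 2 maps to 32, and index 3 maps to 96, which are the first values of cur_rev that Reader_Sort produces.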

void FFT(complex *input_seq, int SEQ_N, int SEQ_M, complex res_seq[])
{
    int i, j, r;
    int L, B, K, P;
    complex Temp, Wn, Res;

    if (!input_seq) {
        printf("input sequence is NULL\n");
        return;
    }

    Reader_Sort(input_seq, SEQ_N);

    for (L = 1; L <= SEQ_M; L++) {
        B = (int)pow(2, L - 1);          // Distance between the two inputs of a butterfly at stage L
        for (j = 0; j <= B - 1; j++) {
            K = (int)pow(2, SEQ_M - L);  // Number of butterflies that share one twiddle factor
            P = K * j;                   // Exponent of the twiddle factor
            for (i = 0; i <= K - 1; i++) {
                r = j + 2 * B * i;
                Temp = input_seq[r];
                Wn.real = cos((2.0f * PI) / SEQ_N * P);
                Wn.img = -1.0f * sin((2.0f * PI) / SEQ_N * P);
                complex_mul(input_seq[r + B], Wn, &Res);
                input_seq[r].real = input_seq[r].real + Res.real;
                input_seq[r].img = input_seq[r].img + Res.img;
                input_seq[r + B].real = Temp.real - Res.real;
                input_seq[r + B].img = Temp.img - Res.img;
            }
        }
    }

    if (!res_seq) {
        printf("result sequence is NULL\n");
        return;
    } else {
        // The transform is done in place; copy the result to res_seq
        for (i = 0; i < SEQ_N; i++) {
            res_seq[i].real = input_seq[i].real;
            res_seq[i].img = input_seq[i].img;
        }
    }
}

void iFFT(complex *input_seq, int SEQ_N, int SEQ_M, complex res_seq[])
{
    int i, j, r;
    int L, B, K, P;
    complex Temp, Wn, Res;

    if (!input_seq) {
        printf("input sequence is NULL\n");
        return;
    }

    Reader_Sort(input_seq, SEQ_N);

    for (L = 1; L <= SEQ_M; L++) {
        B = (int)pow(2, L - 1);          // Distance between the two inputs of a butterfly at stage L
        for (j = 0; j <= B - 1; j++) {
            K = (int)pow(2, SEQ_M - L);  // Number of butterflies that share one twiddle factor
            P = K * j;                   // Exponent of the twiddle factor
            for (i = 0; i <= K - 1; i++) {
                r = j + 2 * B * i;
                Temp = input_seq[r];
                // Conjugate twiddle factor for the inverse transform
                Wn.real = cos((2.0f * PI) / SEQ_N * P);
                Wn.img = sin((2.0f * PI) / SEQ_N * P);
                complex_mul(input_seq[r + B], Wn, &Res);
                // Scaling by 1/2 at each of the M stages gives the overall 1/N factor
                input_seq[r].real = (1.0f / 2.0f) * (input_seq[r].real + Res.real);
                input_seq[r].img = (1.0f / 2.0f) * (input_seq[r].img + Res.img);
                input_seq[r + B].real = (1.0f / 2.0f) * (Temp.real - Res.real);
                input_seq[r + B].img = (1.0f / 2.0f) * (Temp.img - Res.img);
            }
        }
    }

    if (!res_seq) {
        printf("result sequence is NULL\n");
        return;
    } else {
        // The transform is done in place; copy the result to res_seq
        for (i = 0; i < SEQ_N; i++) {
            res_seq[i].real = input_seq[i].real;
            res_seq[i].img = input_seq[i].img;
        }
    }
}

Correspondingly add new files complex.h and fft.h in the user directory inc.

complex.h:

#ifndef __COMPLEX_H_
#define __COMPLEX_H_

typedef struct {
    float real;
    float img;
} complex;

void complex_add(complex a, complex b, complex *c);
void complex_sub(complex a, complex b, complex *c);
void complex_mul(complex a, complex b, complex *c);
void complex_div(complex a, complex b, complex *c);

#endif

fft.h:

#ifndef _FFT_H_
#define _FFT_H_
#include "complex.h"

#define PI (3.14159265f)
/*
Parameters:
(1) N = 2^M
(2) L = 1..M is the stage index
(3) P is the exponent of the twiddle factor; K is the increment of P, so P = K*j
(4) B is the distance between the two inputs of a butterfly, which equals the number of distinct twiddle factors in the stage
*/

#define N (128)
#define M (log(N)/log(2))

void FFT(complex *input_seq, int SEQ_N, int SEQ_M, complex res_seq[]);
void iFFT(complex *input_seq, int SEQ_N, int SEQ_M, complex res_seq[]);

#endif
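The macro M computes log2(N) in floating point and is converted to int when it is passed to FFT, and the transform itself calls pow(2, ...) to get powers of two. A possible integer-only alternative, offered as a sketch that is not part of the original code, uses shifts instead. The names ilog2_pow2 and int_pow2 are hypothetical.

```c
/* Returns log2(n) when n is a power of two (n >= 1), e.g. 128 -> 7.
 * For any other n > 0 it returns floor(log2(n)). */
int ilog2_pow2(unsigned n)
{
    int m = 0;

    while (n > 1u) {
        n >>= 1;
        m++;
    }
    return m;
}

/* Returns 2^e for 0 <= e < 31 without calling pow(). */
int int_pow2(int e)
{
    return 1 << e;
}
```

Integer arithmetic avoids any rounding in the double-to-int conversion and keeps these helper values out of the floating-point library altogether.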

Include the new header files in main.h.

main.c:

/* Includes ------------------------------------------------------------------*/
#include "main.h"

complex INPUT_SEQ[N], RES_SEQ[N], OUTPUT_SEQ[N];
float SEQ_DAT[N], dataR[N], dataI[N];

int fft_test(void)
{
    int i, j;

    // Construct a sequence of real numbers
    for (i = 0; i < N; i++) {
        SEQ_DAT[i] = i + 0.0f;
    }

    // Construct the complex input sequence (imaginary parts are 0)
    for (j = 0; j < N; j++) {
        INPUT_SEQ[j].real = SEQ_DAT[j];
        INPUT_SEQ[j].img = 0.0f;
    }

    // for (i = 0; i < N; i++) {
    //     printf("before fft: INPUT_SEQ[%d].real=%f, INPUT_SEQ[%d].img=%f\n", i, INPUT_SEQ[i].real, i, INPUT_SEQ[i].img);
    // }
    // printf("\n\n");

#if 1
    FFT(INPUT_SEQ, N, M, RES_SEQ);
    // for (i = 0; i < N; i++) {
    //     printf("fft: RES_SEQ[%d].real=%f, RES_SEQ[%d].img=%f\n", i, RES_SEQ[i].real, i, RES_SEQ[i].img);
    // }
    // printf("\n\n");

    iFFT(RES_SEQ, N, M, OUTPUT_SEQ);
#else
    HAL_Delay(1000);
#endif

    // for (i = 0; i < N; i++) { // Printing is time-consuming
    //     printf("ifft: OUTPUT_SEQ[%d].real=%f, OUTPUT_SEQ[%d].img=%f\n", i, OUTPUT_SEQ[i].real, i, OUTPUT_SEQ[i].img);
    // }
    // printf("\n\n");

    return 0;
}

int main(void)
{
    HAL_Init();

    BSP_LED_Init(LED6);

    /* Configure the system clock to 168 MHz */
    SystemClock_Config();

    /* Serial port 2 initialization: only the TX function is used */
    if (uart2_init(9600))
    {
        Error_Handler();
    }

#if 1
    BSP_LED_Off(LED6);
    HAL_Delay(200);

    /* LED6 is on only while the FFT test runs */
    BSP_LED_On(LED6);
    //HAL_Delay(1000);
    //test_fpu_tmp1();
    fft_test();
    BSP_LED_Off(LED6);

    /* Printing is time-consuming, so it is done after the LED is switched off */
    for (int i = 0; i < N; i++) {
        printf("ifft: OUTPUT_SEQ[%d].real=%f, OUTPUT_SEQ[%d].img=%f\n", i, OUTPUT_SEQ[i].real, i, OUTPUT_SEQ[i].img);
    }
    printf("\n\n");
#endif

    while (1) {
        ;
    }

    return 0;
}

With the FPU enabled, the forward and inverse transforms of 128 points take about 0.2 ms.

Serial port output: the data after the inverse transform

[09:42:26.534] ifft: OUTPUT_SEQ[1].real=1.000000, OUTPUT_SEQ[1].img=0.000000
[09:42:26.598] ifft: OUTPUT_SEQ[2].real=2.000000, OUTPUT_SEQ[2].img=0.000000
[09:42:26.663] ifft: OUTPUT_SEQ[3].real=3.000000, OUTPUT_SEQ[3].img=0.000000
[09:42:26.726] ifft: OUTPUT_SEQ[4].real=4.000000, OUTPUT_SEQ[4].img=0.000000
[09:42:26.790] ifft: OUTPUT_SEQ[5].real=5.000000, OUTPUT_SEQ[5].img=0.000000
[09:42:26.854] ifft: OUTPUT_SEQ[6].real=6.000000, OUTPUT_SEQ[6].img=0.000000
[09:42:26.918] ifft: OUTPUT_SEQ[7].real=7.000000, OUTPUT_SEQ[7].img=0.000000
[09:42:26.982] ifft: OUTPUT_SEQ[8].real=8.000000, OUTPUT_SEQ[8].img=0.000000
[09:42:27.046] ifft: OUTPUT_SEQ[9].real=9.000000, OUTPUT_SEQ[9].img=0.000000
[09:42:27.110] ifft: OUTPUT_SEQ[10].real=10.000000, OUTPUT_SEQ[10].img=0.000000
[09:42:27.174] ifft: OUTPUT_SEQ[11].real=11.000000, OUTPUT_SEQ[11].img=0.000000
[09:42:27.254] ifft: OUTPUT_SEQ[12].real=12.000000, OUTPUT_SEQ[12].img=0.000000
[09:42:27.317] ifft: OUTPUT_SEQ[13].real=13.000000, OUTPUT_SEQ[13].img=0.000000
[09:42:27.381] ifft: OUTPUT_SEQ[14].real=14.000000, OUTPUT_SEQ[14].img=0.000000
[09:42:27.445] ifft: OUTPUT_SEQ[15].real=15.000000, OUTPUT_SEQ[15].img=0.000000
[09:42:27.525] ifft: OUTPUT_SEQ[16].real=16.000000, OUTPUT_SEQ[16].img=0.000000
[09:42:27.589] ifft: OUTPUT_SEQ[17].real=17.000000, OUTPUT_SEQ[17].img=0.000000
[09:42:27.653] ifft: OUTPUT_SEQ[18].real=18.000000, OUTPUT_SEQ[18].img=0.000000
[09:42:27.717] ifft: OUTPUT_SEQ[19].real=19.000000, OUTPUT_SEQ[19].img=0.000000
[09:42:27.780] ifft: OUTPUT_SEQ[20].real=20.000000, OUTPUT_SEQ[20].img=0.000000
[09:42:27.860] ifft: OUTPUT_SEQ[21].real=21.000000, OUTPUT_SEQ[21].img=0.000000
[09:42:27.924] ifft: OUTPUT_SEQ[22].real=22.000000, OUTPUT_SEQ[22].img=0.000000
[09:42:27.988] ifft: OUTPUT_SEQ[23].real=23.000000, OUTPUT_SEQ[23].img=0.000000
[09:42:28.052] ifft: OUTPUT_SEQ[24].real=24.000000, OUTPUT_SEQ[24].img=0.000000
[09:42:28.132] ifft: OUTPUT_SEQ[25].real=25.000000, OUTPUT_SEQ[25].img=0.000000
[09:42:28.196] ifft: OUTPUT_SEQ[26].real=26.000000, OUTPUT_SEQ[26].img=0.000000
[09:42:28.260] ifft: OUTPUT_SEQ[27].real=27.000000, OUTPUT_SEQ[27].img=0.000000
[09:42:28.324] ifft: OUTPUT_SEQ[28].real=28.000000, OUTPUT_SEQ[28].img=0.000000
[09:42:28.404] ifft: OUTPUT_SEQ[29].real=29.000000, OUTPUT_SEQ[29].img=0.000000
[09:42:28.468] ifft: OUTPUT_SEQ[30].real=30.000000, OUTPUT_SEQ[30].img=0.000000
[09:42:28.532] ifft: OUTPUT_SEQ[31].real=31.000000, OUTPUT_SEQ[31].img=0.000000
[09:42:28.596] ifft: OUTPUT_SEQ[32].real=32.000000, OUTPUT_SEQ[32].img=0.000000
[09:42:28.676] ifft: OUTPUT_SEQ[33].real=33.000000, OUTPUT_SEQ[33].img=0.000000
[09:42:28.740] ifft: OUTPUT_SEQ[34].real=34.000000, OUTPUT_SEQ[34].img=0.000000
[09:42:28.803] ifft: OUTPUT_SEQ[35].real=35.000000, OUTPUT_SEQ[35].img=0.000000
[09:42:28.868] ifft: OUTPUT_SEQ[36].real=36.000000, OUTPUT_SEQ[36].img=0.000000
[09:42:28.947] ifft: OUTPUT_SEQ[37].real=37.000000, OUTPUT_SEQ[37].img=0.000000
[09:42:29.011] ifft: OUTPUT_SEQ[38].real=38.000000, OUTPUT_SEQ[38].img=0.000000
[09:42:29.075] ifft: OUTPUT_SEQ[39].real=39.000000, OUTPUT_SEQ[39].img=0.000000
[09:42:29.139] ifft: OUTPUT_SEQ[40].real=40.000000, OUTPUT_SEQ[40].img=0.000000
[09:42:29.203] ifft: OUTPUT_SEQ[41].real=41.000000, OUTPUT_SEQ[41].img=0.000000
[09:42:29.282] ifft: OUTPUT_SEQ[42].real=42.000000, OUTPUT_SEQ[42].img=0.000000
[09:42:29.346] ifft: OUTPUT_SEQ[43].real=43.000000, OUTPUT_SEQ[43].img=0.000000
[09:42:29.410] ifft: OUTPUT_SEQ[44].real=44.000000, OUTPUT_SEQ[44].img=0.000000
[09:42:29.474] ifft: OUTPUT_SEQ[45].real=45.000000, OUTPUT_SEQ[45].img=0.000000
[09:42:29.554] ifft: OUTPUT_SEQ[46].real=46.000000, OUTPUT_SEQ[46].img=0.000000
[09:42:29.618] ifft: OUTPUT_SEQ[47].real=47.000000, OUTPUT_SEQ[47].img=0.000000
[09:42:29.682] ifft: OUTPUT_SEQ[48].real=48.000000, OUTPUT_SEQ[48].img=0.000000
[09:42:29.746] ifft: OUTPUT_SEQ[49].real=49.000000, OUTPUT_SEQ[49].img=0.000000
[09:42:29.827] ifft: OUTPUT_SEQ[50].real=50.000000, OUTPUT_SEQ[50].img=0.000000
[09:42:29.890] ifft: OUTPUT_SEQ[51].real=51.000000, OUTPUT_SEQ[51].img=0.000000
[09:42:29.954] ifft: OUTPUT_SEQ[52].real=52.000000, OUTPUT_SEQ[52].img=0.000000
[09:42:30.018] ifft: OUTPUT_SEQ[53].real=53.000000, OUTPUT_SEQ[53].img=0.000000
[09:42:30.098] ifft: OUTPUT_SEQ[54].real=54.000000, OUTPUT_SEQ[54].img=0.000000
[09:42:30.162] ifft: OUTPUT_SEQ[55].real=55.000000, OUTPUT_SEQ[55].img=0.000000
[09:42:30.226] ifft: OUTPUT_SEQ[56].real=56.000000, OUTPUT_SEQ[56].img=0.000000
[09:42:30.290] ifft: OUTPUT_SEQ[57].real=57.000000, OUTPUT_SEQ[57].img=0.000000
[09:42:30.354] ifft: OUTPUT_SEQ[58].real=58.000000, OUTPUT_SEQ[58].img=0.000000
[09:42:30.433] ifft: OUTPUT_SEQ[59].real=59.000000, OUTPUT_SEQ[59].img=0.000000
[09:42:30.497] ifft: OUTPUT_SEQ[60].real=60.000000, OUTPUT_SEQ[60].img=0.000000
[09:42:30.561] ifft: OUTPUT_SEQ[61].real=61.000000, OUTPUT_SEQ[61].img=0.000000
[09:42:30.625] ifft: OUTPUT_SEQ[62].real=62.000000, OUTPUT_SEQ[62].img=0.000000
[09:42:30.705] ifft: OUTPUT_SEQ[63].real=63.000000, OUTPUT_SEQ[63].img=0.000000
[09:42:30.769] ifft: OUTPUT_SEQ[64].real=64.000000, OUTPUT_SEQ[64].img=0.000000
[09:42:30.833] ifft: OUTPUT_SEQ[65].real=65.000000, OUTPUT_SEQ[65].img=0.000000
[09:42:30.897] ifft: OUTPUT_SEQ[66].real=66.000000, OUTPUT_SEQ[66].img=0.000000
[09:42:30.977] ifft: OUTPUT_SEQ[67].real=67.000000, OUTPUT_SEQ[67].img=0.000000
[09:42:31.040] ifft: OUTPUT_SEQ[68].real=68.000000, OUTPUT_SEQ[68].img=0.000000
[09:42:31.104] ifft: OUTPUT_SEQ[69].real=69.000000, OUTPUT_SEQ[69].img=0.000000
[09:42:31.168] ifft: OUTPUT_SEQ[70].real=70.000000, OUTPUT_SEQ[70].img=0.000000
[09:42:31.248] ifft: OUTPUT_SEQ[71].real=71.000000, OUTPUT_SEQ[71].img=0.000000
[09:42:31.312] ifft: OUTPUT_SEQ[72].real=72.000000, OUTPUT_SEQ[72].img=0.000000
[09:42:31.376] ifft: OUTPUT_SEQ[73].real=73.000000, OUTPUT_SEQ[73].img=0.000000
[09:42:31.440] ifft: OUTPUT_SEQ[74].real=74.000000, OUTPUT_SEQ[74].img=0.000000
[09:42:31.520] ifft: OUTPUT_SEQ[75].real=75.000000, OUTPUT_SEQ[75].img=0.000000
[09:42:31.584] ifft: OUTPUT_SEQ[76].real=76.000000, OUTPUT_SEQ[76].img=0.000000
[09:42:31.647] ifft: OUTPUT_SEQ[77].real=77.000000, OUTPUT_SEQ[77].img=0.000000
[09:42:31.711] ifft: OUTPUT_SEQ[78].real=78.000000, OUTPUT_SEQ[78].img=0.000000
[09:42:31.775] ifft: OUTPUT_SEQ[79].real=79.000000, OUTPUT_SEQ[79].img=0.000000
[09:42:31.856] ifft: OUTPUT_SEQ[80].real=80.000000, OUTPUT_SEQ[80].img=0.000000
[09:42:31.919] ifft: OUTPUT_SEQ[81].real=81.000000, OUTPUT_SEQ[81].img=0.000000
[09:42:31.984] ifft: OUTPUT_SEQ[82].real=82.000000, OUTPUT_SEQ[82].img=0.000000
[09:42:32.048] ifft: OUTPUT_SEQ[83].real=83.000000, OUTPUT_SEQ[83].img=0.000000
[09:42:32.128] ifft: OUTPUT_SEQ[84].real=84.000000, OUTPUT_SEQ[84].img=0.000000
[09:42:32.192] ifft: OUTPUT_SEQ[85].real=85.000000, OUTPUT_SEQ[85].img=0.000000
[09:42:32.255] ifft: OUTPUT_SEQ[86].real=86.000000, OUTPUT_SEQ[86].img=0.000000
[09:42:32.319] ifft: OUTPUT_SEQ[87].real=87.000000, OUTPUT_SEQ[87].img=0.000000
[09:42:32.399] ifft: OUTPUT_SEQ[88].real=88.000000, OUTPUT_SEQ[88].img=0.000000
[09:42:32.463] ifft: OUTPUT_SEQ[89].real=89.000000, OUTPUT_SEQ[89].img=0.000000
[09:42:32.527] ifft: OUTPUT_SEQ[90].real=90.000000, OUTPUT_SEQ[90].img=0.000000
[09:42:32.591] ifft: OUTPUT_SEQ[91].real=91.000000, OUTPUT_SEQ[91].img=0.000000
[09:42:32.671] ifft: OUTPUT_SEQ[92].real=92.000000, OUTPUT_SEQ[92].img=0.000000
[09:42:32.735] ifft: OUTPUT_SEQ[93].real=93.000000, OUTPUT_SEQ[93].img=0.000000
[09:42:32.799] ifft: OUTPUT_SEQ[94].real=94.000000, OUTPUT_SEQ[94].img=0.000000
[09:42:32.862] ifft: OUTPUT_SEQ[95].real=95.000000, OUTPUT_SEQ[95].img=0.000000
[09:42:32.926] ifft: OUTPUT_SEQ[96].real=96.000000, OUTPUT_SEQ[96].img=0.000000
[09:42:33.006] ifft: OUTPUT_SEQ[97].real=97.000000, OUTPUT_SEQ[97].img=0.000000
[09:42:33.070] ifft: OUTPUT_SEQ[98].real=98.000000, OUTPUT_SEQ[98].img=0.000000
[09:42:33.134] ifft: OUTPUT_SEQ[99].real=99.000000, OUTPUT_SEQ[99].img=0.000000
[09:42:33.198] ifft: OUTPUT_SEQ[100].real=100.000000, OUTPUT_SEQ[100].img=0.000000
[09:42:33.278] ifft: OUTPUT_SEQ[101].real=101.000000, OUTPUT_SEQ[101].img=0.000000
[09:42:33.341] ifft: OUTPUT_SEQ[102].real=102.000000, OUTPUT_SEQ[102].img=0.000000
[09:42:33.421] ifft: OUTPUT_SEQ[103].real=103.000000, OUTPUT_SEQ[103].img=0.000000
[09:42:33.485] ifft: OUTPUT_SEQ[104].real=104.000000, OUTPUT_SEQ[104].img=0.000000
[09:42:33.565] ifft: OUTPUT_SEQ[105].real=105.000000, OUTPUT_SEQ[105].img=0.000000
[09:42:33.629] ifft: OUTPUT_SEQ[106].real=106.000000, OUTPUT_SEQ[106].img=0.000000
[09:42:33.693] ifft: OUTPUT_SEQ[107].real=107.000000, OUTPUT_SEQ[107].img=0.000000
[09:42:33.773] ifft: OUTPUT_SEQ[108].real=108.000000, OUTPUT_SEQ[108].img=0.000000
[09:42:33.836] ifft: OUTPUT_SEQ[109].real=109.000000, OUTPUT_SEQ[109].img=0.000000
[09:42:33.917] ifft: OUTPUT_SEQ[110].real=110.000000, OUTPUT_SEQ[110].img=0.000000
[09:42:33.981] ifft: OUTPUT_SEQ[111].real=111.000000, OUTPUT_SEQ[111].img=0.000000
[09:42:34.061] ifft: OUTPUT_SEQ[112].real=112.000000, OUTPUT_SEQ[112].img=0.000000
[09:42:34.125] ifft: OUTPUT_SEQ[113].real=113.000000, OUTPUT_SEQ[113].img=0.000000
[09:42:34.189] ifft: OUTPUT_SEQ[114].real=114.000000, OUTPUT_SEQ[114].img=0.000000
[09:42:34.269] ifft: OUTPUT_SEQ[115].real=115.000000, OUTPUT_SEQ[115].img=0.000000
[09:42:34.333] ifft: OUTPUT_SEQ[116].real=116.000000, OUTPUT_SEQ[116].img=0.000000
[09:42:34.413] ifft: OUTPUT_SEQ[117].real=117.000000, OUTPUT_SEQ[117].img=0.000000
[09:42:34.476] ifft: OUTPUT_SEQ[118].real=118.000000, OUTPUT_SEQ[118].img=0.000000
[09:42:34.556] ifft: OUTPUT_SEQ[119].real=119.000000, OUTPUT_SEQ[119].img=0.000000
[09:42:34.620] ifft: OUTPUT_SEQ[120].real=120.000000, OUTPUT_SEQ[120].img=0.000000
[09:42:34.700] ifft: OUTPUT_SEQ[121].real=121.000000, OUTPUT_SEQ[121].img=0.000000
[09:42:34.764] ifft: OUTPUT_SEQ[122].real=122.000000, OUTPUT_SEQ[122].img=0.000000
[09:42:34.828] ifft: OUTPUT_SEQ[123].real=123.000000, OUTPUT_SEQ[123].img=0.000000
[09:42:34.908] ifft: OUTPUT_SEQ[124].real=124.000000, OUTPUT_SEQ[124].img=0.000000
[09:42:34.971] ifft: OUTPUT_SEQ[125].real=125.000000, OUTPUT_SEQ[125].img=0.000000
[09:42:35.051] ifft: OUTPUT_SEQ[126].real=126.000000, OUTPUT_SEQ[126].img=0.000000
[09:42:35.116] ifft: OUTPUT_SEQ[127].real=127.000000, OUTPUT_SEQ[127].img=0.000000
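The timestamps in this log also show why printing has to stay outside the measured interval: consecutive lines are roughly 64 ms apart. A back-of-the-envelope check is sketched below. It assumes 8N1 framing (10 bits on the wire per byte), which is an assumption because the UART configuration is not shown, and a 62-byte line including the newline, which is the length of the lines with a one-digit index. The function name uart_tx_time_ms is hypothetical.

```c
/* Time needed to transmit `bytes` bytes over a UART, in milliseconds.
 * baud:           line rate in bits per second
 * bits_per_frame: bits on the wire per byte (10 for 8N1: start + 8 data + stop) */
double uart_tx_time_ms(unsigned bytes, unsigned baud, unsigned bits_per_frame)
{
    return (double)bytes * (double)bits_per_frame * 1000.0 / (double)baud;
}
```

At 9600 baud, a 62-byte line takes about 64.6 ms, close to the spacing in the log and several hundred times longer than the transforms themselves.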

4. Summary

(1) Using the FPU speeds up floating-point operations. In Example 1, the run time dropped from about 26 ms to about 5 ms.

(2) The FFT in this example is a floating-point FFT implemented directly from the algorithm, without any optimization; computing the trigonometric functions inside the loops is especially time-consuming. In real applications, the CMSIS-DSP library can be used directly. Its FFT, IIR/FIR filter and other math routines are already well optimized for Cortex-M cores, which makes it quick to implement and develop algorithms (FPU + CMSIS-DSP).