Binding threads to CPUs in C++ on Linux

Foreword

Embedded systems commonly use multi-core CPUs. As chip performance improves, products take on more functions and run more processes. When scheduling and tuning tasks on such systems, we often bind certain processes to a fixed CPU. This article walks through how to do that.
First, the target device needs more than one core. You can check the number of CPUs on the device with the following command.
Use cat /proc/cpuinfo to view the CPU information. Two fields matter here:
processor: the index of the logical CPU, starting from 0
cpu cores: the number of physical cores in the processor package
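The same count can also be read from inside a program. The following is a minimal sketch, assuming glibc on Linux; the function names configured_cpus, online_cpus and print_cpu_counts are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstdio>
#include <thread>
#include <unistd.h>

// Logical CPUs configured in the system (this can include offline CPUs).
long configured_cpus() { return sysconf(_SC_NPROCESSORS_CONF); }

// Logical CPUs that are currently online.
long online_cpus() { return sysconf(_SC_NPROCESSORS_ONLN); }

// Print the counts. CPU numbers used for binding start at 0.
void print_cpu_counts()
{
    long online = online_cpus();
    assert(online >= 1); // a running program implies at least one online CPU
    // hardware_concurrency() is only a hint and may return 0 when unknown.
    printf("configured: %ld, online: %ld, hardware_concurrency: %u\n",
           configured_cpus(), online, std::thread::hardware_concurrency());
}
```

Calling print_cpu_counts() at program start is a quick way to confirm that the CPU numbers you plan to bind to actually exist on the device.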

Basic concepts

CPU affinity

CPU affinity is the tendency of a process to keep running on a given CPU for as long as possible instead of being migrated to other processors. Put simply, setting it binds a process or thread to one or more specific CPUs. On a multi-core machine each CPU has its own cache, which holds data the process has used recently. If the OS schedules the process onto a different CPU, that cached data is no longer at hand and the cache hit rate drops. Once a process is bound, it always runs on the specified CPU and the operating system does not move it elsewhere, which can improve performance to some extent.
Soft affinity
The process tends to stay on the same CPU for as long as possible without being migrated. The Linux scheduler has this property built in, called soft CPU affinity: it avoids migrating a process between processors frequently. This is the behavior we want, because a low migration frequency means little migration overhead.
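With soft affinity alone, a thread is normally allowed to run on every CPU. A minimal sketch for inspecting the allowed set is shown here; allowed_cpus is a hypothetical helper name, and the code assumes glibc, where g++ enables the CPU_* macros by default.

```cpp
#include <cassert>
#include <sched.h>
#include <vector>

// Return the CPU numbers the calling thread is currently allowed to run on.
// An empty vector means the query failed.
std::vector<int> allowed_cpus()
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    std::vector<int> cpus;
    // pid 0 means "the calling thread"; returns -1 and sets errno on failure.
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        return cpus;
    }
    for (int i = 0; i < CPU_SETSIZE; i++) {
        if (CPU_ISSET(i, &mask)) {
            cpus.push_back(i);
        }
    }
    // The list we built must match the number of bits set in the mask.
    assert(static_cast<int>(cpus.size()) == CPU_COUNT(&mask));
    return cpus;
}
```

On an unrestricted 8-core machine this would typically list CPUs 0 to 7. Inside a container or under taskset the list can be shorter.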
Hard affinity
Put simply, the program uses the API that the Linux kernel exposes to user space to force a process or thread to run on specified CPU cores.
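As a rough illustration of hard affinity, the sketch below binds the calling thread to one CPU and then reads the mask back to confirm it. The helper names bind_current_thread and is_bound_only_to are invented for this example. Note that pthread_setaffinity_np and pthread_getaffinity_np return an error number directly instead of setting errno.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Bind the calling thread to a single CPU.
// Returns 0 on success, or an error number (not errno) on failure.
int bind_current_thread(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
}

// Read the mask back and report whether `cpu` is the only CPU allowed.
bool is_bound_only_to(int cpu)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (pthread_getaffinity_np(pthread_self(), sizeof(mask), &mask) != 0) {
        return false;
    }
    return CPU_COUNT(&mask) == 1 && CPU_ISSET(cpu, &mask);
}
```

Binding fails with EINVAL when the requested CPU is not available to the process, so checking the return value matters on devices with fewer cores than the development machine.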

Code binding implementation

Above we used cat /proc/cpuinfo to query the CPUs of the device. My machine, for example, has 8 cores in total, numbered 0 to 7.
Here is a demo:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sched.h>
#include <thread>

void thread_func1()
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(1, &mask); // Specify the CPU used by this thread
    // pthread_setaffinity_np returns an error number; it does not set errno
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
    if (rc != 0) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(rc));
    }
    int count = 0;
    while (1)
    {
        count++;
        sleep(1);
        printf("fun 1 cnt :%d \n", count);
        for (int i = 0; i < 8; i++) {
            if (CPU_ISSET(i, &mask)) // Check whether CPU i is in the requested set
            {
                printf("1 this process %d of running processor: %d\n", getpid(), i);
            }
        }
    }
}

void thread_func2()
{
    int count = 0;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(5, &mask); // Specify the CPU used by this thread
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
    if (rc != 0) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(rc));
    }
    while (1)
    {
        usleep(1000 * 1000);
        count++;
        printf("fun 2 cnt :%d \n", count);
        for (int i = 0; i < 8; i++) {
            if (CPU_ISSET(i, &mask)) // Check whether CPU i is in the requested set
            {
                printf("2 this process %d of running processor: %d\n", getpid(), i);
            }
        }
    }
}

int main(int argc, char *argv[])
{
    int cpus = sysconf(_SC_NPROCESSORS_CONF);
    printf("cpus: %d\n", cpus); // Number of configured CPUs

    // Bind the main thread to CPU 7
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(7, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) < 0) {
        perror("sched_setaffinity");
    }

    std::thread t1(thread_func1);
    std::thread t2(thread_func2);
    usleep(1000); /* Give the settings time to take effect */
    while (1)
    {
        sleep(1);
        printf("fun main \n");
        /* Print the CPU the main thread was bound to */
        for (int i = 0; i < cpus; i++) {
            if (CPU_ISSET(i, &mask)) // Check whether CPU i is in the requested set
            {
                printf("3 this process %d of running processor: %d\n", getpid(), i);
            }
        }
    }
    t1.join();
    t2.join();
}

Three threads run in the demo above: the main thread, which is bound to CPU 7, and two threads we create ourselves, bound to CPU 1 and CPU 5.
The key part is the code that sets the CPU affinity:

cpu_set_t mask;
CPU_ZERO(&mask); /* Initialize the set to empty */
CPU_SET(5, &mask); /* Add the CPU number to the set */
int rc = pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask); /* Set the CPU affinity */
if (rc != 0) /* On failure the error number is returned; errno is not set */
{
    fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(rc));
}
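The snippet above binds the thread that calls it. A thread can also be bound from outside by the code that created it. The sketch below is one way to do that; bind_thread is a hypothetical helper, and it assumes that std::thread::native_handle() yields a pthread_t, which is the case for libstdc++ on Linux but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>
#include <thread>

// Bind an already-started std::thread to one CPU from the creating thread.
// Assumption: native_handle() is a pthread_t (true for libstdc++ on Linux).
// Returns 0 on success, or an error number (not errno) on failure.
int bind_thread(std::thread& t, int cpu)
{
    assert(t.joinable()); // the handle is only meaningful for a live thread object
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return pthread_setaffinity_np(t.native_handle(), sizeof(mask), &mask);
}
```

For example, after std::thread t1(thread_func1); a call such as bind_thread(t1, 1) would replace the binding code inside the thread function. The thread may run briefly on another CPU before the call takes effect.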

Execute code
Compile the demo with g++ and the -pthread option, then run the resulting a.out.

Code binding view

Use the ps -ef | grep a.out command to find the PID of the process.
Use the top command to view the threads of that PID, for example: top -p 14617
Once top is running, press f to open the field list.
Use the up and down arrow keys to move the highlight to P (the CPU the task last ran on).
Press Space to select it.
Press q to return to the main display.
Press capital H to switch to the per-thread view.
You can now see each thread together with the CPU it runs on.
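The CPU a thread is running on can also be checked from inside the program. The sketch below uses sched_getcpu(), a glibc function; report_current_cpu is a made-up helper name. Unlike the CPU_ISSET loops in the demo, which print the mask that was requested, this reports where the thread is actually executing at the moment of the call.

```cpp
#include <cassert>
#include <cstdio>
#include <sched.h>
#include <unistd.h>

// Print the CPU the calling thread is executing on right now.
// Returns that CPU number, or -1 if sched_getcpu() fails.
int report_current_cpu(const char* name)
{
    assert(name != nullptr);
    int cpu = sched_getcpu();
    if (cpu < 0) {
        perror("sched_getcpu");
        return -1;
    }
    printf("%s: pid %d is running on CPU %d\n", name, getpid(), cpu);
    return cpu;
}
```

Calling it from each thread after binding should print the CPU number that was passed to CPU_SET, which matches what top shows in its P column.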

Notes

Although binding threads to cores can improve performance, keep the following points in mind:

Don't over-bind: binding too many threads to the same cores makes them compete for CPU time while other cores sit idle, which lowers overall CPU utilization.
Evaluate before binding: profile the program first to find where the performance bottleneck is and decide how many cores to bind.
Avoid cross-core memory access: data used by a bound process should ideally be touched only from the core it is bound to. If different cores frequently operate on the same memory, the resulting cache traffic hurts performance (on NUMA systems, access to memory on a remote node adds further cost).
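One simple way to avoid stacking several threads on one core is to spread them over the available CPUs in turn. The sketch below only computes such a plan and does not bind anything; assign_cpus is a hypothetical helper, and round-robin is just one possible policy, not something the original article prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assign `thread_count` threads to CPUs taken from `allowed` in round-robin
// order, so no CPU gets a second thread before every CPU has received one.
std::vector<int> assign_cpus(const std::vector<int>& allowed, std::size_t thread_count)
{
    assert(!allowed.empty());
    std::vector<int> plan;
    plan.reserve(thread_count);
    for (std::size_t i = 0; i < thread_count; i++) {
        plan.push_back(allowed[i % allowed.size()]);
    }
    return plan;
}
```

With the CPUs used in the demo, assign_cpus({1, 5, 7}, 5) yields the plan 1, 5, 7, 1, 5. Each entry can then be passed to CPU_SET for the corresponding thread.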

Reference: https://blog.csdn.net/lyn631579741/article/details/123337907