1. Reference materials
- openMP_demo
- Getting started with OpenMP
- OpenMP Tutorial (1): In-depth analysis of the OpenMP reduction clause
2. Introduction to OpenMP
1. Introduction to OpenMP
OpenMP (Open Multi-Processing) is a multi-threaded programming solution for shared-memory parallel systems, with support for C/C++. It provides a high-level, abstract description of parallel algorithms: the compiler parallelizes the program automatically according to the `#pragma` directives added to the code, spreading the computation across multiple processor cores to improve execution efficiency. Using OpenMP greatly reduces the difficulty and complexity of parallel programming. When the compiler does not support OpenMP, the program simply degenerates into an ordinary serial program; the OpenMP directives in the code do not affect normal compilation and execution.
Many mainstream tool chains have OpenMP built in. In Visual Studio, enabling OpenMP is simple: right-click the project -> Properties -> Configuration Properties -> C/C++ -> Language -> OpenMP Support, and select "Yes".
2. Shared memory model
OpenMP is designed for multi-processor, multi-core shared-memory machines. The number of processing units (CPU cores) bounds the parallelism available to OpenMP.
3. Hybrid parallel model
OpenMP parallelizes within a single node; combining MPI with OpenMP adds distributed-memory parallelism across nodes, which is commonly called the hybrid parallel model.
- OpenMP handles the compute-intensive work on each node (one machine);
- MPI implements communication and data sharing between nodes.
4. Fork-Join model
OpenMP uses the Fork-Join model of parallel execution.
- Fork: the master thread creates a team of parallel threads;
- Join: the team threads compute concurrently inside the parallel region, then synchronize and terminate, leaving only the master thread.
Multiple threads run concurrently inside a parallel region; between parallel regions, execution is serial and carried out by the master thread.
5. The barrier synchronization mechanism
`barrier` synchronizes the threads in a parallel region: when a thread reaches the barrier it stops and waits until every thread has reached it, and only then does execution continue. This achieves thread synchronization.
```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    int th_id, nthreads;
    #pragma omp parallel private(th_id)
    {
        th_id = omp_get_thread_num();
        printf("Hello World from thread %d\n", th_id);
        #pragma omp barrier
        if (th_id == 0) {
            nthreads = omp_get_num_threads();
            printf("There are %d threads\n", nthreads);
        }
    }
    return 0;
}
```
```
yoyo@yoyo:~/PATH/TO$ gcc -fopenmp demo.c -o demo
yoyo@yoyo:~/PATH/TO$ ./demo
Hello World from thread 10
Hello World from thread 3
Hello World from thread 2
Hello World from thread 6
Hello World from thread 4
Hello World from thread 7
Hello World from thread 0
Hello World from thread 5
Hello World from thread 11
Hello World from thread 8
Hello World from thread 1
Hello World from thread 9
There are 12 threads
```
3. Common operations
1. Common commands
```
# Install OpenMP
sudo apt-get install libomp-dev
# Compile an OpenMP program with gcc
gcc -fopenmp demo.c -o demo
# Compile an OpenMP program with g++
g++ -fopenmp demo.cpp -o demo
```
2. Important operations
(1) Parallel region: use the `#pragma omp parallel` directive to define a parallel region.
(2) Thread id: use the `omp_get_thread_num()` function to obtain the id of the current thread.
(3) Total number of threads: use the `omp_get_num_threads()` function to obtain the total number of threads.
(4) Data sharing: use clauses such as `private` and `shared` to declare the sharing status of variables.
(5) Synchronization: use the `#pragma omp barrier` directive to synchronize threads.
3. Check whether OpenMP is supported
```c
#include <stdio.h>

int main(void) {
#ifdef _OPENMP
    printf("support openmp\n");
#else
    printf("not support openmp\n");
#endif
    return 0;
}
```
```
yoyo@yoyo:~/PATH/TO$ gcc -fopenmp demo.c -o demo
yoyo@yoyo:~/PATH/TO$ ./demo
support openmp
```
4. Hello World
```c
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        printf("Hello, world.\n");
    }
    return 0;
}
```
```
yoyo@yoyo:~/PATH/TO$ gcc -fopenmp demo.c -o demo
yoyo@yoyo:~/PATH/TO$ ./demo
Hello, world.
Hello, world.
Hello, world.
Hello, world.
Hello, world.
Hello, world.
Hello, world.
Hello, world.
Hello, world.
Hello, world.
Hello, world.
Hello, world.
```
Since the number of threads is not specified, it defaults to the number of CPU cores (here, 12). The `num_threads` clause specifies it explicitly:
```c
#include <stdio.h>

int main(void) {
    // Specify the number of threads
    #pragma omp parallel num_threads(6)
    {
        printf("Hello, world.\n");
    }
    return 0;
}
```
5. #pragma omp parallel for
`omp_get_thread_num()`: gets the id of the current thread.
```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel for
    for (int i = 0; i < 12; i++) {
        printf("OpenMP Test, th_id: %d\n", omp_get_thread_num());
    }
    return 0;
}
```
```
yoyo@yoyo:~/PATH/TO$ gcc -fopenmp demo.c -o demo
yoyo@yoyo:~/PATH/TO$ ./demo
OpenMP Test, th_id: 8
OpenMP Test, th_id: 3
OpenMP Test, th_id: 1
OpenMP Test, th_id: 9
OpenMP Test, th_id: 5
OpenMP Test, th_id: 0
OpenMP Test, th_id: 6
OpenMP Test, th_id: 11
OpenMP Test, th_id: 2
OpenMP Test, th_id: 7
OpenMP Test, th_id: 4
OpenMP Test, th_id: 10
```
6. The reduction clause
6.1 Introduction
```c
#include <stdio.h>

int main(void) {
    int sum = 0;
    #pragma omp parallel for
    for (int i = 1; i <= 100; i++) {
        sum += i;   // data race: all threads update the shared sum
    }
    printf("%d", sum);
    return 0;
}
```
```
yoyo@yoyo:~/PATH/TO$ gcc -fopenmp demo.c -o demo
yoyo@yoyo:~/PATH/TO$ ./demo
1173yoyo@yoyo:~/PATH/TO$ ./demo
2521yoyo@yoyo:~/PATH/TO$ ./demo
3529yoyo@yoyo:~/PATH/TO$ ./demo
2174yoyo@yoyo:~/PATH/TO$ ./demo
1332yoyo@yoyo:~/PATH/TO$ ./demo
1673yoyo@yoyo:~/PATH/TO$ ./demo
1183yoyo@yoyo:~/PATH/TO$
```
Run multiple times, the result differs on each run because the threads race on the same variable. The line `sum += i;` is equivalent to `sum = sum + i;`: multiple threads read and write `sum` simultaneously, so updates are lost. The `reduction` clause solves this problem.
6.2 The reduction clause
The syntax is `reduction(operator : variable)`. Take the summation above as an example:
```c
#include <stdio.h>

int main(void) {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 100; i++) {
        sum += i;
    }
    printf("%d", sum);
    return 0;
}
```
```
yoyo@yoyo:~/PATH/TO$ gcc -fopenmp demo.c -o demo
yoyo@yoyo:~/PATH/TO$ ./demo
5050yoyo@yoyo:~/PATH/TO$ ./demo
5050yoyo@yoyo:~/PATH/TO$ ./demo
5050yoyo@yoyo:~/PATH/TO$ ./demo
5050yoyo@yoyo:~/PATH/TO$ ./demo
5050yoyo@yoyo:~/PATH/TO$ ./demo
5050yoyo@yoyo:~/PATH/TO$
```
In the code above, `reduction(+:sum)` gives each thread its own private copy of `sum`, and each thread accumulates into its copy. There is no data race, because every thread works on different storage. The `+` in the clause is the reduction operator: it specifies how the per-thread results are combined. A reduction simply combines multiple values step by step until a single value remains that cannot be reduced further. Here the operator is `+`, so the partial sum of thread 1 is added to the partial sum of thread 2, that result is combined with the next thread's partial sum, and so on; the final value is assigned to the global variable `sum`, which then holds the correct result.
If there are 4 threads, then there are 4 thread-local sums, and each thread copies the sum. Then the result of the reduction operation is equal to:
(((sum_1 + sum_2) + sum_3) + sum_4)
where sum_i denotes the partial sum computed by the i-th thread.