Clang front-end plugin: adding curly braces to "blocks" that lack them, based on LLVM 15 (clang-plugin-add-brace)

Processed statements

case

Terminology conventions or memos

  1. Range of a case: from the colon after the case label to the start of the next 'case'. Referred to below as "inside the case" or "the case content".
  2. AST: abstract syntax tree.

Case without curly braces

A case is skipped, that is, its content is not wrapped in curly braces, if any of the following holds within the case:

  • It contains a #define,
  • It contains an #include,
  • It contains a direct variable declaration,
  • It is empty,
  • It contains a macro call.
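The case shapes listed above can be illustrated with a small switch. This is only a sketch: the function `describe` and the macro `APPEND_TAG` are hypothetical names invented for the example, and the comments state what the rules above would imply for each case, not the plugin's verified output.

```cpp
#include <string>

// Hypothetical macro, used only to show a case that contains a macro call.
#define APPEND_TAG(s) ((s) += "[macro]")

// Each case is annotated with what the skip rules above imply for it.
std::string describe(int k) {
  std::string out;
  switch (k) {
    case 0:  // empty case (falls through): skipped
    case 1:  // only plain statements: not skipped, content can be wrapped
      out += "one";
      break;
    case 2:  // macro call inside the case: skipped
      APPEND_TAG(out);
      break;
    case 3:  // variable declared directly in the case: skipped
      int n;
      n = k * 2;
      out += std::to_string(n);
      break;
    default:
      out += "other";
      break;
  }
  return out;
}
```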

Details

Preprocessor callbacks that collect #include directives and macro definitions

CollectIncMacro_PPCb (Collect Include Macro PPCallbacks): preprocessor callbacks that collect #include directives and macro definitions.

The collected #include and #define locations are used to determine whether any #include or #define falls within the range of a case.
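The range check described here can be sketched without any Clang dependency. In Clang the locations would typically arrive through `PPCallbacks` hooks such as `InclusionDirective` and `MacroDefined`; in this sketch they are replaced by plain file offsets, and `Offset`, `DirectiveLocs`, and `rangeHasDirective` are hypothetical names, not taken from the plugin's source.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a source location: a plain file offset.
using Offset = unsigned;

// Offsets recorded while preprocessing (assumed to be filled by callbacks).
struct DirectiveLocs {
  std::vector<Offset> includes;  // positions of #include directives
  std::vector<Offset> defines;   // positions of #define directives
};

// Returns true if any collected directive lies in the half-open range
// [begin, end), i.e. inside the range of one case.
bool rangeHasDirective(const DirectiveLocs& locs, Offset begin, Offset end) {
  auto inRange = [&](Offset o) { return begin <= o && o < end; };
  return std::any_of(locs.includes.begin(), locs.includes.end(), inRange) ||
         std::any_of(locs.defines.begin(), locs.defines.end(), inRange);
}
```

A case whose range makes `rangeHasDirective` return true would be one of the skipped cases.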

Traversing each statement within the range of a case in a switch

RangeHasMacroAstVst (Range Has Macro Call AST Visitor): an AST visitor that checks whether the given range contains a macro call.

Nominally the visitor traverses every Stmt under the switch; in practice it only considers the statements inside the given range (that is, inside the range of the case) and computes the following:

1. hasMacro: whether the range of the case contains a macro call.
  This is used to filter out cases with macro calls.
2. caseKSubStmtCnt: the number of statements in the range of the case (that is, the number of sub-statements of the case).
  This is used to filter out empty cases.
3. VarDeclDirectlyInCaseKCnt: the number of variable declaration statements written directly in the 'case', that is, the number of direct variable declarations.
  This is used to filter out cases with direct variable declarations.
  A declaration counts as written directly in the 'case' in either of two situations:
  3.1. It is written directly in the 'case' and its parent is the case statement.
  3.2. It is written directly in the 'case' but its parent is the switch block.
    That is, a statement that appears inside the case does not belong to the case in the AST but directly to the switch. As a result, "the sub-statements of the case" is not something the AST reliably provides.
      RangeHasMacroAstVst therefore cannot reach every sub-statement of a case by walking the case node alone.
        Only by widening the traversal to the entire switch and considering just the statements inside the range of the case can the sub-statements of the case be traversed completely and accurately.
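The three computations can be sketched as a filter over a flattened list of the statements under the switch. This is a simplified model, not the plugin's code: `FlatStmt`, `ParentKind`, `CaseStats`, `scanCaseRange`, and `shouldSkipCase` are hypothetical names, offsets stand in for source locations, and counting every statement in the range for caseKSubStmtCnt is an assumption.

```cpp
#include <vector>

// Hypothetical, simplified view of one statement under the switch.
enum class ParentKind { Case, SwitchBody, Other };

struct FlatStmt {
  unsigned offset;    // start position of the statement
  bool isVarDecl;     // is it a variable declaration statement?
  bool fromMacro;     // does it come from a macro call?
  ParentKind parent;  // AST parent: the case, the switch block, or other
};

struct CaseStats {
  bool hasMacro = false;
  int caseKSubStmtCnt = 0;
  int varDeclDirectlyInCaseKCnt = 0;
};

// Walks all statements of the switch but only counts those inside
// [begin, end), the range of a single case.
CaseStats scanCaseRange(const std::vector<FlatStmt>& switchStmts,
                        unsigned begin, unsigned end) {
  CaseStats s;
  for (const FlatStmt& st : switchStmts) {
    if (st.offset < begin || st.offset >= end) continue;  // outside the case
    if (st.fromMacro) s.hasMacro = true;
    ++s.caseKSubStmtCnt;
    // Situations 3.1 and 3.2: parent is the case or the switch block.
    if (st.isVarDecl && (st.parent == ParentKind::Case ||
                         st.parent == ParentKind::SwitchBody)) {
      ++s.varDeclDirectlyInCaseKCnt;
    }
  }
  return s;
}

// A case is skipped if it has a macro call, is empty, or declares
// a variable directly.
bool shouldSkipCase(const CaseStats& s) {
  return s.hasMacro || s.caseKSubStmtCnt == 0 ||
         s.varDeclDirectlyInCaseKCnt > 0;
}
```

Filtering by range rather than by parent is what lets the second statement of `case 1: a(); b();` be counted, since in Clang's AST only `a();` is the sub-statement of the case and `b();` is a child of the switch body.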

Actually run the curly brace plugin

At this point the curly-brace plugin is complete and runs correctly on llvm-project:

sudo docker exec -it ubuntu2204_clang15Compile bash

This opens a bash shell in the Docker container ubuntu2204_clang15Compile. All of the following commands are run in that shell.

cd /pubx/

git clone -b brc-dev-no_tick https://gitcode.net/pubz/llvm-project.git
#Branch page: https://gitcode.net/pubz/llvm-project/-/commits/brc-dev-no_tick
#That is, commit https://gitcode.net/pubz/llvm-project/-/commit/bee38a325d0957a28b4d06cb4be3c251d143cdf0
#After cloning the llvm-project repository, the directory structure is as follows: /pubx/llvm-project/.git/config

  • Step 1: Add curly braces around the single statements in each directly compiled source file

Apply the plugin libBrcPlugin.so to the compilation of every source file of llvm-project, so that curly braces are added around the single statements in those files.

source /pubx/llvm-project/doc_clang15_build/brc_build1_plugin.sh

brc_build1_plugin.sh

  • Step 2: Build llvm-project normally after the curly braces have been added.
source /pubx/llvm-project/doc_clang15_build/brc_build2_directly.sh

brc_build2_directly.sh

  • Step 3: Verify
//Write a C source file hello.c with the following content:
#include <stdio.h>
int main(int argc, char** argv){
  int a,b;
  printf("a,b:");
  scanf("%d,%d", &a, &b);
  int sum=a+b, diff=a-b, div=a/b, mod=a%b;
  printf("sum=%d,diff=%d,div=%d,mod=%d\n",sum,diff,div,mod);
  return 0;
}
/pubx/build-llvm15/bin/clang-15 hello.c -o hello.app
./hello.app
a,b:45,21
sum=66,diff=24,div=2,mod=3

The clang-15 compiler, built from the llvm-project source after the curly braces were added, compiles hello.c into the binary hello.app, and hello.app runs correctly.

This suggests that the curly braces were inserted at essentially the correct positions.

#statistics

find /pubx/llvm-project/ -not -path '*/.git/*' -type f \( -name "*.cpp" -or -name "*.c" \) | xargs -I% grep -Hn BrcXxx % > /pubx/BrcXxx.log

#Wrap the previous bash command in a bash function
findBrcCommentThenSave() {
  set -x #make bash print each command as it is executed
  keyword=$1
  find /pubx/llvm-project/ -not -path '*/.git/*' -type f \( -name "*.cpp" -or -name "*.c" \) | xargs -I% grep -Hn "$keyword" % |tee /pubx/"${keyword}.log"
  set +x #stop printing executed commands
}
findBrcCommentThenSave BrcThen
findBrcCommentThenSave BrcSw
findBrcCommentThenSave BrcElse
findBrcCommentThenSave BrcFor
findBrcCommentThenSave BrcForRange
findBrcCommentThenSave BrcWhl

How many braces were added for each kind of statement?

ls -S /pubx/Brc* | xargs -I% sh -c 'wc -l %; '

'''
93201 /pubx/BrcThen.log
29832 /pubx/BrcSw.log
5539 /pubx/BrcElse.log
3603 /pubx/BrcFor.log
2187 /pubx/BrcForRange.log
663 /pubx/BrcWhl.log
'''

Among the statements that received curly braces, how many contain a return?

Because these single-statement returns were not wrapped in curly braces, t_clock_tick did not insert a stack-variable release statement for them.
In the tick plugin, stack-variable allocations and releases are unbalanced: of about 240,000 stack variables in total, about 20,000 are never released. Is this imbalance caused by these roughly 50,000 single return statements not releasing their stack variables?
As shown below, BrcPlugin wrapped roughly 50,000 statements containing return in curly braces.

ls -S /pubx/Brc* | xargs -I% sh -c 'echo -n "% "; grep return % |wc -l '

'''
/pubx/BrcThen.log 50438
/pubx/BrcSw.log 2681
/pubx/BrcElse.log 815
/pubx/BrcFor.log 6
/pubx/BrcForRange.log 4
/pubx/BrcWhl.log 2
'''
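For reference, the single-statement return shape counted above looks like the following before and after braces are added. This is an illustrative sketch only: `signBefore` and `signAfter` are hypothetical names, and the marker comments the plugin leaves (such as BrcThen) are omitted because their exact form is not shown in this text.

```cpp
// Before: the 'then' branch is a single return statement with no braces,
// so there is no block around it for a later pass to extend.
int signBefore(int x) {
  if (x < 0)
    return -1;
  return 1;
}

// After: the same logic with the branch wrapped in braces. The behavior is
// unchanged, but the return now sits inside its own block.
int signAfter(int x) {
  if (x < 0) {
    return -1;
  }
  return 1;
}
```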

Implementation

CMakeLists.txt

cmake_minimum_required(VERSION 3.13.4)

set(LIBFMT_DIR "/pubx/fmt/")
#set(LIBFMT_STATIC /pubx/fmt/include)
set(LIBFMT_INCLUDE "${LIBFMT_DIR}/include/")
#set(LIBFMT_STATIC /pubx/fmt/build/libfmt.a)
set(LIBFMT_STATIC "${LIBFMT_DIR}/build/libfmt.a")

include_directories( "${CMAKE_CURRENT_SOURCE_DIR}/include")
include_directories( "${CMAKE_CURRENT_SOURCE_DIR}/base_home/include/")

if (NOT EXISTS "${LIBFMT_STATIC}")
  MESSAGE(FATAL_ERROR "libfmt static library ${LIBFMT_STATIC} does not exist, please refer to build-libfmt.sh to build libfmt static library")
endif()

if (NOT EXISTS "${LIBFMT_INCLUDE}")
  MESSAGE(FATAL_ERROR "libfmt header file directory ${LIBFMT_INCLUDE} does not exist, please refer to build-libfmt.sh to build libfmt static library")
endif()

#================================================================================
# 0. GET CLANG INSTALLATION DIR
#Modify the default compiler
set(CT_Clang_INSTALL_DIR "/llvm_release_home/clang+llvm-15.0.0-x86_64-linux-gnu-rhel-8.4")
set(CMAKE_VERBOSE_MAKEFILE ON)
set(CURSES_LIBRARY "/lib64/libncurses.so.6")
set(CURSES_INCLUDE_PATH "/usr/include/")
set(CMAKE_EXPORT_COMPILE_COMMANDS True)
#Keep using the system gcc as the compiler; otherwise libstdc++ has no debugging information during debugging, and gdb cannot display std::string. Reference: https://stackoverflow.com/questions/58356385/python-exception-class-gdb-error-there-is-no-member-named-m-dataplus-whe/58356946#58356946
# When displaying a std::string, gdb reports the error "There is no member named _M_dataplus" and therefore does not show the value of the std::string.
#set(CMAKE_C_COMPILER "/llvm_release_home/clang+llvm-15.0.0-x86_64-linux-gnu-rhel-8.4/bin/clang")
#set(CMAKE_CXX_COMPILER "/llvm_release_home/clang+llvm-15.0.0-x86_64-linux-gnu-rhel-8.4/bin/clang++")
set(LLVM_DIR "/llvm_release_home/clang+llvm-15.0.0-x86_64-linux-gnu-rhel-8.4")
#set(xxx "")

project(clang-brc)
#project is placed after the default compiler definition, otherwise cmake will loop endlessly


set(CT_LLVM_INCLUDE_DIR "${CT_Clang_INSTALL_DIR}/include/llvm")

set(CT_LLVM_CMAKE_FILE "${CT_Clang_INSTALL_DIR}/lib/cmake/clang/ClangConfig.cmake")

# http://llvm.org/docs/CMake.html#embedding-llvm-in-your-project
list(APPEND CMAKE_PREFIX_PATH "${CT_Clang_INSTALL_DIR}/lib/cmake/clang/")

find_package(Clang REQUIRED CONFIG)

# Sanity check. As Clang does not expose e.g. `CLANG_VERSION_MAJOR` through
# AddClang.cmake, we have to use LLVM_VERSION_MAJOR instead.
# TODO: Revisit when next version is released.
if(NOT "15" VERSION_EQUAL "${LLVM_VERSION_MAJOR}")
  message(FATAL_ERROR "Found LLVM ${LLVM_VERSION_MAJOR}, but need LLVM 15")
endif()

message(STATUS "Found Clang ${LLVM_PACKAGE_VERSION}")
message(STATUS "Using ClangConfig.cmake in: ${CT_Clang_INSTALL_DIR}")

message("CLANG STATUS:
  Includes (clang) ${CLANG_INCLUDE_DIRS}
  Includes (llvm) ${LLVM_INCLUDE_DIRS}"
)

# Set the LLVM and Clang header and library paths
include_directories(SYSTEM "${LLVM_INCLUDE_DIRS};${CLANG_INCLUDE_DIRS}")

#================================================================================
# 3. CLANG-brc BUILD CONFIGURATION
#================================================================================
# Use the same C++ standard as LLVM does
set(CMAKE_CXX_STANDARD 17 CACHE STRING "")

#Build type
if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE Debug CACHE
      STRING "Build type (default Debug):" FORCE)
endif()

# Compiler flags
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall\
    -fdiagnostics-color=always")

# LLVM/Clang is normally built without RTTI. Be consistent with that.
if(NOT LLVM_ENABLE_RTTI)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-rtti")
endif()

# -fvisibility-inlines-hidden is set when building LLVM and on Darwin warnings
# are triggered if llvm-tutor is built without this flag (though otherwise it
# builds fine). For consistency, add it here too.
include(CheckCXXCompilerFlag)
check_cxx_compiler_flag("-fvisibility-inlines-hidden"
  SUPPORTS_FVISIBILITY_INLINES_HIDDEN_FLAG