(small script) (python) merge multiple documents

Article directory

  • foreword
  • Code
  • Code explanation
    • path processing
    • text merge
    • connection operation
  • END

Foreword

In some cases we need to merge the contents of multiple files into one.

If there are not too many complicated requirements in the middle, it will be very convenient to write a small script for direct merging.

Performance Purpose:

cuber-lotus/CodeStandard: C/C++ Code Standard (github.com)

Code Specification (C/C++ Specification)_Tianci Xilian’s Blog-CSDN Blog

For example, in my code specification, more than ten chapters need to be merged into one file. Using a script is much more convenient than doing it manually.

This article takes python as an example to achieve the effect of text merging.

Code

import os

'''
class: PathFactory

data:
cur_dir: script path

fun:
get_target_path
get_sources_path_list
'''
class PathFactory:
    # get absolute path
    # Note that __file__ py is an interpreted language
    cur_dir = os.path.split(os.path.realpath(__file__))[0]

    # According to the relative path of the target -> generate an absolute path
    def get_target_path(target_realpath: str):
        return os.path.abspath(os.path.join(PathFactory.cur_dir, target_realpath))

    # According to the relative path of the resource
    # Get the resource dir first and then get the files under the dir
    # Sort after filtering
    def get_sources_path_list(sources_realpath: str, filter_key: str):
        # generate path
        soruces_abspath = os.path.abspath(
            os.path.join(PathFactory.cur_dir, soruces_realpath))
        # Filter out the files under the path
        file_path_list = [name for name in os.listdir(
            soruces_abspath) if filter_key in name]
        file_path_list. sort()
        # Reassemble into an absolute path
        file_path_list = [os.path.join(soruces_abspath, name)
                          for name in file_path_list]
        return file_path_list


'''
fun: append_source_to_target
'''
def append_source_to_target(source_path: str, target_path: str):
    file_data = []
    with open(source_path, "r", encoding="utf-8") as file:
        file_data = file. readlines()
    with open(target_path, "a", encoding="utf-8") as file:
        # Filter out `[toc]` on the first line
        # Add a blank line at the end
        file.writelines(file_data[1:] + ["\\
"])

    print(source_path, " ===> success")


'''
main
'''
if __name__ == "__main__":
    # get target path
    target_path = PathFactory.get_target_path("./CodeStandard.md")

    # get sources path list
    file_path_list = PathFactory.get_sources_path_list("./chapter", ".md")

    #merge
    # remove old file
    if os.path.exists(target_path):
        os. remove(target_path)
    # content prefix
    with open(target_path, "a", encoding="utf-8") as file:
        file.write("# code specification\\
")
        file.write("[toc]\\
")

    # each chapter by sort
    for source_path in file_path_list:
        append_source_to_target(source_path, target_path)

    # content suffix
    with open(target_path, "a", encoding="utf-8") as file:
        file.write("---\\
")
        file.write("---\\
")
        file.write("---\\
")
        file.write("# END\\
")

Code explanation

Path handling

The os module is used here, and the user obtains and processes the path.

Note: The python path problem is a huge pit.

The path where the python script file is located and the path to start the script are two different things.

In order to avoid this pit in the specific code, __file__ is used to obtain the absolute address of the script file. Then, the path is spliced by using the relative address of the absolute address.

cur_dir:

Obtain the absolute path of the file through __file__

def get_target_path(target_realpath: str):

Get the absolute path of the output file

def get_sources_path_list(soruces_realpath: str, filter_key: str)::

Get list of input files

  • Get all files under the path generated by the relative path
  • Filter out the required files
  • to sort
  • Change to absolute path
import os


class PathFactory:
    # get absolute path
    # Note that __file__ py is an interpreted language
    cur_dir = os.path.split(os.path.realpath(__file__))[0]

    # According to the relative path of the target -> generate an absolute path
    def get_target_path(target_realpath: str):
        return os.path.abspath(os.path.join(PathFactory.cur_dir, target_realpath))

    # According to the relative path of the resource
    # Get the resource dir first and then get the files under the dir
    # Sort after filtering
    def get_sources_path_list(sources_realpath: str, filter_key: str):
        # generate path
        soruces_abspath = os.path.abspath(
            os.path.join(PathFactory.cur_dir, soruces_realpath))
        # Filter out the files under the path
        file_path_list = [name for name in os.listdir(
            soruces_abspath) if filter_key in name]
        file_path_list. sort()
        # Reassemble into an absolute path
        file_path_list = [os.path.join(soruces_abspath, name)
                          for name in file_path_list]
        return file_path_list

Text merge

Note: Use with open() as file: to automatically close the file.

Multiple files need to be merged here, so it is the method of appending a.

Here I have made adjustments according to my own needs, mainly reflected in:

  • remove the first line of each file
  • Newline at the end
def append_source_to_target(source_path: str, target_path: str):
    file_data = []
    with open(source_path, "r", encoding="utf-8") as file:
        file_data = file. readlines()
    with open(target_path, "a", encoding="utf-8") as file:
        # Filter out `[toc]` on the first line
        # Add a blank line at the end
        file.writelines(file_data[1:] + ["\\
"])

Connection operation

The basic steps are as follows:

  • Determine the output path strength

  • Determine the list of resource paths

  • delete old files

  • merge

  • PS: According to your own needs, add or delete the content you need in the first place or in the middle

if __name__ == "__main__":
    # get target path
    target_path = PathFactory.get_target_path("./CodeStandard.md")

    # get sources path list
    file_path_list = PathFactory.get_sources_path_list("./chapter", ".md")

    #merge
    # remove old file
    if os.path.exists(target_path):
        os. remove(target_path)
    # content prefix
    with open(target_path, "a", encoding="utf-8") as file:
        file.write("# code specification\\
")
        file.write("[toc]\\
")

    # each chapter by sort
    for source_path in file_path_list:
        append_source_to_target(source_path, target_path)

    # content suffix
    with open(target_path, "a", encoding="utf-8") as file:
        file.write("---\\
")
        file.write("---\\
")
        file.write("---\\
")
        file.write("# END\\
")

END