How to read and create Word files with Python (python-docx)

This article shows how to use Python to read and create Word (.docx) files with the python-docx module, including how to set paragraph and table alignment.

Use Python to process Word files

  • Install the external module python-docx
pip install python-docx

1. View Word file structure from Python

In the python-docx module, the Word file structure is divided into 3 layers:

  • Document: The highest level, representing the entire Word file.
  • Paragraph: A Word file consists of many paragraphs. Each paragraph is represented by a Paragraph object, and all paragraphs of a document are available as a list of Paragraph objects (document.paragraphs).
  • Run: Text in a Word file carries formatting such as font size, font style, and color, collectively called its style. A Run object is a stretch of consecutive text within a Paragraph that shares the same style. When the style of the text changes, python-docx represents the following text with a new Run object.

2. Read the content of Word file

  • Read a simple Word file
# author:mlnt
#createdate:2022/8/15
import docx #Import docx module

# 1. Create docx object
document = docx.Document('test.docx')

# 2. Obtain the number of Paragraphs and Runs
# Use len() to obtain the number of Paragraphs
paragraph_count = len(document.paragraphs)
print(f'Number of paragraphs: {paragraph_count}')
for i, paragraph in enumerate(document.paragraphs):
    # Get the number of Runs in this Paragraph
    paragraph_run_count = len(paragraph.runs)
    print(f'Paragraph {i} has {paragraph_run_count} runs')
    print(paragraph.text) # Print Paragraph content
    # Print the content of every Run in this Paragraph
    for run in paragraph.runs:
        print(run.text)


def getFile(filename):
    """Read a Word file and return its text as one string"""
    document = docx.Document(filename) # Create a Word file object
    content = []
    for paragraph in document.paragraphs:
        print(paragraph.text) # Output the Paragraph content read from the file
        content.append(paragraph.text) # Collect each paragraph in a list
    return '\n\n'.join(content) # Join the list into one string, with a blank line between paragraphs


print(getFile('test.docx'))
# store file
document.save('out_test.docx') # Copy the file to a new file

[Figures: test.docx and the saved copy out_test.docx]

  • Read the content of a Word document that contains tables
# author:mlnt
#createdate:2022/8/15
import docx #Import docx module
from docx.document import Document
from docx.oxml import CT_P, CT_Tbl
from docx.table import _Cell, Table, _Row
from docx.text.paragraph import Paragraph


def iter_block_items(parent):
    """
    Traverse the document content in sequence
    Generates references to each paragraph and table child in the parent in document order.
    Each return value is an instance of a table or paragraph.
    The parent object is usually a reference to the main document object, but also applies to _Cell objects, which can themselves contain paragraphs and tables.
    :param parent:
    :return:
    """
    # If a Word document object was passed in, use the document body as the container
    if isinstance(parent, Document):
        parent_elm = parent.element.body
    # If a table cell was passed in, use the cell's underlying XML element
    elif isinstance(parent, _Cell):
        parent_elm = parent._tc
    # If a table row was passed in, use the row's underlying XML element
    elif isinstance(parent, _Row):
        parent_elm = parent._tr
    else:
        raise ValueError("parent must be a Document, _Cell, or _Row object")

    # Traverse all child elements
    for child in parent_elm.iterchildren():
        # If it is a paragraph, yield a Paragraph object
        if isinstance(child, CT_P):
            yield Paragraph(child, parent)
        # If it is a table, yield a Table object
        elif isinstance(child, CT_Tbl):
            yield Table(child, parent)


# 1. Create docx object
document = docx.Document('test.docx')
# Traverse the Word document in order; the loop ends when the generator has no more items
for block in iter_block_items(document):
    # Determine whether it is a paragraph
    if isinstance(block, Paragraph):
        print(block.text)
    # Determine whether it is a table
    elif isinstance(block, Table):
        for row in block.rows:
            row_data = []
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    row_data.append(paragraph.text)
            print("\t".join(row_data))

[Figures: the test document and the printed output]

3. Create file content

  • Create docx object

    # 1. Create docx object
    document = docx.Document()
    
  • Set the page header

    # Set header
    run_header = document.sections[0].header.paragraphs[0].add_run("test")
    document.sections[0].header.paragraphs[0].alignment = WD_PARAGRAPH_ALIGNMENT.CENTER # Center alignment
    
  • Add title

    # 2. Add title
    """
    add_heading(): Create a heading
    - document.add_heading('content_of_heading', level=n)
    """
    document.add_heading('Xia Ke Xing', level=1) # Heading 1 style
    document.add_heading('Li Bai', level=2) # Heading 2 style
    
  • Add paragraph

    # 3. Add paragraphs
    #Create paragraph object
    """
    add_paragraph(): Create paragraph Paragraph content
    - document.add_paragraph('paragraph_content')
    """
    paragraph_object = document.add_paragraph('Zhao Keman Hu Ying, Wu Gou Shuang Xueming.')
    document.add_paragraph('The silver saddle shines on the white horse, rustling like a shooting star.')
    document.add_paragraph('Kill one person in ten steps, leave no trace in a thousand miles.')
    document.add_paragraph('When the matter is over, he brushes off his clothes and goes away, hiding his body and name.')
    document.add_paragraph('After leisurely drinking in Xinling, I took off my sword and stretched my knees forward.')
    document.add_paragraph('He will eat Zhu Hai and hold a cup to persuade the marquis to win.')
    document.add_paragraph('Three cups of Turanuo, the five mountains are lighter.')
    document.add_paragraph('After the eyes are dazzled and the ears are hot, the spirit and spirit are born.')
    document.add_paragraph('Save Zhao with a golden mallet, Handan was shocked first.')
    document.add_paragraph('Two heroes from the Qianqiu period, the great Daliang City.')
    document.add_paragraph('Even if you die as a hero, you will not be ashamed of being a hero in the world.')
    document.add_paragraph('Who can write your Excellency, Baishou Taixuan Sutra.')
    prior_paragraph_object = paragraph_object.insert_paragraph_before('') # Insert a new paragraph before paragraph
    
  • Create Run content and set styles

    # 4. Create Run content
    """
    Paragraph is composed of Run. Use the add_run() method to insert content into Paragraph. The syntax is as follows:
    paragraph_object.add_run('run_content')
    """
    run1 = prior_paragraph_object.add_run('*'*13)
    run2 = prior_paragraph_object.add_run('%'*13)
    # Set the style of Run
    """
    bold: bold
    italic: italic
    underline: underline
    font.strike: strikethrough (set on run.font, not on the Run itself)
    """
    run1.bold = True
    run2.underline = True
    
    # Center every paragraph
    for paragraph in document.paragraphs:
        paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER # Center alignment
    
  • Add page break

    # 5. Add a page break
    # add_page_break()
    document.add_page_break()
    
  • Insert picture

    # 6. Insert pictures
    # add_picture(), to adjust the picture width and height you need to import the docx.shared module
    document.add_picture('libai.jpeg', width=Pt(200), height=Pt(300))
    
    # Center the paragraph that holds the picture (the last paragraph)
    document.paragraphs[-1].alignment = WD_PARAGRAPH_ALIGNMENT.CENTER # Center alignment
    
  • Create a table, add data and set simple styles

    # 7. Create a table
    """
    add_table(rows=n, cols=m)
    """
    table = document.add_table(rows=2, cols=5)
    #Add table content
    #Add the first row of data
    row = table.rows[0]
    row.cells[0].text = 'Name'
    row.cells[1].text = 'Courtesy name'
    row.cells[2].text = 'Art name'
    row.cells[3].text = 'Era'
    row.cells[4].text = 'Alias'
    #Add the 2nd row of data
    row = table.rows[1]
    row.cells[0].text = 'Li Bai'
    row.cells[1].text = 'Taibai'
    row.cells[2].text = 'Qinglian Jushi'
    row.cells[3].text = 'Tang Dynasty'
    row.cells[4].text = 'Shixian'
    
    # insert row
    new_row = table.add_row() # Add table rows
    new_row.cells[0].text = 'Bai Juyi'
    new_row.cells[1].text = 'Letian'
    new_row.cells[2].text = 'Xiangshan Jushi'
    new_row.cells[3].text = 'Tang Dynasty'
    new_row.cells[4].text = 'Shimo'
    
    #Insert column
    new_column = table.add_column(width=Inches(1)) # Add table columns
    new_column.cells[0].text = 'Masterpiece'
    new_column.cells[1].text = '"Xia Ke Xing", "Quiet Night Thoughts"'
    new_column.cells[2].text = '"Song of Everlasting Sorrow", "Pipa Play"'
    
    # Calculate the length of rows and cols of the table
    rows = len(table.rows)
    cols = len(table.columns)
    print(f'rows: {rows}')
    print(f'columns: {cols}')
    
    # Print table content
    # for row in table.rows:
    #     for cell in row.cells:
    #         print(cell.text)
    
    # Set table style
    # table.style = 'LightShading-Accent1'
    # UserWarning: style lookup by style_id is deprecated. Use style name as key instead.
    table.style = 'Light Shading Accent 1'
    # Loop over every cell and center its content
    for r in range(rows):
        for c in range(cols):
            table.cell(r, c).vertical_alignment = WD_CELL_VERTICAL_ALIGNMENT.CENTER # Vertical centering
            table.cell(r, c).paragraphs[0].paragraph_format.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER # Horizontally centered
    
  • Set page number and save

    # Set page number
    add_page_number(document.sections[0].footer.paragraphs[0])
    # save document
    document.save('test2.docx')
    
  • Code to set page number (page_num.py)

    from docx import Document
    from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
    from docx.oxml import OxmlElement, ns
    
    
    def create_element(name):
        return OxmlElement(name)
    
    
    def create_attribute(element, name, value):
        element.set(ns.qn(name), value)
    
    
    def add_page_number(paragraph):
        paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
    
        page_run = paragraph.add_run()
        t1 = create_element('w:t')
        create_attribute(t1, 'xml:space', 'preserve')
        t1.text = 'Page '
        page_run._r.append(t1)
    
        page_num_run = paragraph.add_run()
    
        fldChar1 = create_element('w:fldChar')
        create_attribute(fldChar1, 'w:fldCharType', 'begin')
    
        instrText = create_element('w:instrText')
        create_attribute(instrText, 'xml:space', 'preserve')
        instrText.text = "PAGE"
    
        fldChar2 = create_element('w:fldChar')
        create_attribute(fldChar2, 'w:fldCharType', 'end')
    
        page_num_run._r.append(fldChar1)
        page_num_run._r.append(instrText)
        page_num_run._r.append(fldChar2)
    
        of_run = paragraph.add_run()
        t2 = create_element('w:t')
        create_attribute(t2, 'xml:space', 'preserve')
        t2.text = ' of '
        of_run._r.append(t2)
    
        fldChar3 = create_element('w:fldChar')
        create_attribute(fldChar3, 'w:fldCharType', 'begin')
    
        instrText2 = create_element('w:instrText')
        create_attribute(instrText2, 'xml:space', 'preserve')
        instrText2.text = "NUMPAGES"
    
        fldChar4 = create_element('w:fldChar')
        create_attribute(fldChar4, 'w:fldCharType', 'end')
    
        num_pages_run = paragraph.add_run()
        num_pages_run._r.append(fldChar3)
        num_pages_run._r.append(instrText2)
        num_pages_run._r.append(fldChar4)
    
  • Complete code

    import docx
    from docx.enum.table import WD_TABLE_ALIGNMENT, WD_CELL_VERTICAL_ALIGNMENT
    from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
    from docx.shared import Pt, Inches
    from page_num import add_page_number
    
    # 1. Create docx object
    document = docx.Document()
    
    # Set header
    run_header = document.sections[0].header.paragraphs[0].add_run("test")
    document.sections[0].header.paragraphs[0].alignment = WD_PARAGRAPH_ALIGNMENT.CENTER # Center alignment
    print(len(document.sections))
    
    # 2.Add title
    """
    add_heading(): Create a heading
    - document.add_heading('content_of_heading', level=n)
    """
    document.add_heading('Xia Ke Xing', level=1) # Heading 1 style
    document.add_heading('Li Bai', level=2) # Heading 2 style
    
    # 3. Add paragraph
    #Create paragraph object
    """
    add_paragraph(): Create paragraph Paragraph content
    - document.add_paragraph('paragraph_content')
    """
    paragraph_object = document.add_paragraph('Zhao Keman Hu Ying, Wu Gou Shuang Xueming.')
    document.add_paragraph('The silver saddle shines on the white horse, rustling like a shooting star.')
    document.add_paragraph('Kill one person in ten steps, leave no trace in a thousand miles.')
    document.add_paragraph('When the matter is over, he brushes off his clothes and goes away, hiding his body and name.')
    document.add_paragraph('After leisurely drinking in Xinling, I took off my sword and stretched my knees forward.')
    document.add_paragraph('He will eat Zhu Hai and hold a cup to persuade the marquis to win.')
    document.add_paragraph('Three cups of Turanuo, the five mountains are lighter.')
    document.add_paragraph('After the eyes are dazzled and the ears are hot, the spirit and spirit are born.')
    document.add_paragraph('Save Zhao with a golden mallet, Handan was shocked first.')
    document.add_paragraph('Two heroes from the Qianqiu period, the great Daliang City.')
    document.add_paragraph('Even if you die as a hero, you will not be ashamed of being a hero in the world.')
    document.add_paragraph('Who can write your Excellency, Baishou Taixuan Sutra.')
    prior_paragraph_object = paragraph_object.insert_paragraph_before('') # Insert a new paragraph before paragraph
    # 4. Create Run content
    """
    Paragraph is composed of Run. Use the add_run() method to insert content into Paragraph. The syntax is as follows:
    paragraph_object.add_run('run_content')
    """
    run1 = prior_paragraph_object.add_run('*'*13)
    run2 = prior_paragraph_object.add_run('%'*13)
    # Set the style of Run
    """
    bold: bold
    italic: italic
    underline: underline
    font.strike: strikethrough (set on run.font, not on the Run itself)
    """
    run1.bold = True
    run2.underline = True
    
    # Center every paragraph
    for paragraph in document.paragraphs:
        paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER # Center alignment
    
    # 5. Add a page break
    # add_page_break()
    document.add_page_break()
    # print(len(document.paragraphs))
    # 6. Insert picture
    # add_picture(), to adjust the picture width and height you need to import the docx.shared module
    document.add_picture('libai.jpeg', width=Pt(200), height=Pt(300))
    
    # Center the paragraph that holds the picture (the last paragraph)
    document.paragraphs[-1].alignment = WD_PARAGRAPH_ALIGNMENT.CENTER # Center alignment
    
    # 7.Create table
    """
    add_table(rows=n, cols=m)
    """
    table = document.add_table(rows=2, cols=5)
    #Add table content
    #Add the first row of data
    row = table.rows[0]
    row.cells[0].text = 'Name'
    row.cells[1].text = 'Courtesy name'
    row.cells[2].text = 'Art name'
    row.cells[3].text = 'Era'
    row.cells[4].text = 'Alias'
    #Add the 2nd row of data
    row = table.rows[1]
    row.cells[0].text = 'Li Bai'
    row.cells[1].text = 'Taibai'
    row.cells[2].text = 'Qinglian Jushi'
    row.cells[3].text = 'Tang Dynasty'
    row.cells[4].text = 'Shixian'
    
    # insert row
    new_row = table.add_row() # Add table rows
    new_row.cells[0].text = 'Bai Juyi'
    new_row.cells[1].text = 'Letian'
    new_row.cells[2].text = 'Xiangshan Jushi'
    new_row.cells[3].text = 'Tang Dynasty'
    new_row.cells[4].text = 'Shimo'
    
    #Insert column
    new_column = table.add_column(width=Inches(1)) # Add table columns
    new_column.cells[0].text = 'Masterpiece'
    new_column.cells[1].text = '"Xia Ke Xing", "Quiet Night Thoughts"'
    new_column.cells[2].text = '"Song of Everlasting Sorrow", "Pipa Play"'
    
    # Calculate the length of rows and cols of the table
    rows = len(table.rows)
    cols = len(table.columns)
    print(f'rows: {rows}')
    print(f'columns: {cols}')
    
    # Print table content
    # for row in table.rows:
    #     for cell in row.cells:
    #         print(cell.text)
    
    # Set table style
    # table.style = 'LightShading-Accent1'
    # UserWarning: style lookup by style_id is deprecated. Use style name as key instead.
    table.style = 'Light Shading Accent 1'
    # Loop over every cell and center its content
    for r in range(rows):
        for c in range(cols):
            table.cell(r, c).vertical_alignment = WD_CELL_VERTICAL_ALIGNMENT.CENTER # Vertical centering
            table.cell(r, c).paragraphs[0].paragraph_format.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER # Horizontally centered
    
    # Set page number
    add_page_number(document.sections[0].footer.paragraphs[0])
    # save document
    document.save('test2.docx')
    

    Effect:
