[Office Automation] Use Python to fill in data in the form of a Word document with one click

3f6a7ab0347a4af1a75e6ebadee63fc1.gif

?♂? Personal homepage: @ aiperson’s personal homepage

?About the author: Python learner
I hope everyone will support us and we will make progress together!
If the article is helpful to you,
Welcome to comment Like Collection Add follow +

Table of Contents

1. Python processing Word

2. Fill in the data in the table of the Word document with one click

3. Past recommendations

1. Python processing Word

  • The benefits of Python processing Word

There are many advantages to working with Word documents using Python, here are some of the major ones:

  1. Automation and batch processing: Python can help you automate processing of large numbers of Word documents, such as replacing text in batches, generating reports, extracting data, etc. This saves a lot of time and labor.

  2. Data Extraction and Transformation: If you need to extract specific data from a large number of Word documents, Python can easily achieve this. You can write scripts to search and extract the information you need, then convert it to other formats such as CSV, Excel, or a database.

  3. Customization and freedom: Using Python to process Word documents means you can customize the process to suit your needs. You can create custom templates, styles, and layouts, and flexibly adjust them to your project needs.

  4. Integration with other tools and technologies: Python can be integrated with many other libraries and technologies, such as databases, web services, data analysis tools, etc. This allows you to combine the processing of Word documents with other workflows to achieve more complex tasks.

  5. Document generation and reporting: If you need to generate reports or documents on a regular basis, Python can do this by automatically generating text, charts, tables, etc. This is useful for data analysis, project management, business reporting, etc.

  6. Version control and code reuse: Using Python code to process Word documents can achieve code reuse and version control. You can store the code that processes documents in a version control system so team members can collaborate, maintain, and share the code.

  7. Cross-platform: Most Python libraries can run on multiple operating systems, so you can work with Word documents on different platforms such as Windows, Linux and macOS.

  • Python third-party library for processing Word

In Python, there are several third-party libraries that can be used to process Word documents. The following are some commonly used third-party libraries:

  1. python-docx: This is a popular library for creating and modifying Microsoft Word documents (.docx format). It allows you to read, edit and create Word documents, including paragraphs, tables, images, etc.

  2. pywin32 (Windows only): This library allows you to interact with Microsoft Office applications on Windows, including Word. It can be used to automate operations such as creating, opening, editing, and saving documents.

  3. docx2txt: This library can be used to convert .docx files to plain text (.txt format). It is suitable for extracting text content from Word documents.

  4. python-docx-template: This library extends the python-docx library to enable you to insert variables and loops into templates and generate Word documents in a more dynamic way.

  5. py-docx: This is another library for processing .docx files. It provides functionality similar to python-docx, but uses a different interface.

2. Fill in the data in the table of the Word document with one click

Technical Tools:

Python version: 3.9

Code editor: jupyter notebook

At work, we often fill in part of the information in the Excel table into the corresponding table in the Word document to generate a report for easy printing. Today we will try how to fill in data in a fixed Word template. This example requires filling in the “receipt data” of the Excel file into the “receipt record” of the Word file, and adding up the received quantity. The template is as follows.

bc9ceadd4fb3484ba6809d043e7670fa.png

The required result is this:

415df64d9c1d48b3aa3a3832223bd965.png

First, the data needs to be extracted from the Excel table.

#Extract data from Excel table
from openpyxl import load_workbook
wb = load_workbook("receipt data.xlsx")
ws = wb.active
data=[]
for row in range(2,ws.max_row + 1):
    seq = ws["A" + str(row)].value
    supplier = ws["B" + str(row)].value
    material_pn = ws["C" + str(row)].value
    material_model = ws["D" + str(row)].value
    desp = ws["E" + str(row)].value
    qty = ws["F" + str(row)].value
    date = ws["G" + str(row)].value.date()
    info = [seq, supplier, material_pn, material_model, desp, qty, date]
    data.append(info)
# Check the data
data[0]

1a728d48a2d846149a9250b07faaa217.png

Since the data in the quantity column needs to be summed, a summation function is first defined so that it can be called later.

#Define quantity summation function
def Sum_list(list):
    s = 0
    for i in list: #accumulate all numbers in the list
        s + =i
    return s
# Sum all the numbers in the quantity column
qty_list=[]
for i in data:
    qty_list.append(i[5]) #The quantity is at the 6th position of the inner list, the index is 5
sum_qty= Sum_list(qty_list) #Call the summing function to add up
sum_qty

7016279ba0e24c4fbeab6b41667073e8.png

Define cell merge function

#Define the function of merging cells
def Merge_cells(table,target_list,start_row,col):
    '''
    table: is the table that needs to be operated on
    target_list: is a target list, that is, a list containing duplicate data
    start_row: is the start row, that is, the row in the table where data comparison begins (the title needs to be removed)
    col: is the column that needs to process data
    '''
    start = 0 #Start row counting
    end = 0 #end line count
    reference = target_list[0] #Set the reference, starting with the first string in the list
    for i in range(len(target_list)): #Traverse the list
        if target_list[i] != reference: #Start the comparison. If the contents are different, perform the following
            reference = target_list[i] #The reference becomes the next string in the list
            end=i-1
            table.cell(start + start_row,col).merge(table.cell(end + start_row,col))
            start=end+1
        if i == len(target_list) - 1: #Traverse to the last line and do as follows
            end=i
            table.cell(start + start_row,col).merge(table.cell(end + start_row,col))

Then write data into the word table and merge cells.

#After the data extraction is completed, you can write data into the Word table.
from docx import Document
doc = Document("Receipt Record Template.docx")
#Read the data in the second and third columns of the first table in the word document except the title and tail total rows
table = doc.tables[0] #It has been determined to be the first table, and its index is 0
supplier = [] #Storage supplier name
pn = [] #Storage material code
for i in data:
    supplier.append(i[1])
    pn.append(i[2])
#Add rows as needed to fill in data
for i in range(len(supplier)): #There is already one line in the template, so you only need to add len(supplier) lines in total
    table.add_row()
#After adding rows, do the merge cell operation first
Merge_cells(table,supplier,1,1) #Start merging behavior 2, index is 1; supplier name is in column 2, index is 1
Merge_cells(table,pn,1,2) #Start merging behavior 2, index is 1; material code is in column 3, index is 2

#Write data to the table
for row in range(1,len(supplier) + 1):
    for col in range(7):
        table.cell(row,col).text = str(data[row-1][col])

max_row = len(table.rows) #Get the largest row
qty_row = max_row-1 #Determine a row that needs to be written with total data
table.cell(qty_row,5).merge(table.cell(qty_row,5)) #Merge the two cells in the lower right corner used to fill in the quantity
table.cell(qty_row,4).text = 'Total number:'
table.cell(qty_row,5).text = str(sum_qty)

doc.save("Receipt record.docx")

Since cells with the same content in the supplier name and material code columns are merged, the information in these two columns needs to be extracted separately from the total data `data`. Create two new empty lists `supplier` and `pn`, then traverse `data`, take the data out and store it in the empty list. Then, we add some blank lines according to actual needs to facilitate writing data later. The number of new rows is based on the number of elements in the `supplier` list. There is already a blank row in this table. If you include the outside rows and you need to fill in the “total number”, you only need to increase the number of elements in the `supplier` list to correspond to that many rows. Of course, this can also be based on the number of elements in the list `pn`, and the two are equal. After adding rows, first call the function `Merge_cells` to merge cells (if you write data first and then merge, the cell contents will be duplicated). After merging, you can write data to the table. Here, traverse the rows and columns where data needs to be written, and then extract the data from `data`. After the data row is written, the word “Total” and the total material quantity in the last row need to be written. Finally save the data and get the following results:

16c6e69a88c64ef097e165c946250477.png

3. Past recommendations

Python extracts tabular data from pdf (with practical case)

Automatically send emails using Python

Python basics of operating ppt and pdf

Python operation word basics

Python basics of operating excel

Use Python to extract tables from PDF to Excel with one click

4. Recommendations and benefits at the end of the article

3 copies of “Python Automated Office Applications (ChatGPT Version)” are given away for free!

8e5d7939a25f465ea349e45a84cd36a0.jpeg

Introduction:

This book comprehensively and systematically introduces the automation solutions of Python language in common office scenarios. The book is divided into 5 parts and 21 chapters, including the basic knowledge of Python language, common methods of reading and writing data in Python, using Python to automatically operate Excel, using Python to automatically operate Word and PPT, and using Python to automatically operate files and folders, emails, PDF files, Pictures, videos, use Python for data visualization analysis and web page interaction, and use ChatGPT to easily advance to Python office automation.
This book is suitable for information workers of all levels. It can be used as an introductory guide for beginners to learn Python, and as a reference manual for intermediate and advanced automated office users. The large number of examples in the book are also suitable for readers to use for reference directly in their work.

Editor’s recommendation:

Easily realize office automation with the help of ChatGPT and Python: Excel Home is created by many Microsoft global MVP experts. It uses a large number of examples to introduce the use of Python to operate Excel, Word, PPT and various objects involved in daily office work. Let ordinary office workers without programming experience be able to control Python, realize office automation in multiple scenarios, and improve work efficiency!

  • Lucky draw method: 3 friends will be randomly selected from the comment area and given away for free!
  • How to participate: Follow the blogger, like, collect, and comment in the comment area “Life is too short, refuse to get involved!” (Remember to like + collect, otherwise the draw will be invalid, and each person can comment up to three times!)
  • Event deadline: 2023-09-28 20:00:00
  • Dangdang purchase link: http://product.dangdang.com/29566495.html

  • JD.com purchase link: https://item.jd.com/13953308.html

List announcement time: 2023-09-28 21:00:00

a17756265cb34b76964cdb85cda99811.png