EXCEL tool–openpyxl

A tile of clear snow feels like winter, and half a plant of lingering fragrance awaits tomorrow’s summer. The same goes for my heart that burns through the long night, and the light feathers fall down to seal my love. (I’m not good at it, sorry)

Software: Jupyter NoteBook3
Language: Python
Author: Lonely Beauty Jade Rose
Time: The second period of frost: stinging insects

1. Get to know Excel

2. About openpyxl

3. Common operations of openpyxl

3.1 Create a new table
3.2 Operating existing files
1. Get to know Excel

Excel is an important part of the office software Office launched by Microsoft, and it is also one of the most popular spreadsheet processing software currently. It has powerful calculation, analysis and chart functions, and is currently the most commonly used office data spreadsheet software in companies. The built-in formulas and functions in Excel may help users perform complex calculations; because Excel has powerful functions in data calculations, it has become an indispensable common office software for users.

Commonly used file types in Excel

  • xls file: mainly the file format before 2007;
  • xlsx file: file format after 2007;
  • xlsm file: workspace file (macro file), used to save the current working status;
  • xlsb file: binary file, used to store files, which is small in size and easy to read and write (for computers);
  • csv file: A file where comma separated data is saved.

With the development of society, our workload is increasing, and the complexity of our work is gradually increasing. Doing repetitive and tedious things every day is not only a blow to our work confidence; in the long run, it also consumes Passion for life. How to change this situation, first of all, have a good mentality, and then, separate yourself from the complicated and repetitive work, use an excellent tool to help you achieve all this, and then improve your self-worth.

2. About openpyxl

For details, please refer to the official documentation: https://pypi.org/project/openpyxl/

"""
Introduction
openpyxl is a Python library to read/write Excel 2010 xlsx/xlsm/xltx/xltm files.

It was born from lack of existing library to read/write natively from Python the Office Open XML format.

All kudos to the PHPExcel team as openpyxl was initially based on PHPExcel.
"""

The above is the official document’s introduction to openpyxl. As one of the popular office automation modules, its advantages are also obvious, whether it is reading or writing files. If you do not analyze the current data in EXCEL (pivot, map, group, merge, esc), openpyxl is more suitable than pandas for operating a separate data table.

Note 1: openpyxl cannot operate .xls type files. If you need to operate this type of file, you need to convert it to .xlsx type file in advance. Of course, it can also be converted through python’s own module. The essence is to mobilize the underlying layer of EXCEL to operate. The following is the code.

def trans(oldFilePath,newFilePath):
"""
params: oldFilePath, old file path .xls file format;
newFilePath, new file path .xlsx file format.
"""
#Import required modules
    import win32com.client,os
    #mobilizeEXCEL
    if os.path.exists(oldFilePath):
    excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
    \t#open a file
    wb = excel.Workbooks.Open(oldFilePath)
    #Check the new file path, if it exists, delete it
    if os.path.exists(newFilePath):
        os.remove(newFilePath)
#Save as .xlsx type file.
    wb.SaveAs(newFilePath, FileFormat=51)# 51 represents the xlsx format
    wb.Close()
    #Remember to exit EXCEL, do not idle the thread, it will die easily.
    excel.Application.Quit()
    #Return the new file path
    return newFilePath
    else:
 print("There is no {} for this file, please confirm whether the file exists and try again.".format(oldFilePath))
 return False
3. Common operations of openpyxl

I don’t know how to start, let’s start by creating a new table.

3.1 Create a new table
#os: file path
import os
#Operation EXCEL
import openpyxl
#Module for copying files
import shutil
path = r"C:\Users\weixi\Desktop\\
ew_workbook.xlsx"
#Create a new worksheet, which contains an active sheet, namely: Sheet1.
wb = openpyxl.Workbook()
#get activity table
ws = wb.active
#sheet has many attributes (attributes are not bracketed, and methods need to be adjusted with brackets). We use sheet.title here to get the name of the sheet.
print(ws.title) #output:Sheet
#Save the new worksheet under the path written earlier
wb.save(path)
3.2 Operating existing files
#File path
exist_path = r"C:\Users\weixi\Desktop\data.xlsx"
#Open the worksheet, which can pass many parameters, but I only use the following ones:
#read_only: Default is False. When True, it means read-only and the data cannot be changed;
#data_only: Default is False. When it is True, it means loading as pure data. Cells with formulas will be converted into calculated values and loaded.
wb = openpyxl.load_workbook(exist_path)
wb.active
#output: <Worksheet "Incoming coal testing daily report (batch + sampling)">
#View all sheets in the worksheet
print(wb.sheetnames)
#output: ['Daily report of incoming coal testing (batch + sampling)']
#Open this sheet
ws = wb[wb.sheetnames[0]]

All its properties in EXCEL can be viewed and modified in this module, such as: borders, fills (including gradient fills, pattern fills, solid color fills, etc.), fonts, alignment, number formats, etc., as long as EXCEL can implement it, Then openpyxl can also do it, there are only a few that can’t be used, but they are basically not used (I haven’t encountered it, it should be there).

#Insert row, two parameters, the first indicates the position of the row to be inserted, the second indicates the number of rows to be inserted, one row is inserted by default
ws.insert_rows(ws.max_row)
#Insert column, two parameters, the first indicates the position of the column to be inserted, the second indicates the number of columns to be inserted, one column is inserted by default
ws.delete_cols(4)
#Delete rows, two parameters, the first indicates the position of the row to be deleted, the second indicates the number of rows to be deleted, one row is deleted by default
ws.delete_rows(ws.max_row)

#Delete columns, two parameters, the first indicates the position of the column to be deleted, the second indicates the number of columns to be deleted, one column is deleted by default
ws.delete_cols(4)
#Merge cells, there are two syntaxes. Remember, the cells merged by openpyxl only have read-only attributes. If you need to change the value, you need to cancel the merge first, change the value, and then merge the cells.
ws.merge_cells("A1:R1")
ws.merge_cells(start_row=1,start_column=17,end_row=2,end_column=17)
#Unmerge cells, same as merging cells
ws.unmerge_cells("A1:R1")
ws.unmerge_cells(start_row=1,start_column=17,end_row=2,end_column=17)
#Get the letters of the column corresponding to the current cell
print(ws.cell(1,1).column_letter #output:A
#Two attributes, check the maximum number of rows and columns (written data or formatted cells)
print(ws.max_row,ws.max_column)
#output: 118, 30
#View column width and row height
print(ws.column_dimensions["A"].width,ws.row_dimensions[12].height)
#output: 8.0, 20.25

#View its border information (single cell)
print(ws.cell(1,1).border)
"""
<openpyxl.styles.borders.Border object>
Parameters:
outline=True, diagonalUp=False, diagonalDown=False, start=None, end=None, left=<openpyxl.styles.borders.Side object>
Parameters:
style='thin', color=<openpyxl.styles.colors.Color object>
Parameters:
rgb=None, indexed=None, auto=True, theme=None, tint=0.0, type='auto', right=<openpyxl.styles.borders.Side object>
Parameters:
style='thin', color=<openpyxl.styles.colors.Color object>
Parameters:
rgb=None, indexed=None, auto=True, theme=None, tint=0.0, type='auto', top=<openpyxl.styles.borders.Side object>
Parameters:
style='thin', color=<openpyxl.styles.colors.Color object>
Parameters:
rgb=None, indexed=None, auto=True, theme=None, tint=0.0, type='auto', bottom=<openpyxl.styles.borders.Side object>
Parameters:
style='thin', color=<openpyxl.styles.colors.Color object>
Parameters:
rgb=None, indexed=None, auto=True, theme=None, tint=0.0, type='auto', diagonal=<openpyxl.styles.borders.Side object>
Parameters:
style=None, color=None, vertical=None, horizontal=None
"""
#View cell font
print(ws.cell(1,1).font)
"""
<openpyxl.styles.fonts.Font object>
Parameters:
name='Microsoft Yahei', charset=134, family=2.0, b=False, i=False, strike=None, outline=None, shadow=None, condense=None, color=None, extend=None, sz=12.0, u=None, vertAlign=None, scheme=None
"""
#View cell alignment format
print(ws.cell(1,1).alignment)
"""
<openpyxl.styles.alignment.Alignment object>
Parameters:
horizontal='center', vertical='center', textRotation=0, wrapText=None, shrinkToFit=None, indent=0.0, relativeIndent=0.0, justifyLastLine=None, readingOrder=0.0
"""
print(ws.cell(1,1).fill)
"""
<openpyxl.styles.fills.PatternFill object>
Parameters:
patternType=None, fgColor=<openpyxl.styles.colors.Color object>
Parameters:
rgb='00000000', indexed=None, auto=None, theme=None, tint=0.0, type='rgb', bgColor=<openpyxl.styles.colors.Color object>
Parameters:
rgb='00000000', indexed=None, auto=None, theme=None, tint=0.0, type='rgb'
"""
#View number format
print(ws.cell(1,1).number_format) #output:0
#Determine whether the format exists
ws.cell(1,1).has_style #output: True

Note: In EXCEL, cells are generally determined by rows and columns. The rows are numbers and the columns are letters. For example, “A1” represents the first cell in the upper left corner of the table.

In the table opened by openpyxl, it has two representations:

1), ws.cell(1,1) means “A1”, which is the first cell in the upper left corner. Its starting column and starting row are both 1, with the row in front and the column in the back.
2), ws[“A”][0] means “A1”, which is the first cell. The columns are represented by letters, the rows are represented by numbers, and the starting behavior is 0 (list biased, understood as a list).

#Output the value of the cell
print(ws["A"][0].value)#output:"Daily report of incoming coal testing (2023-10-21 to 2023-10-25)"
print(ws.cell(1,1).value) #output:"Daily report of incoming coal testing (2023-10-21 to 2023-10-25)"

The above are some properties that I need to use in my daily work. Here is an introduction to how to change the properties and cell values.

#The value of the cell is changed directly through assignment. The data will only be changed after saving. If it is closed, the data will not change.
ws["A"][2].value = "gaibian" #Change the value of the cell in the first column of row 2.
ws.cell(3,1).value = "cuoguo" #Change the value of the cell in the first column of row 3.

Changes in cell format are generally a whole, which requires borders, fonts, alignment, etc. The following is a fragment I wrote, using the styles module of openpyxl, see below for details:

def sheetBeautiful(sheet_name):
    
 #Alignment: horizontally centered, vertically centered align=openpyxl.styles.Alignment(horizontal="center",vertical="center")
    #Border line: thin, color is black
    side=openpyxl.styles.Side(style="thin",color="000000")
    
 #Fill: Solid color fill-》The background color is red
    fill=openpyxl.styles.PatternFill(patternType="solid",bgColor="FABF8F",fgColor="FABF8F")
  #Border: Use the previous settings
    border=openpyxl.styles.Border(top=side,left=side,bottom=side,right=side)
    
    font = openpyxl.styles.Font(name=u"楷体",size=12)
    
    if sheet_name.max_row == 5 and sheet_name.cell(4,1).value == "Total":
        
        end_row = 5
        
    else:
        
        end_row = sheet_name.max_row + 1
    
    for i in range(1,end_row):
        
        for j in range(1,17):
            
            if j in range(8,17):
                
                sheet_name.cell(i,j).number_format="0.00"
                
            sheet_name.cell(i,j).alignment=align
            
            sheet_name.cell(i,j).border=border
        #Change number format
        sheet_name.cell(i,15).number_format="0.000"
wb.close()#Changed data, format, etc. will not be saved
wb.save(new_path) #Save the changed data. new_path is the path you want to save. If no path is added, it will be saved to the original path.

More advanced operations are nothing more than cooperating with other modules (numpy, os, shutil, copy) and basic loops, judgments, etc., to organize data, compare data, split tables and other operations.


The above is all the content of this issue. The openpyxl module has more useful methods and attributes. If you want to know more about it, you can check the official documentation.


—-Giving clothes to others in the ninth month, harvesting rice in the tenth month, hiding the dragon in the moon of Jia, and celebrating the good year in the twelfth lunar month. Stay warm, everyone, see you another day. —-