Excel automated office-basic use of Openpyxl

Excel automated office-basic use of Openpyxl

Personally, I feel that compared to Pandas, openpyxl is more detailed in operating Excel, while Pandas is more suitable for statistical calculations;

  • 01 Basic environment
  • 02 Excel data reading operation
  • 03 case
  • 04 Write data to Excel
  • 05 Directed modification of table data
  • 06 Cell style formulation
  • 07 Cell border style formulation
  • 08 Cell alignment method
  • 09 filter
  • 10 formula operations
  • 11 Set row height and column width
  • 12 cells split and merge
  • 13 Freeze window
  • 14 Add drawing chart

01 Basic environment

Library used: openpyxl (3.x), dependent on lxml, pillow;

Some basic definitions:

  • Workbook: an Excel spreadsheet document saved to a file with the extension .xlsx;
  • sheet: multiple worksheets contained in the workbook;
  • Active table: The currently viewed table, or the last table viewed before closing;
  • Cell: The column starts from A, the row starts from 1, and the position square of a specific row and column is the cell; note that after Z, use two letters AA, AB, AC, etc.;
import openpyxl
openpyxl.__version__
'3.0.9'

02 Excel data reading operation

# Load the workbook and return the workbook object
wb = openpyxl.load_workbook("./youtube_mp3_ds_test.xlsx")

# Get the list of worksheet names
print("sheet name list = ",wb.sheetnames)

# Get sheet and return worksheet object
ws1 = wb["Sheet1"]

# Get the active sheet and return the worksheet object
ws2 = wb.active

# Get cells from the table, access them by row and column names, and return the Cell object
cell = ws2["A4"]

# Attribute: value: the value stored in the cell
# Attribute: row: row index
# Attribute: column: column index
# Attribute: coordinate: coordinate
print("Cell attributes = ",cell.value, cell.row, cell.column, cell.coordinate)

# Row indexes can use numbers instead of letters, starting from 1, but the method needs to be changed;
cell = ws2.cell(row=1, column=4)
print("Cell attributes (index mode) = ",cell.value, cell.row, cell.column, cell.coordinate)


# Get a rectangular area from the worksheet
for cell_row in ws2["A2":"E6"]:
    for cell in cell_row:
# print(cell. coordinate)
        pass

# You can also get a single row or column: use the rows and columns properties of the worksheet object, and return an iterator
for columns in ws2.columns:
    for cell in column:
        print("The first row of each column = ",cell.coordinate)
        break

# Get all the cells in the first column
list(ws2. columns)[0]

# Get the maximum number of rows and columns in the worksheet
print("The maximum number of rows and columns in the worksheet = ",ws2.max_row,ws2.max_column)


List of sheet names = ['Sheet1']
Cell properties = 114584 4 1 A4
Cell attributes (index mode) = a_item 1 4 D1
First row of each column = A1
First row of each column = B1
First row of each column = C1
First row of each column = D1
First row of each column = E1
First row of each column = F1
First row of each column = G1
First row of each column = H1
Maximum number of rows and columns in a worksheet = 12567 8

03 case

# Four columns: census tract number A, state abbreviation B, county name C, census tract population D; each county may have multiple census tracts;

# code hint
sheet=...
countryData = {<!-- -->}
for row in range(2, sheet. max_row + 1):
    cell_1 = sheet["B" + str(row)]
    cell_2 = sheet["C" + str(row)]
    cell_3 = sheet["D" + str(row)]
    
    countryData.setdefault(cell_1.value, {<!-- -->})
    countryData[cell_1.value].setdefault(cell_2.value, {<!-- -->'v1':0, 'v2':0})
    countryData[cell_1.value][cell_2.value]['v1'] + = 1
    countryData[cell_1.value][cell_2.value]['v2'] + = int(cell_3.value)

# Results can be saved to json

04 Write data to Excel

# Create an empty workbook
wb = openpyxl. Workbook()

wb.sheetnames # default worksheet

['Sheet']
ws = wb.active

# Set a name for the worksheet
ws.title = "test_sheet"
ws.title

'test_sheet'
# save workbook
wb.save("./wb_demo.xlsx")

# create worksheet
wb = openpyxl.load_workbook("./wb_demo.xlsx")
wb.create_sheet(title="test_sheetA", index=0) # Specify the position where the sheet is placed (the value starts from 0)
wb.save("./wb_demo.xlsx")
wb.sheetnames

['test_sheetA', 'test_sheet']
# Delete worksheet
del wb["test_sheetA"]
wb.sheetnames

['test_sheet']
# Specify the value of the cell
sheet = wb["test_sheet"]
sheet["A1"] = "ColumnA"
sheet["B1"] = "ColumnB"
sheet.cell(row=2, column=1).value = "ValueA"
sheet.cell(row=2, column=2).value = "ValueB"
wb.save("./wb_demo.xlsx")

05 Directed modification of table data

# Code description
wb = openpyxl.load_workbook("./xxx.xlsx")
sheet = wb["Sheet"]

PRICE_UPDATE = {<!-- -->
    "A": 1.0,
    "B": 2.0,
    "C": 3.0
}
for rowNum in range(2, sheet.max_row + 1):
    cur_name = sheet.cell(row=rowNum, column=1).value
    if cur_name in PRICE_UPDATE:
        sheet.cell(row=rowNum, column=2).value = PRICE_UPDATE[cur_name]

# save update wb

06 Cell style development

Use the openpyxl.stayles module to import the Font and PatternFill toolkits;

font

  • name = “italics”, name
  • color=”000000″, color
  • italic=Ture, italic
  • size=12, font size
  • underline=”sigle”, single underline
  • b=True, bold
# Set the font style
import openpyxl
from openpyxl. styles import Font

wb = openpyxl.load_workbook("./wb_demo.xlsx")
sheet = wb["test_sheet"]

for row in sheet["A1":"B1"]:
    for col in row:
        col.font = Font(name="italics")
wb.save("./wb_demo.xlsx")

# Set font color
wb = openpyxl.load_workbook("./wb_demo.xlsx")
sheet = wb["test_sheet"]

for row in sheet["A1":"B1"]:
    for col in row:
        col.font = Font(name="楷体",color="668B8B")
wb.save("./wb_demo.xlsx")

fill color

# Set cell fill color
import openpyxl
from openpyxl. styles import PatternFill

wb = openpyxl.load_workbook("./wb_demo.xlsx")
sheet = wb["test_sheet"]

for row in sheet["A2":"B2"]:
    for col in row:
        col.fill = PatternFill(patternType="solid",fgColor="8470FF")
wb.save("./wb_demo.xlsx")

07 Cell border style formulation

# Set the border (not commonly used)
import openpyxl
from openpyxl. styles import Side, Border

wb = openpyxl.load_workbook("./wb_demo.xlsx")
sheet = wb["test_sheet"]

for row in sheet["A2":"B2"]:
    for col in row:
        side = Side(style='double', color='FF0000')
        col.border = Border(top=side,bottom=side, left=side, right=side)
        
wb.save("./wb_demo.xlsx")

08 cell alignment

# Use the Alignment toolkit
import openpyxl
from openpyxl. styles import Alignment

wb = openpyxl.load_workbook("./wb_demo.xlsx")
sheet = wb["test_sheet"]

for row in sheet["A2":"B2"]:
    for col in row:
        col.alignment = Alignment(horizontal="right",vertical='center')
        
wb.save("./wb_demo.xlsx")


09 filter

import openpyxl

wb = openpyxl.load_workbook("./wb_demo.xlsx")
sheet = wb["test_sheet"]

# Set filter: The filter object needs to specify the reference area
sheet.auto_filter.ref = 'A1:B3'

# Set filter items: Parameter 1 specifies which column to apply the filter conditions to, parameter 2 indicates the content of the filter conditions
sheet.auto_filter.add_filter_column(0, ["ValueA"]) # But note that the data display style needs to open Excel and operate the filter before it will take effect, which is equivalent to a bug.

# Set sorting: specify the sorting area, and ascending and descending order
sheet.auto_filter.add_sort_condition(ref="B2:B3", descending=True) # The above bug still exists: the configuration already exists, but the style does not take effect.

wb.save("./wb_demo.xlsx")

10 formula operation

import openpyxl

wb = openpyxl.load_workbook("./wb_demo.xlsx")
sheet = wb["test_sheet"]
sheet['C4'] = '=SUM(C2:C3)'
wb.save("./wb_demo.xlsx")

# read the result of the cell using the formula
wb = openpyxl.load_workbook("./wb_demo.xlsx",read_only=True)
sheet = wb["test_sheet"]
sheet['C4'].value # Note that this is a formula string, not a calculated result

'=SUM(C2:C3)'

11 Set row height and column width

The sheet object has row_dimensions and column_demensions attributes to control row height and column width;

  • Integer or decimal ranging from 0 to 409 for line height, representing the number of points, the default line height is 12.75, and one point is equal to 1/72 inch;
  • The column width can be set as an integer or decimal from 0 to 255, and the default is 8.43 characters;
  • Setting the row height or column width to 0 will make the cell hidden;
import openpyxl

wb = openpyxl.load_workbook("./wb_demo.xlsx")
sheet = wb["test_sheet"]

sheet.row_dimensions[2].height = 50 # set row 2 row height
sheet.column_dimensions['A'].width = 50 # Set column A column width

wb.save("./wb_demo.xlsx")

12 cell splitting and merging

  • merge_cells() merges cells in a rectangular area into one cell
  • unmerge_cells() for splitting cells
import openpyxl

wb = openpyxl.load_workbook("./wb_demo.xlsx")
sheet = wb["test_sheet"]

sheet.merge_cells('A2:B3')
sheet['A2'] = "Merged content"

# sheet. unmerge_cells('A2:B3')

wb.save("./wb_demo.xlsx")

13 Freeze window

  • Hover the top rows or leftmost columns;
  • The freeze_panes property of the sheet object can be set to a Cell object or a cell coordinate string;
  • All rows above the cell and all columns to the left will be frozen, but the row and column where the cell is located will not be frozen;
Attribute value Affected rows and columns
=’A2′ Row 1
=’B1′ Column A
=’C1′ Column A and Column B
=’C2′ Row 1 and Column A and Column B
=’A1′ or =None Unfreeze all cells

14 Add drawing chart

Create bar, line, scatter, and pie charts from data in a worksheet:

  • Create a Reference object that represents the data area that acts on the chart;
  • Create chart object
  • Add data to the chart object
  • Add the chart to the specified sheet

48 chart styles reference in python_openpyxl

Python office automation – use openpyxl to make Excel charts

import openpyxl
from openpyxl.chart.legend import LegendEntry

wb = openpyxl.load_workbook("./youtube_mp3_ds_test.xlsx")
sheet = wb. active

# Only delineate the data area
values = openpyxl.chart.Reference(sheet, min_row=2, min_col=2, max_row=5, max_col=3)

# chart = openpyxl.chart.BarChart()
chart = openpyxl.chart.LineChart()
# chart = openpyxl.chart.ScatterChart()
# chart = openpyxl.chart.PieChart()
chart.title = 'Bar for cls'
chart.x_axis.title = 'cls'
chart.y_axis.title = 'num'
# The position of the legend can be controlled by setting its position:
# Right, left, top, bottom and top right are r, l, t, b and tr respectively. The default value is r
# chart.legend.position = 'tr'

## chart.add_data(values)

## Set the legend (note that setting data in the following ways cannot be shared with "chart.add_data(values)"
## Otherwise, the new legend will coexist with the default legend generated by "chart.add_data(values)")
for index in range(2,4):
    name = sheet.cell(row=1, column=index).value
    seriesObj = openpyxl.chart.Series(list(values.cols)[index-2], title=name)
    chart.append(seriesObj)

# Use column 1 as the x-axis to display (needs to be set after add_data)
x_label = openpyxl.chart.Reference(sheet, min_row=2, min_col=1, max_row=5, max_col=1)
chart.set_categories(x_label)

sheet.add_chart(chart, 'I1') # I1 represents the cell where the placement is
wb.save("./youtube_mp3_ds_test1.xlsx")