Excel automated office-basic use of Openpyxl
Personally, I feel that compared to Pandas, openpyxl is more detailed in operating Excel, while Pandas is more suitable for statistical calculations;
- 01 Basic environment
- 02 Excel data reading operation
- 03 case
- 04 Write data to Excel
- 05 Directed modification of table data
- 06 Cell style formulation
- 07 Cell border style formulation
- 08 Cell alignment method
- 09 filter
- 10 formula operations
- 11 Set row height and column width
- 12 cells split and merge
- 13 Freeze window
- 14 Add drawing chart
01 Basic environment
Library used: openpyxl (3.x), dependent on lxml, pillow;
Some basic definitions:
- Workbook: an Excel spreadsheet document saved to a file with the extension
.xlsx
; - sheet: multiple worksheets contained in the workbook;
- Active table: The currently viewed table, or the last table viewed before closing;
- Cell: The column starts from A, the row starts from 1, and the position square of a specific row and column is the cell; note that after Z, use two letters AA, AB, AC, etc.;
import openpyxl openpyxl.__version__
'3.0.9'
02 Excel data reading operation
# Load the workbook and return the workbook object wb = openpyxl.load_workbook("./youtube_mp3_ds_test.xlsx") # Get the list of worksheet names print("sheet name list = ",wb.sheetnames) # Get sheet and return worksheet object ws1 = wb["Sheet1"] # Get the active sheet and return the worksheet object ws2 = wb.active # Get cells from the table, access them by row and column names, and return the Cell object cell = ws2["A4"] # Attribute: value: the value stored in the cell # Attribute: row: row index # Attribute: column: column index # Attribute: coordinate: coordinate print("Cell attributes = ",cell.value, cell.row, cell.column, cell.coordinate) # Row indexes can use numbers instead of letters, starting from 1, but the method needs to be changed; cell = ws2.cell(row=1, column=4) print("Cell attributes (index mode) = ",cell.value, cell.row, cell.column, cell.coordinate) # Get a rectangular area from the worksheet for cell_row in ws2["A2":"E6"]: for cell in cell_row: # print(cell. coordinate) pass # You can also get a single row or column: use the rows and columns properties of the worksheet object, and return an iterator for columns in ws2.columns: for cell in column: print("The first row of each column = ",cell.coordinate) break # Get all the cells in the first column list(ws2. columns)[0] # Get the maximum number of rows and columns in the worksheet print("The maximum number of rows and columns in the worksheet = ",ws2.max_row,ws2.max_column)
List of sheet names = ['Sheet1'] Cell properties = 114584 4 1 A4 Cell attributes (index mode) = a_item 1 4 D1 First row of each column = A1 First row of each column = B1 First row of each column = C1 First row of each column = D1 First row of each column = E1 First row of each column = F1 First row of each column = G1 First row of each column = H1 Maximum number of rows and columns in a worksheet = 12567 8
03 case
# Four columns: census tract number A, state abbreviation B, county name C, census tract population D; each county may have multiple census tracts; # code hint sheet=... countryData = {<!-- -->} for row in range(2, sheet. max_row + 1): cell_1 = sheet["B" + str(row)] cell_2 = sheet["C" + str(row)] cell_3 = sheet["D" + str(row)] countryData.setdefault(cell_1.value, {<!-- -->}) countryData[cell_1.value].setdefault(cell_2.value, {<!-- -->'v1':0, 'v2':0}) countryData[cell_1.value][cell_2.value]['v1'] + = 1 countryData[cell_1.value][cell_2.value]['v2'] + = int(cell_3.value) # Results can be saved to json
04 Write data to Excel
# Create an empty workbook wb = openpyxl. Workbook() wb.sheetnames # default worksheet
['Sheet']
ws = wb.active # Set a name for the worksheet ws.title = "test_sheet" ws.title
'test_sheet'
# save workbook wb.save("./wb_demo.xlsx")
# create worksheet wb = openpyxl.load_workbook("./wb_demo.xlsx") wb.create_sheet(title="test_sheetA", index=0) # Specify the position where the sheet is placed (the value starts from 0) wb.save("./wb_demo.xlsx") wb.sheetnames
['test_sheetA', 'test_sheet']
# Delete worksheet del wb["test_sheetA"] wb.sheetnames
['test_sheet']
# Specify the value of the cell sheet = wb["test_sheet"] sheet["A1"] = "ColumnA" sheet["B1"] = "ColumnB" sheet.cell(row=2, column=1).value = "ValueA" sheet.cell(row=2, column=2).value = "ValueB" wb.save("./wb_demo.xlsx")
05 Directed modification of table data
# Code description wb = openpyxl.load_workbook("./xxx.xlsx") sheet = wb["Sheet"] PRICE_UPDATE = {<!-- --> "A": 1.0, "B": 2.0, "C": 3.0 } for rowNum in range(2, sheet.max_row + 1): cur_name = sheet.cell(row=rowNum, column=1).value if cur_name in PRICE_UPDATE: sheet.cell(row=rowNum, column=2).value = PRICE_UPDATE[cur_name] # save update wb
06 Cell style development
Use the openpyxl.stayles module to import the Font and PatternFill toolkits;
font
- name = “italics”, name
- color=”000000″, color
- italic=Ture, italic
- size=12, font size
- underline=”sigle”, single underline
- b=True, bold
# Set the font style import openpyxl from openpyxl. styles import Font wb = openpyxl.load_workbook("./wb_demo.xlsx") sheet = wb["test_sheet"] for row in sheet["A1":"B1"]: for col in row: col.font = Font(name="italics") wb.save("./wb_demo.xlsx")
# Set font color wb = openpyxl.load_workbook("./wb_demo.xlsx") sheet = wb["test_sheet"] for row in sheet["A1":"B1"]: for col in row: col.font = Font(name="楷体",color="668B8B") wb.save("./wb_demo.xlsx")
fill color
# Set cell fill color import openpyxl from openpyxl. styles import PatternFill wb = openpyxl.load_workbook("./wb_demo.xlsx") sheet = wb["test_sheet"] for row in sheet["A2":"B2"]: for col in row: col.fill = PatternFill(patternType="solid",fgColor="8470FF") wb.save("./wb_demo.xlsx")
07 Cell border style formulation
# Set the border (not commonly used) import openpyxl from openpyxl. styles import Side, Border wb = openpyxl.load_workbook("./wb_demo.xlsx") sheet = wb["test_sheet"] for row in sheet["A2":"B2"]: for col in row: side = Side(style='double', color='FF0000') col.border = Border(top=side,bottom=side, left=side, right=side) wb.save("./wb_demo.xlsx")
08 cell alignment
# Use the Alignment toolkit import openpyxl from openpyxl. styles import Alignment wb = openpyxl.load_workbook("./wb_demo.xlsx") sheet = wb["test_sheet"] for row in sheet["A2":"B2"]: for col in row: col.alignment = Alignment(horizontal="right",vertical='center') wb.save("./wb_demo.xlsx")
09 filter
import openpyxl wb = openpyxl.load_workbook("./wb_demo.xlsx") sheet = wb["test_sheet"] # Set filter: The filter object needs to specify the reference area sheet.auto_filter.ref = 'A1:B3' # Set filter items: Parameter 1 specifies which column to apply the filter conditions to, parameter 2 indicates the content of the filter conditions sheet.auto_filter.add_filter_column(0, ["ValueA"]) # But note that the data display style needs to open Excel and operate the filter before it will take effect, which is equivalent to a bug. # Set sorting: specify the sorting area, and ascending and descending order sheet.auto_filter.add_sort_condition(ref="B2:B3", descending=True) # The above bug still exists: the configuration already exists, but the style does not take effect. wb.save("./wb_demo.xlsx")
10 formula operation
import openpyxl wb = openpyxl.load_workbook("./wb_demo.xlsx") sheet = wb["test_sheet"] sheet['C4'] = '=SUM(C2:C3)' wb.save("./wb_demo.xlsx")
# read the result of the cell using the formula wb = openpyxl.load_workbook("./wb_demo.xlsx",read_only=True) sheet = wb["test_sheet"] sheet['C4'].value # Note that this is a formula string, not a calculated result
'=SUM(C2:C3)'
11 Set row height and column width
The sheet object has row_dimensions and column_demensions attributes to control row height and column width;
- Integer or decimal ranging from 0 to 409 for line height, representing the number of points, the default line height is 12.75, and one point is equal to 1/72 inch;
- The column width can be set as an integer or decimal from 0 to 255, and the default is 8.43 characters;
- Setting the row height or column width to 0 will make the cell hidden;
import openpyxl wb = openpyxl.load_workbook("./wb_demo.xlsx") sheet = wb["test_sheet"] sheet.row_dimensions[2].height = 50 # set row 2 row height sheet.column_dimensions['A'].width = 50 # Set column A column width wb.save("./wb_demo.xlsx")
12 cell splitting and merging
- merge_cells() merges cells in a rectangular area into one cell
- unmerge_cells() for splitting cells
import openpyxl wb = openpyxl.load_workbook("./wb_demo.xlsx") sheet = wb["test_sheet"] sheet.merge_cells('A2:B3') sheet['A2'] = "Merged content" # sheet. unmerge_cells('A2:B3') wb.save("./wb_demo.xlsx")
13 Freeze window
- Hover the top rows or leftmost columns;
- The freeze_panes property of the sheet object can be set to a Cell object or a cell coordinate string;
- All rows above the cell and all columns to the left will be frozen, but the row and column where the cell is located will not be frozen;
Attribute value | Affected rows and columns |
---|---|
=’A2′ | Row 1 |
=’B1′ | Column A |
=’C1′ | Column A and Column B |
=’C2′ | Row 1 and Column A and Column B |
=’A1′ or =None | Unfreeze all cells |
14 Add drawing chart
Create bar, line, scatter, and pie charts from data in a worksheet:
- Create a Reference object that represents the data area that acts on the chart;
- Create chart object
- Add data to the chart object
- Add the chart to the specified sheet
48 chart styles reference in python_openpyxl
Python office automation – use openpyxl to make Excel charts
import openpyxl from openpyxl.chart.legend import LegendEntry wb = openpyxl.load_workbook("./youtube_mp3_ds_test.xlsx") sheet = wb. active # Only delineate the data area values = openpyxl.chart.Reference(sheet, min_row=2, min_col=2, max_row=5, max_col=3) # chart = openpyxl.chart.BarChart() chart = openpyxl.chart.LineChart() # chart = openpyxl.chart.ScatterChart() # chart = openpyxl.chart.PieChart() chart.title = 'Bar for cls' chart.x_axis.title = 'cls' chart.y_axis.title = 'num' # The position of the legend can be controlled by setting its position: # Right, left, top, bottom and top right are r, l, t, b and tr respectively. The default value is r # chart.legend.position = 'tr' ## chart.add_data(values) ## Set the legend (note that setting data in the following ways cannot be shared with "chart.add_data(values)" ## Otherwise, the new legend will coexist with the default legend generated by "chart.add_data(values)") for index in range(2,4): name = sheet.cell(row=1, column=index).value seriesObj = openpyxl.chart.Series(list(values.cols)[index-2], title=name) chart.append(seriesObj) # Use column 1 as the x-axis to display (needs to be set after add_data) x_label = openpyxl.chart.Reference(sheet, min_row=2, min_col=1, max_row=5, max_col=1) chart.set_categories(x_label) sheet.add_chart(chart, 'I1') # I1 represents the cell where the placement is wb.save("./youtube_mp3_ds_test1.xlsx")