Usually when I do data analysis in the company, I also use python to do some office automation work. The leader said yesterday that we can finish other people’s 3-hour work in 3 minutes. O(∩_∩)O~
This article will introduce to you a few office automation skills I use:
1. Word document doc to docx
Last year, I wanted to participate in a data competition. The data in it were all in doc format. I wanted to use python-docx to read the data in word files, but python-docx only supports docx format, so I studied the conversion of these two formats.
1.1 Import Toolkit
import osfrom win32com import client as wc
1.2 Obtain the details of all doc files under the folder
# path path=C:/Users/yyz/Desktop/python office skills/data/doc to docx/ # Modify according to your own computer files # Define an empty list to store the absolute path of the file files = [] for file in os.listdir(path): if file.endswith(.doc): files.append(path + file) files
1.3 Converting files
# Run word program word = wc.Dispatch("Word.Application") # for loop i = 0 for file in files: try: doc = word.Documents.Open(file) #Open word file doc.SaveAs("{}x".format(file), 12) #Save as a file with the suffix ".docx", where parameter 12 refers to the docx file doc.Close() #Close the original word file print(file + ': converted successfully') i + =1 except: print(file + ': conversion [unsuccessful]') files.append(file) # If an error is reported while reading the file, add the file name to the files list and read again pass print('convert file %i'%i) # exit word word. Quit()
2. Convert text addresses to latitude and longitude in batches
During work, the latitude and longitude of the address will be used for map visualization or distance calculation.
2.1 Import Toolkit
# import toolkit import pandas as pd import json from urllib. request import urlopen, quote import requests
2.2 Define the conversion function
# define function def getlnglat(address): url = 'http://api.map.baidu.com/geocoding/v3/' output = 'json' ak = "self-applied api" # Baidu map API, need to apply by yourself address = quote(address) # Since the address variable in this article is in Chinese, in order to prevent garbled characters, first use quote to encode uri = url + '?' + 'address=' + address + ' & amp;output=' + output + ' & amp;ak=' + ak + ' & amp;callback= showLocation ' + '//GET request' res=requests.get(uri).text temp = json.loads(res) # convert string to json lat = temp['result']['location']['lat'] lng = temp['result']['location']['lng'] return lng, lat # longitude longitude, latitude latitude,
2.3 Address translation
2.3.1 Single address translation
# single address translation getlnglat('Gaobeidian District Office, Chaoyang District, Beijing Gaojing Village Committee') (116.52784003604923, 39.91806508560947)
2.3.2 Batch Address Translation
# read data data = pd.read_excel('C:/Users/yyz/Desktop/python office skills/data/address information.xlsx') data
data['longitude'] = '' data['latitude'] = '' for i in range(data. shape[0]): try: data.iloc[i,2] = getlnglat(data.iloc[i,1])[0] # Longitude Convert the address of row i and column 2 (column index is 1) into latitude and longitude, and assign the longitude to Row i, column 3 (column index is 2) data.iloc[i,3] = getlnglat(data.iloc[i,1])[1] # latitude except: pass #print(i) data
3. Calculate the distance by latitude and longitude
installation kit
pip install geopy` `
3.1 Import Toolkit
from geopy.distance import geodesic
3.2 Read data
# read data data = pd.read_excel('C:/Users/yyz/Desktop/python office skills/data/latitude and longitude calculation distance.xlsx') data
3.3 Calculate the distance
# Assign latitude and longitude to variables, simplify wd1 = data['latitude 1'].tolist() jd1 = data['longitude 1'].tolist() wd2 = data['latitude 2'].tolist() jd2 = data['longitude 2'].tolist() lis1 = [] for i in range(len(data)): j= geodesic((wd1[i],jd1[i]), (wd2[i],jd2[i])).km # latitude longitude latitude longitude lis1.append(j) #print(i) data['distance'] = lis1 data
4. Convert Baidu longitude and latitude to Gaode longitude and latitude
The company has two systems, the coordinate systems used are different, sometimes it needs to be converted
4.1 Toolkit
# import toolkit import math import pandas as pd
4.2 Defining functions
# define conversion function def bdToGaoDe(lon,lat): PI = 3.14159265358979324 * 3000.0 / 180.0 x = lon - 0.0065 y = lat - 0.006 z = math. sqrt(x * x + y * y) - 0.00002 * math. sin(y * PI) theta = math.atan2(y, x) - 0.000003 * math.cos(x * PI) lon = z * math.cos(theta) lat = z * math. sin(theta) return lon,lat
4.3 Single Transformation
# single conversion bdToGaoDe(116.512885, 39.847469) (116.50647396357492, 39.84120409781157)
4.4 Batch conversion
# read data data = pd.read_excel('C:/Users/yyz/Desktop/python office skills/data/Baidu longitude and latitude to Gaode.xlsx') data. head()
wd = data['latitude'].tolist() jd = data['longitude'].tolist() # define an empty list li1 = [] for i in range(len(data)): j = bdToGaoDe(jd[i],wd[i]) li1.append(j) li1 data['longitude_re'] = [i[0] for i in li1] data['latitude_re'] = [i[1] for i in li1] data. head()
5. Excel file batch merge
5.1 Toolkit
# import toolkit import pandas as pdimport os
5.2 Get file list
# set file path path = 'C:/Users/yyz/Desktop/python office skills/data/data merge/' # Empty list, used to store file paths files = [] for file in os.listdir(path): if file.endswith(.xlsx): files.append(path + file) # view list files
5.3 Convert storage data
# define an empty dataframe data = pd. DataFrame() # Loop through all files for file in files: datai = pd. read_excel(file) datai_len = len(datai) data = data.append(datai) # add to the total data print('Read %i row data, merged file %i column, name: %s'%(datai_len,len(data.columns),file.split('/')[-1]) ) # Check if all are read and whether the format is wrong # reset index data.reset_index(drop=True, inplace=True)
6. Convert Word files to pdf in batches
Only docx files can be converted, and an error will be reported when converting doc files. Toolkit installation
pip install docx2pdf
6.1 Import Toolkit
# Install Toolkit:``# Import Toolkit``from docx2pdf import convert``import os
6.2 Single Transformation
# Single conversion``convert("c:/users/yyz/desktop/Rubiccube formula.docx", "c:/users/yyz/desktop/excel notes.pdf")
6.3 Batch conversion?
# file location path = 'C:/Users/yyz/Desktop/python office skills/data/word to pdf/' # Define an empty list to store the file list files = [] for file in os.listdir(path): if file.endswith(".docx"): files.append(path + file) files for file in files: convert(file,file.split('.')[0] + '.pdf') print(file + 'converted successfully')
7. Batch read table data in word
toolkit installation
pip install python-docx
import docx # read word file doc = docx.Document('C:/Users/yyz/Desktop/python office skills/data/word information.docx') # Get a list of all table objects in the document biauges = doc.tables
7.2 Irregular forms
cells = biauges[1]._cells``cells_lis = [[cell.text for cell in cells]]
import pandas as pd import numpy as np datai = pd. DataFrame(cells_lis) datai = datai[[1,3,7,9,14,16,19,21]] datai.columns = ['Name','Age','Native Place','Address','Work Unit','Phone','Whether a Party Member',\ 'date of birth'] datai
7.3 Canonical data
# Get the first table row丨 rowi = len(biaoges[0].rows) rowi
# define an empty list lis1 = [] # for loop to get the data of the first table for i in range(1,rowi): # loop from row 2 lis1.append([biaoges[0].cell(i,0).text, biauges[0].cell(i,1).text, biauges[0].cell(i,2).text, biauges[0].cell(i,3).text, biauges[0].cell(i,4).text])
# create a dataframe data1 = pd.DataFrame(lis1,columns=['date','category','quantity','price','amount']) data1
7.4 Batch read
import pandas as pd import os os.chdir('C:/Users/yyz/Desktop/python office skills/data/word information/')
lis1=[] for file in os.listdir('.'): if file.endswith('.docx'): doc = docx. Document('./' + file) biauges = doc.tables rowi = len(biaoges[0].rows) for i in range(1,rowi): lis1.append([biaoges[0].cell(i,0).text, biauges[0].cell(i,1).text, biauges[0].cell(i,2).text, biauges[0].cell(i,3).text, biauges[0].cell(i,4).text])
# create dataframe data1 = pd.DataFrame(lis1,columns=['date','category','quantity','price','amount']) data1
8 Use Outlook to send emails in batches
8.1 Import Toolkit
import win32com.client as win32 import pandas as pd
8.2 Reading data
# read data data1 = pd.read_excel('C:/Users/yyz/Desktop/python send mail in batches.xlsx', sheet_name='send mail') data1.fillna('',inplace=True)
8.3 Send email
# run outlook outlook = win32.Dispatch("outlook.Application") # for loop to send files for i in range(data1. shape[0]): mail = outlook.CreateItem(0) # Create a mail object win32.constants.olMailItem mail.To = data1.iloc[i,0] #recipient mail.CC = data1.iloc[i,1] #cc person mail.Subject = data1.iloc[i,2] #Mail subject mail.HTMLBody = data1.iloc[i,3] # Email body in html format # mail.Body = data1.iloc[i,3] # mail body mail.Attachments.Add(data1.iloc[i,4]) # attachment mail.Send() #send i + =1 print('send mail %i copies'%i)
There are still many skills in python office automation. Python is easy to master and can help us improve work efficiency. This is one of the reasons why many non-programmers learn python.
Reader benefits: If you also like programming and want to get a higher salary by learning Python, then this set of Python learning materials must be useful to you!
For 0 basic beginners:
If you are a zero-based novice, you can consider getting started with Python quickly
On the one hand, the learning time is relatively short, and the learning content is more comprehensive and concentrated
On the other hand, you can find a learning plan that suits you
Including: Python installation package + activation code, Python web development, Python crawler, Python data analysis, artificial intelligence, machine learning and other tutorials. Take you to learn Python systematically from zero foundation!
Reader Benefits: 2023 Zero-Basic Python Essentials (Video + Source Code + Tools + Software) Safe Links for Free Collection
1. Learning routes in all directions of Python
The route of all directions in Python is to organize the commonly used technical points of Python to form a summary of knowledge points in various fields. Its usefulness lies in that you can find corresponding learning resources according to the above knowledge points to ensure that you learn more comprehensively.
Second, learning software
If a worker wants to do a good job, he must first sharpen his tools. The commonly used development software for learning Python is here, which saves you a lot of time.
3. Introductory learning video
When we watch videos and learn, we can’t just move our eyes and brain without using our hands. A more scientific learning method is to use them after understanding. At this time, the hands-on project is very suitable.
Fourth, actual combat cases
Optical theory is useless, you have to learn to follow along, and you have to do it yourself, so that you can apply what you have learned to practice. At this time, you can learn from some actual combat cases.
Five, 100 Python practice questions
Check the learning results.
6. Interview materials
We must learn Python to find high-paying jobs. The following interview questions are the latest interview materials from first-line Internet companies such as Ali, Tencent, and Byte, and Ali bosses have given authoritative answers. After finishing this set The interview materials believe that everyone can find a satisfactory job.
This full version of the full set of Python learning materials has been uploaded. If you need it, you can scan the CSDN official certification QR code below or click the link to get it for free【 Guaranteed 100% Free
】
Reader benefits: 2023 zero-basic Python essential materials (video + source code + tools + software) secure link for free