Python Excel file transpose operation

After learning the basics of Python, the real test is writing your own scripts to handle practical scenarios the way professional developers (RDs) do. Learning to write that code elegantly is the next step toward more advanced Python.

As the saying goes, “Quantitative changes lead to qualitative changes.” Once a data set grows large, we often store it in Excel tables, because Excel offers database-like filtering and querying.

In this issue, we use the Python third-party library openpyxl to process Excel raw data. Let’s increase our knowledge together~

1. Problem background

Following the previous article on parsing multimedia files with Python, the parsed content can be written to a specified file in either text or Excel format. For example, we parse the multimedia files in a directory and write the results to an Excel file in the following form:

The Excel table shows the details of each multimedia file at a glance, but the data is scattered, which makes filtering difficult. So we need to process the parsed content further:

  • Use the name of each multimedia field as a cell name (column header)

  • Store each multimedia file's data as rows

The organized form is roughly as follows:

2. Raw data analysis

2.1 File preparation

  • 1. Original data

    • We use the pymediainfo library to parse multimedia files and write the parsed information to an Excel table.
    • Movie, Music, and Picture files are stored in three separate sheets.
    • We take the resulting Excel workbook as our origin_workbook.
  • 2. Install the openpyxl library

    • openpyxl is a third-party library and needs to be installed using pip before use.
    • pip install openpyxl
      
      
  • 3. Script import form

    • Create the target Excel file: import Workbook from the openpyxl library

    • from openpyxl import Workbook
      
      
    • Load the original Excel file: import load_workbook from the openpyxl library

    • from openpyxl import load_workbook
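To see the two imports working together, here is a minimal sketch. The file name and the sample row are assumptions made for illustration (the original script's file names are not shown), and the stand-in workbook is written to a temporary directory so the snippet runs on its own.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical file name; the article does not name its files.
path = os.path.join(tempfile.mkdtemp(), "origin.xlsx")

# Build a stand-in for the original workbook: one sheet per media kind.
origin_workbook = Workbook()
origin_workbook.active.title = "Movie"
origin_workbook.create_sheet("Music")
origin_workbook.create_sheet("Picture")
# Sample row in an assumed "key : value" layout.
origin_workbook["Movie"].append(["Complete name : demo.mp4"])
origin_workbook.save(path)

# Load it back, the way the script would load the real parsed output.
loaded = load_workbook(path)
sheet_names = loaded.sheetnames
print(sheet_names)

# A fresh Workbook serves as the target for the transposed data.
target_workbook = Workbook()
target_sheet = target_workbook.active
```

In a real run you would skip the first half and simply call load_workbook() on the Excel file produced by the parsing script.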
      
      

2.2 track_type type

  • Multimedia files generally contain 2 to 7 track_type values, and the output for video, music, and picture files is roughly the same:

    • The common ones are General, Video, and Audio. Less common track_type values include Other, Text, Menu, and Image.

    • Each track_type has multiple key attributes

  • Therefore, naming each column title as track_type + :: + key (where key is an attribute under that track_type) shows at a glance which track an attribute value belongs to. For example:

    • For Complete name under the General track, the title is General::Complete name

    • def getTitle(key_type, key):
          return key_type + '::' + key
      
      
    • In the code, track_type and key are read from the original Excel table and can be stored in a dictionary of lists (a JSON-like nested structure).

    • DATA_KEYS = {
          'General': [],
          'Text': [],
          'Video': [],
          'Audio': [],
          'Menu': [],
          'Image': [],
          'Other': [],
      }
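As a rough illustration of how such a dictionary gets filled, the sketch below records each (track_type, key) pair once and then builds the column titles. The sample pairs and the helper name collect_keys are hypothetical; only the title format comes from the text above.

```python
def get_title(key_type, key):
    """Same idea as getTitle above: join track_type and key with '::'."""
    return key_type + "::" + key


def collect_keys(records, data_keys):
    """Record each (track_type, key) pair once, keeping first-seen order."""
    for track_type, key in records:
        keys = data_keys.setdefault(track_type, [])
        if key not in keys:
            keys.append(key)
    return data_keys


# Hypothetical sample pairs, as they might be read from the original sheet.
records = [
    ("General", "Complete name"),
    ("General", "Format"),
    ("Video", "Format"),
    ("General", "Format"),  # duplicate, ignored
    ("Audio", "Bit rate"),
]

data_keys = collect_keys(records, {"General": [], "Video": [], "Audio": []})
titles = [get_title(t, k) for t, keys in data_keys.items() for k in keys]
print(titles)
```

Keeping a list per track_type (rather than a set) preserves the order in which keys first appear, so the columns come out in a stable order.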
      
      
2.3 Key value acquisition method

    • We observe that each attribute's value comes after the ":" that follows its key

    • Overall bit rate mode : Constant
      
      
    • You can find the position of ":" with str.find(":") and then extract the key and value by slicing.

    • DATA_SPLIT = ":"  # separator between key and value
      split_index = value.find(DATA_SPLIT)
      # Slice around the first separator and strip surrounding whitespace
      key, value = (i.strip() for i in [value[:split_index], value[split_index + len(DATA_SPLIT):]])
      
      
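    • You can also get [key, value] through str.split(":", 1). Limiting the split to the first ":" keeps values that themselves contain colons (such as timestamps) intact.

    • split_list = value.split(":", 1)
      key, value = (i.strip() for i in split_list)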
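A third option, shown here only as a sketch, is str.partition, which always returns three parts and so also copes with lines that have no separator at all. The function name parse_line and the sample lines are made up for illustration.

```python
DATA_SPLIT = ":"  # separator in lines such as "Overall bit rate mode : Constant"


def parse_line(line):
    """Split a 'key : value' line on the first separator only.

    Returns (key, value), or None when the line has no separator
    (for example a bare track_type line such as 'General').
    """
    key, sep, value = line.partition(DATA_SPLIT)
    if not sep:
        return None
    return key.strip(), value.strip()


print(parse_line("Overall bit rate mode : Constant"))
# A value that itself contains colons stays in one piece.
print(parse_line("Encoded date : UTC 2020-01-01 10:20:30"))
print(parse_line("General"))
```

Returning None for separator-free lines gives the caller an easy way to tell a track_type line from a key/value line.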
      
      
  • Some media have several tracks of the same track_type, distinguished by a sub-type (sub_key_type). For example, the Other type can appear as #1 and #2

  • You can treat the General section of each multimedia file as its unique ID and call the merge_cells() method to merge the rows that share it.

    • target_sheet.merge_cells(start_row=row_index, start_column=1, end_row=end_row, end_column=1)
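The merge step can be sketched as follows. The sample rows and the helper merge_same_id are hypothetical, and the sketch assumes that rows belonging to the same file sit next to each other with the ID in column 1.

```python
from openpyxl import Workbook

target_workbook = Workbook()
target_sheet = target_workbook.active

# Hypothetical rows: (General ID, title, value); same-ID rows are adjacent.
rows = [
    ("a.mp4", "Video::Format", "AVC"),
    ("a.mp4", "Audio::Format", "AAC"),
    ("a.mp4", "Other::Type", "Time code"),
    ("b.mp3", "Audio::Format", "MPEG Audio"),
]
for row in rows:
    target_sheet.append(row)


def merge_same_id(sheet, column=1):
    """Merge vertically adjacent cells in `column` that hold the same value."""
    # Read all values first, because merged cells lose their own value.
    values = [sheet.cell(row=r, column=column).value
              for r in range(1, sheet.max_row + 1)]
    start_row = 1
    for row_index in range(2, len(values) + 2):
        group_ended = (row_index > len(values)
                       or values[row_index - 1] != values[start_row - 1])
        if group_ended:
            end_row = row_index - 1
            if end_row > start_row:  # merging a single cell is pointless
                sheet.merge_cells(start_row=start_row, start_column=column,
                                  end_row=end_row, end_column=column)
            start_row = row_index


merge_same_id(target_sheet)
merged = sorted(r.coord for r in target_sheet.merged_cells.ranges)
print(merged)
```

After merging, only the top-left cell of each range keeps its value, which is why the ID values are read into a list before any merge happens.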

3. Excel table analysis

  • Parse origin_sheet line by line to obtain datakeysMap (the data_keys hash table)

  • Using data_keys, write each title into the first row of target_sheet, that is, cell (1, i) for column i

    • After each track_type::key title is written, its column index is stored in a map (map-result).
    • When filling in each row of data, look up the key's column index in that map to place the value in the correct column.
  • Then fill the values from origin_sheet into target_sheet one by one.

    • Using the nested JSON-like structure, look up each key in every type_datas and write its value to target_sheet
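Putting the steps above together, a minimal sketch might look like this. The layout of origin_sheet is an assumption (one text line per row in column A, a bare track_type line opening each section, and a General line opening each file), since the original screenshot is not available. For simplicity it writes one row per file and leaves out sub-types and cell merging.

```python
from openpyxl import Workbook

DATA_SPLIT = ":"


def get_title(key_type, key):
    return key_type + "::" + key


# Stand-in origin sheet (assumed layout: one text line per row in column A).
origin_workbook = Workbook()
origin_sheet = origin_workbook.active
for line in [
    "General", "Complete name : a.mp4", "Format : MPEG-4",
    "Video", "Format : AVC",
    "General", "Complete name : b.mp3",
    "Audio", "Format : MPEG Audio",
]:
    origin_sheet.append([line])

# Step 1: parse line by line into one {title: value} dict per media file.
files = []
track_type = None
for (line,) in origin_sheet.iter_rows(min_col=1, max_col=1, values_only=True):
    if not line:
        continue
    key, sep, value = line.partition(DATA_SPLIT)
    if not sep:                      # a bare track_type line
        track_type = line.strip()
        if track_type == "General":  # assumption: General opens a new file
            files.append({})
        continue
    if track_type is None or not files:
        continue                     # key/value line before any General line
    files[-1][get_title(track_type, key.strip())] = value.strip()

# Step 2: write titles in row 1 and remember each title's column index.
target_workbook = Workbook()
target_sheet = target_workbook.active
column_of = {}
for record in files:
    for title in record:
        if title not in column_of:
            column_of[title] = len(column_of) + 1
            target_sheet.cell(row=1, column=column_of[title], value=title)

# Step 3: one row per media file, each value placed under its title.
for row_index, record in enumerate(files, start=2):
    for title, value in record.items():
        target_sheet.cell(row=row_index, column=column_of[title], value=value)

table = [list(row) for row in target_sheet.iter_rows(values_only=True)]
for row in table:
    print(row)
```

The column_of dictionary plays the role of the column-index map: each title gets its column once, and every later value is placed by lookup instead of by scanning the header row again.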

4. Result

5. Summary

This issue uses the openpyxl library and a hash table to transpose data in an Excel table. The keys are collected from the original table into the hash table data_keys, and the titles built from data_keys are then written across the first row of target_sheet as headers.

Once this preparation is done, fill each row of target_sheet with the values from the original data. While adding rows, record the start_row and max_row of each ID so that rows sharing the same General can be merged.

The above is the content of this issue. You are welcome to like and comment. See you in the next issue~~~

