Handle data analysis with Python, you need to know these spreadsheets


I used to be an avid Excel user when I was in college. Back then I didn’t know how to code, so at least I needed to know Excel to get a job.

Excel helps me a lot whenever I want to clean up data, create pivot tables, and make charts. That is, whenever the data has millions of rows or complex calculations, either Excel becomes very slow or the workbook crashes.

I can’t count how many times these roadblocks have stopped my workflow, or how many times I’ve had to repeat a task over and over again.

Luckily, I learned to code in Python and I don’t have these problems anymore.

This article will introduce you to some of the best spreadsheets in Python, and why you should choose it instead of excel

1. Spreadsheet: Mito

Mito is a spreadsheet in Python. Simply put, if you can edit an Excel file, you can now write code.

This means we can have the simplicity of spreadsheets like Excel with the power of Python. We can use Mito like Excel. After we finish each action, Mito will automatically generate the Python code corresponding to each action.

To use Mito, first, we need to install Python and Jupyter Notebook or Jupyter Lab. We can then install Mito by running the following command on a terminal or command prompt.

python -m pip install mitoinstaller
python -m mitoinstaller install

That’s all we need to get started with Mito!

If you have any problems during the installation, please check the official documentation. https://docs.trymito.io/getting-started/installing-mito

Now let’s do the tasks you would do in Excel, like reading files, creating columns, pivot tables, visualizations, etc.

2. Why use it

1. Read the file – Excel can handle about 1 million rows, Python can handle more

Excel has a limit on the number of rows you can use. If you try to open a file with millions of rows, the file will be opened, but you won’t see more than 1,048,576 rows in Excel. In contrast, Python can handle millions of rows. The only limitation will be your PC’s computing power.

Let’s see how we can read a file with Mito.

Before reading a CSV file, first, we need to create a Mito spreadsheet. To do this, we run the code below.

import mitosheet
mitosheet.sheet()

Now it’s time to read the CSV file. For this demonstration, I will be using a dataset containing school grades, which you can download here. Once you have downloaded the file, click the ” + ” or “Import” button to load it, as shown below.

In addition to importing, the code below is also automatically generated.

import pandas as pd
StudentsPerformance_csv = pd. read_csv(r'StudentsPerformance.csv').

2. With Excel, you can achieve basic automation. With Mito and Python, the sky is the limit

In Excel, you can create a macro to record a series of actions and execute them whenever you want.

With Mito, we can do the same thing, and when coupled with the hundreds of free libraries for Python, you can do much more, such as emailing reports, sending files using WhatsApp, using Google Forms as a basic database, etc. wait.

Let’s record some operations with Mito, just like we do with Excel.

a. Create a new column/rename column

If you want to create a new column, just click the “Add Col” button. By default, the “Add Column” button will create a column named “new-column”, so we will rename this column name to “average” by double-clicking on it.


Mito will generate codes corresponding to the operations we do on the spreadsheet.

b. Sum a row

If you want to sum a certain row, we just need to write the formula (math score + reading score + writing score)/3 in one cell. Let’s say we want to calculate average scores for math, reading, and writing. We just need to write the formula (Math Score + Reading Score + Writing Score)/3 in any cell within the “Average” column.


Here is the code generated by Mito.

c. Make a pivot table

Creating a pivot table is as easy as clicking the “Pivot” button. Then we want to select rows, columns and values. Let’s create a pivot table that displays the average math and reading scores for groups A, B, C, D, and E in the “Race/Ethnicity” column.


In order to create this pivot table in Python, we have to write the following code. With Mito, the code above is automatically generated.

d. Create a histogram

Basic visualizations, such as pie charts and bar charts, can be easily created with Mito. We just need to click on “Charts” and select the chart type. Let’s create a bar chart for the pivot table we created earlier, showing “Race/Ethnicity” on the X-axis and “Math Score Mean” on the Y-axis.


Great! The few lines of code generated in a, b, c, and d are equivalent to an Excel macro. Every time we run the code, we execute all recorded actions.

03. Python can handle complex calculations. These calculations can crash the Excel workbook

Those memory-intensive calculations that would crash an Excel workbook, work just fine in Python. In this case, we’re going to use another Python library called bamboolib to perform a sequence of actions.

3. Spreadsheet: Bamboolib

To install bamboolib, run the command pip install –upgrade bamboolib –user on the terminal. After that, we want to run the command below to read the CSV or Excel file.

import bamboolib as bam
bam

In this case we are going to do some calculations in a CSV file with over 1 million rows, which you can download here. (https://drive.google.com/file/d/1YiOVav6-g_K8icZMzPszy8K20ozkAMu0/view?usp=sharing)

Like Mito, Bamboolib generates the code for us (from now on I will only show the steps, hiding the generated code to keep this article short).

import pandas as pd
df = pd.read_csv(r’/Users/frank/Downloads/sales-data.csv’, sep=’,’, decimal=’.’, nrows=100000)
df

Now let’s create a new column. To do this, we search for the action on the search bar, give the column a name, and enter the formula for the column.

For this demonstration, we will create a “price” column by using the formula (revenue/quantity).

Now let’s type “pivot table” in the search bar. We then group the data by product in the row and use “sum” as the aggregation function.

Next, we create a pie chart. We have to click the “Create Chart” button, choose the chart type, and choose the values we want to display.

Finally, we extract attributes from the “Date” column. In this case, we extract the name of the month, but, first, we must change the data type of the “Date” column to date (which is now set to str).

If you followed each step, you can verify that all calculations went smoothly!

Last

If you are interested in Python technology and want to learn data analysis, crawlers, etc.; here is a Python full set of learning materials, including learning route, software, source code, video, interview The questions and so on are all sorted out by myself when I was studying. It is not easy to sort out. Please like and share more~

1. Learning routes in all directions of Python

The route of all directions in Python is to organize the commonly used technical points of Python to form a summary of knowledge points in various fields. Its usefulness lies in that you can find corresponding learning resources according to the above knowledge points to ensure that you learn more comprehensively.

2. Python learning software

If a worker wants to do a good job, he must first sharpen his tools. The commonly used development software for learning Python is here!

3. Python introductory learning video

There are also many learning videos suitable for getting started with 0 basics. With these videos, you can easily get started with Python~ “>

4. Python exercises

After each video lesson, there are corresponding practice questions, you can test the learning results haha!

5. Python actual combat case

Optical theory is useless. You have to learn to type codes along with it, and then you can apply what you have learned in practice. At this time, you can learn from some practical cases. This information is also included~

6. Python interview materials

After we have learned Python, we can go out and find a job with the skills! The following interview questions are all from first-line Internet companies such as Alibaba, Tencent, and Byte, and some Alibaba bosses have given authoritative answers. After reading this set of interview materials, I believe everyone can find a satisfactory job.

7. Data collection

The above-mentioned full version of the full set of learning materials for Python has been uploaded to the CSDN official website. Those who need it can scan the QR code of the CSDN official certification below on WeChat and enter the “receive materials” to get it for free! !