Table of Contents
1. group by + single field + single aggregation
2. group by + single field + multiple aggregation
3. group by + multiple fields + single aggregation
4. group by + multiple fields + multiple aggregations
5. transform function
Create data:
import numpy as np import pandas as pd students=["Puppy","Kitten","Duck","Little Rabbit","Little Flower","Little Grass","Puppy","Kitten","Little Duck","Little Rabbit" ","小花","小草"] grade=["Primary School","Junior School"] sex=["male","female"] df=pd.DataFrame({ "students":students, "grade":np.random.choice(grade,12), "sex":np.random.choice(sex,12), "score":np.random.randint(80,100,12), "money":np.random.randint(800,1000,12) }) df
students | grade | sex | score | money | |
---|---|---|---|---|---|
0 | Puppy | Primary School | female | 95 | 844 |
1 | Kitten | Primary School | male | 93 | 836 |
2 | Little Duck | Junior Middle School | male | 83 | 854 |
3 | 小Rabbit | Primary School | female | 90 | 931 |
4 | 小花 | Primary School | male | 81 | 853 |
5 | 小草 | Primary School | male | 80 | 991 |
6 | Puppy | Junior Middle School | female | 81 | 854 |
7 | Kitten | Primary School | male | 93 | 886 |
8 | Little duck | Primary School | male | 88 | 983 |
9 | Little Rabbit | Primary School | male | 86 | 891 |
10
th> | 小花 | Junior Middle School | male | 92 | 830 |
11 | 小草 | Junior Middle School | male | 84 | 948 |
1. group by + single field + single aggregation
1.1 Method 1
# Find the total amount of each person: total_money=df.groupby("students")["money"].sum().reset_index() total_money
1.2 Method 2 (using agg)
df.groupby("students").agg({"money":"sum"}).reset_index() #or df.groupby("students").agg({"money":np.sum}).reset_index()
students | money | |
---|---|---|
0 | Little Rabbit | 1820 |
1 | Puppy | 1711 |
2 | Kitten | 1670 |
3 | 小花 | 1861 |
4 | 小草 | 1825 |
5 | Little Duck | 1719 |
2. group by + single field + multiple aggregation
2.1 Method 1 (use group by + merge)
mean_money = df.groupby("students")["money"].mean().reset_index() mean_money.columns = ["students","mean_money"] mean_money total_mean = total_money.merge(mean_money) total_mean
total_mean = total_money.merge(mean_money) total_mean
students | total_money | mean_money | ||||||
---|---|---|---|---|---|---|---|---|
0 | 小Rabbit | 1820 | 910.0 | |||||
1 | Puppy | 1711 | 855.5 | |||||
2 | Kitten | 1670 | 835.0 | |||||
3 | Little Flower | 1861 | 930.5 | |||||
4 | 小草 | 1825 | 912.5 | |||||
5 | Little Duck | 1719 | 859.5 |
students | total_money | mean_money | ||||||
---|---|---|---|---|---|---|---|---|
0 | 小Rabbit | 1820 | 910.0 | |||||
1 | Puppy | 1711 | 855.5 | |||||
2 | Kitten | 1670 | 835.0 | |||||
3 | Little Flower | 1861 | 930.5 | |||||
4 | 小草 | 1825 | 912.5 | |||||
5 | Little Duck | 1719 | 859.5 |
students | grade | money | |
---|---|---|---|
0 | Little Rabbit | Junior Middle School | 1820 |
1 | Puppy | Junior Middle School | 843 |
2 | Puppies | Primary School | 868 |
3 | Kittens | Primary School | 1670 |
4 | Little Flower | Junior School | 910 |
5 | 小花 | Primary School | 951 |
6 | 小草 | Junior Middle School | 1825 |
7 | Little Duck | Junior Middle School | 1719 |
3.2 Method 2 (using agg)
df.groupby(["students","grade"]).agg({"money":"sum"}).reset_index()
students | grade | money | |
---|---|---|---|
0 | Little Rabbit | Junior Middle School | 1820 |
1 | Puppy | Junior Middle School | 843 |
2 | Puppies | Primary School | 868 |
3 | Kittens | Primary School | 1670 |
4 | Little Flower | Junior Middle School | 910 |
5 | 小花 | Primary School | 951 |
6 | 小草 | Junior Middle School | 1825 |
7 | Little Duck | Junior Middle School | 1719 |
4. group by + multiple fields + multiple aggregations
The method of using the agg function is: agg(new column name=(“original column name”, “statistical function”))
df.groupby(["students","grade"]).agg(total_money=("money", "sum"),mean_money=("money", "mean"),total_score=("score" , "sum")).reset_index()
students | grade | total_money | mean_money | total_score | |
---|---|---|---|---|---|
0 | 小Rabbit | Junior Middle School | 1820 | 910.0 | 192 |
1 | Puppy | Junior Middle School | 843 | 843.0 | 88 |
2 | Puppy | Primary School | 868 | 868.0 | 93 |
3 | Kitten | Primary School | 1670 | 835.0 | 178 |
4 | 小花 | Junior Middle School | 910 | 910.0 | 95 |
5 | 小花 | Primary School | 951 | 951.0 | 98 |
6 | 小草 | Junior Middle School | 1825 | 912.5 | 184 |
7 | Little Duck | Junior Middle School | 1719 | 859.5 | 173 |
5. transform function
5.1 Method 1 (using groupby + merge)
df_1 = df.groupby("grade")["score"].mean().reset_index() df_1.columns = ["grade", "average_score"] df_1
grade | average_score | |
---|---|---|
0 | Junior School | 85.00 |
1 | Primary School | 88.25 |
df_new1 = pd.merge(df, df_1, on="grade") df_new1
students | grade | sex | score | money | average_score | |
---|---|---|---|---|---|---|
0 | Puppy | Primary School | female | 95 | 844 | 88.25 |
1 | Kitten | Primary School | male | 93 | 836 | 88.25 |
2 | 小Rabbit | Primary School | female | 90 | 931 | 88.25 |
3 | 小花 | Primary School | male | 81 | 853 | 88.25 |
4 | 小草 | Primary School | male | 80 | 991 | 88.25 |
5 | Kitten | Primary School | male | 93 | 886 | 88.25 |
6 | Little Duck | Primary School | male | 88 | 983 | 88.25 |
7 | Little Rabbit | Primary School | male | 86 | 891 | 88.25 |
8 | Little Duck | Junior Middle School | male | 83 | 854 | 85.00 |
9 | Puppy | Junior High School | female | 81 | 854 | 85.00 |
10 | 小花 | Junior Middle School | male | 92 | 830 | 85.00 |
11 | 小草 | Junior Middle School | male | 84 | 948 | 85.00 |
5.2 Method 2 (using groupby + map)
dic = df.groupby("grade")["score"].mean().to_dict() dic
{'Junior School': 85.0, 'Primary School': 88.25}
df_new1["average_map_score"] = df["grade"].map(dic) df_new1
students | grade | sex | score | money | average_score | average_map_score | |
---|---|---|---|---|---|---|---|
0 | Puppy | Primary School | female | 95 | 844 | 88.25 | 88.25 |
1 | Kitten | Primary School | male | 93 | 836 | 88.25 | 88.25 |
2 | Little Rabbit | Primary School | female | 90 | 931 | 88.25 | 85.00 |
3 | 小花 | Primary School | male | 81 | 853 | 88.25 | 88.25 |
4 | 小草 | Primary School | male | 80 | 991 | 88.25 | 88.25 |
5 | Kitten | Primary School | male | 93 | 886 | 88.25 | 88.25 |
6 | Little Duck | Primary School | male | 88 | 983 | 88.25 | 85.00 |
7 | 小Rabbit | Primary School | male | 86 | 891 | 88.25 | 88.25 |
8 | Little Duck | Junior Middle School | male | 83 | 854 | 85.00 | 88.25 |
9 | Puppy | Junior Middle School | female | 81 | 854 | 85.00 | 88.25 |
10 | 小花 | Junior Middle School | male | 92 | 830 | 85.00 | 85.00 |
11 | 小草 | Junior Middle School | male | 84 | 948 | 85.00 | 85.00 |
5.3 Method 3 (use transform in one step)
df_new1["average_trans_score"] = df.groupby("grade")["score"].transform("mean") df_new1
students | grade | sex | score | money | average_score | average_map_score | average_trans_score | |
---|---|---|---|---|---|---|---|---|
0 | Puppy | Primary School | female | 95 | 844 | 88.25 | 88.25 | 88.25 |
1 | Kitten | Primary School | male | 93 | 836 | 88.25 | 88.25 | 88.25 |
2 | Little Rabbit | Primary School | female | 90 | 931 | 88.25 | 85.00 | 85.00 |
3 | 小花 | Primary School | male | 81 | 853 | 88.25 | 88.25 | 88.25 |
4 | 小草 | Primary School | male | 80 | 991 | 88.25 | 88.25 | 88.25 |
5 | Kitten | Primary School | male | 93 | 886 | 88.25 | 88.25 | 88.25 |
6 | Little Duck | Primary School | male | 88 | 983 | 88.25 | 85.00 | 85.00 |
7 | 小Rabbit | Primary School | male | 86 | 891 | 88.25 | 88.25 | 88.25 |
8 | Little Duck | Junior Middle School | male | 83 | 854 | 85.00 | 88.25 | 88.25 |
9 | Puppy | Junior Middle School | female | 81 | 854 | 85.00 | 88.25 | 88.25 |
10 | 小花 | Junior Middle School | male | 92 | 830 | 85.00 | 85.00 | 85.00 |
11 | 小草 | Junior Middle School | male | 84 | 948 | 85.00 | 85.00 | 85.00 |
The knowledge points of the article match the official knowledge archives, and you can further learn relevant knowledge. Python introductory skill treeStructured data analysis tool PandasPandas overview 384574 people are learning the system