[pandas skills] group by+agg+transform function

Table of Contents

1. group by + single field + single aggregation

2. group by + single field + multiple aggregation

3. group by + multiple fields + single aggregation

4. group by + multiple fields + multiple aggregations

5. transform function

Create data:

import numpy as np
import pandas as pd
students=["Puppy","Kitten","Duck","Little Rabbit","Little Flower","Little Grass","Puppy","Kitten","Little Duck","Little Rabbit" ","小花","小草"]
grade=["Primary School","Junior School"]
sex=["male","female"]
df=pd.DataFrame({
    "students":students,
    "grade":np.random.choice(grade,12),
    "sex":np.random.choice(sex,12),
    "score":np.random.randint(80,100,12),
    "money":np.random.randint(800,1000,12)
})
df
students grade sex score money
0 Puppy Primary School female 95 844
1 Kitten Primary School male 93 836
2 Little Duck Junior Middle School male 83 854
3 小Rabbit Primary School female 90 931
4 小花 Primary School male 81 853
5 小草 Primary School male 80 991
6 Puppy Junior Middle School female 81 854
7 Kitten Primary School male 93 886
8 Little duck Primary School male 88 983
9 Little Rabbit Primary School male 86 891
10

th>

小花 Junior Middle School male 92 830
11 小草 Junior Middle School male 84 948

1. group by + single field + single aggregation

1.1 Method 1

# Find the total amount of each person:
total_money=df.groupby("students")["money"].sum().reset_index()
total_money

1.2 Method 2 (using agg)

df.groupby("students").agg({"money":"sum"}).reset_index()
#or
df.groupby("students").agg({"money":np.sum}).reset_index()
students money
0 Little Rabbit 1820
1 Puppy 1711
2 Kitten 1670
3 小花 1861
4 小草 1825
5 Little Duck 1719

2. group by + single field + multiple aggregation

2.1 Method 1 (use group by + merge)

mean_money = df.groupby("students")["money"].mean().reset_index()
mean_money.columns = ["students","mean_money"]
mean_money
total_mean = total_money.merge(mean_money)
total_mean

total_mean = total_money.merge(mean_money)
total_mean

< /table>

2.2 Method 2 (using group by + agg)

total_mean = df.groupby("students").agg(total_money=("money", "sum"),mean_money=("money", "mean")).reset_index()
total_mean
students total_money mean_money
0 小Rabbit 1820 910.0
1 Puppy 1711 855.5
2 Kitten 1670 835.0
3 Little Flower 1861 930.5
4 小草 1825 912.5
5 Little Duck 1719 859.5

< /table>

3. group by + multiple fields + single aggregation

3.1 Method 1

df.groupby(["students","grade"])["money"].sum().reset_index()
students total_money mean_money
0 小Rabbit 1820 910.0
1 Puppy 1711 855.5
2 Kitten 1670 835.0
3 Little Flower 1861 930.5
4 小草 1825 912.5
5 Little Duck 1719 859.5
students grade money
0 Little Rabbit Junior Middle School 1820
1 Puppy Junior Middle School 843
2 Puppies Primary School 868
3 Kittens Primary School 1670
4 Little Flower Junior School 910
5 小花 Primary School 951
6 小草 Junior Middle School 1825
7 Little Duck Junior Middle School 1719

3.2 Method 2 (using agg)

df.groupby(["students","grade"]).agg({"money":"sum"}).reset_index()
students grade money
0 Little Rabbit Junior Middle School 1820
1 Puppy Junior Middle School 843
2 Puppies Primary School 868
3 Kittens Primary School 1670
4 Little Flower Junior Middle School 910
5 小花 Primary School 951
6 小草 Junior Middle School 1825
7 Little Duck Junior Middle School 1719

4. group by + multiple fields + multiple aggregations

The method of using the agg function is: agg(new column name=(“original column name”, “statistical function”))

df.groupby(["students","grade"]).agg(total_money=("money", "sum"),mean_money=("money", "mean"),total_score=("score" , "sum")).reset_index()
students grade total_money mean_money total_score
0 小Rabbit Junior Middle School 1820 910.0 192
1 Puppy Junior Middle School 843 843.0 88
2 Puppy Primary School 868 868.0 93
3 Kitten Primary School 1670 835.0 178
4 小花 Junior Middle School 910 910.0 95
5 小花 Primary School 951 951.0 98
6 小草 Junior Middle School 1825 912.5 184
7 Little Duck Junior Middle School 1719 859.5 173

5. transform function

5.1 Method 1 (using groupby + merge)

df_1 = df.groupby("grade")["score"].mean().reset_index()
df_1.columns = ["grade", "average_score"]
df_1
grade average_score
0 Junior School 85.00
1 Primary School 88.25
df_new1 = pd.merge(df, df_1, on="grade")
df_new1
students grade sex score money average_score
0 Puppy Primary School female 95 844 88.25
1 Kitten Primary School male 93 836 88.25
2 小Rabbit Primary School female 90 931 88.25
3 小花 Primary School male 81 853 88.25
4 小草 Primary School male 80 991 88.25
5 Kitten Primary School male 93 886 88.25
6 Little Duck Primary School male 88 983 88.25
7 Little Rabbit Primary School male 86 891 88.25
8 Little Duck Junior Middle School male 83 854 85.00
9 Puppy Junior High School female 81 854 85.00
10 小花 Junior Middle School male 92 830 85.00
11 小草 Junior Middle School male 84 948 85.00

5.2 Method 2 (using groupby + map)

dic = df.groupby("grade")["score"].mean().to_dict()
dic
{'Junior School': 85.0, 'Primary School': 88.25}
df_new1["average_map_score"] = df["grade"].map(dic)
df_new1
students grade sex score money average_score average_map_score
0 Puppy Primary School female 95 844 88.25 88.25
1 Kitten Primary School male 93 836 88.25 88.25
2 Little Rabbit Primary School female 90 931 88.25 85.00
3 小花 Primary School male 81 853 88.25 88.25
4 小草 Primary School male 80 991 88.25 88.25
5 Kitten Primary School male 93 886 88.25 88.25
6 Little Duck Primary School male 88 983 88.25 85.00
7 小Rabbit Primary School male 86 891 88.25 88.25
8 Little Duck Junior Middle School male 83 854 85.00 88.25
9 Puppy Junior Middle School female 81 854 85.00 88.25
10 小花 Junior Middle School male 92 830 85.00 85.00
11 小草 Junior Middle School male 84 948 85.00 85.00

5.3 Method 3 (use transform in one step)

df_new1["average_trans_score"] = df.groupby("grade")["score"].transform("mean")
df_new1
students grade sex score money average_score average_map_score average_trans_score
0 Puppy Primary School female 95 844 88.25 88.25 88.25
1 Kitten Primary School male 93 836 88.25 88.25 88.25
2 Little Rabbit Primary School female 90 931 88.25 85.00 85.00
3 小花 Primary School male 81 853 88.25 88.25 88.25
4 小草 Primary School male 80 991 88.25 88.25 88.25
5 Kitten Primary School male 93 886 88.25 88.25 88.25
6 Little Duck Primary School male 88 983 88.25 85.00 85.00
7 小Rabbit Primary School male 86 891 88.25 88.25 88.25
8 Little Duck Junior Middle School male 83 854 85.00 88.25 88.25
9 Puppy Junior Middle School female 81 854 85.00 88.25 88.25
10 小花 Junior Middle School male 92 830 85.00 85.00 85.00
11 小草 Junior Middle School male 84 948 85.00 85.00 85.00

The knowledge points of the article match the official knowledge archives, and you can further learn relevant knowledge. Python introductory skill treeStructured data analysis tool PandasPandas overview 384574 people are learning the system