[pandas skills] group by+agg+transform function

Table of Contents

1. group by + single field + single aggregation

2. group by + single field + multiple aggregation

3. group by + multiple fields + single aggregation

4. group by + multiple fields + multiple aggregations

5. transform function

Create data:

import numpy as np
import pandas as pd
students=["Puppy","Kitten","Duck","Little Rabbit","Little Flower","Little Grass","Puppy","Kitten","Little Duck","Little Rabbit" ","小花","小草"]
grade=["Primary School","Junior School"]
sex=["male","female"]
df=pd.DataFrame({
    "students":students,
    "grade":np.random.choice(grade,12),
    "sex":np.random.choice(sex,12),
    "score":np.random.randint(80,100,12),
    "money":np.random.randint(800,1000,12)
})
df

	students	grade	sex	score	money
0	Puppy	Primary School	female	95	844
1	Kitten	Primary School	male	93	836
2	Little Duck	Junior Middle School	male	83	854
3	小Rabbit	Primary School	female	90	931
4	小花	Primary School	male	81	853
5	小草	Primary School	male	80	991
6	Puppy	Junior Middle School	female	81	854
7	Kitten	Primary School	male	93	886
8	Little duck	Primary School	male	88	983
9	Little Rabbit	Primary School	male	86	891
10 th>	小花	Junior Middle School	male	92	830
11	小草	Junior Middle School	male	84	948

1. group by + single field + single aggregation

1.1 Method 1

# Find the total amount of each person:
total_money=df.groupby("students")["money"].sum().reset_index()
total_money

1.2 Method 2 (using agg)

df.groupby("students").agg({"money":"sum"}).reset_index()
#or
df.groupby("students").agg({"money":np.sum}).reset_index()

	students	money
0	Little Rabbit	1820
1	Puppy	1711
2	Kitten	1670
3	小花	1861
4	小草	1825
5	Little Duck	1719

2. group by + single field + multiple aggregation

2.1 Method 1 (use group by + merge)

mean_money = df.groupby("students")["money"].mean().reset_index()
mean_money.columns = ["students","mean_money"]
mean_money
total_mean = total_money.merge(mean_money)
total_mean

total_mean = total_money.merge(mean_money)
total_mean

< /table>

2.2 Method 2 (using group by + agg)

total_mean = df.groupby("students").agg(total_money=("money", "sum"),mean_money=("money", "mean")).reset_index()
total_mean

	students	total_money	mean_money
0	小Rabbit	1820	910.0
1	Puppy	1711	855.5
2	Kitten	1670	835.0
3	Little Flower	1861	930.5
4	小草	1825	912.5
5	Little Duck	1719	859.5

< /table>

3. group by + multiple fields + single aggregation

3.1 Method 1

df.groupby(["students","grade"])["money"].sum().reset_index()

	students	total_money	mean_money
0	小Rabbit	1820	910.0
1	Puppy	1711	855.5
2	Kitten	1670	835.0
3	Little Flower	1861	930.5
4	小草	1825	912.5
5	Little Duck	1719	859.5

	students	grade	money
0	Little Rabbit	Junior Middle School	1820
1	Puppy	Junior Middle School	843
2	Puppies	Primary School	868
3	Kittens	Primary School	1670
4	Little Flower	Junior School	910
5	小花	Primary School	951
6	小草	Junior Middle School	1825
7	Little Duck	Junior Middle School	1719

3.2 Method 2 (using agg)

df.groupby(["students","grade"]).agg({"money":"sum"}).reset_index()

	students	grade	money
0	Little Rabbit	Junior Middle School	1820
1	Puppy	Junior Middle School	843
2	Puppies	Primary School	868
3	Kittens	Primary School	1670
4	Little Flower	Junior Middle School	910
5	小花	Primary School	951
6	小草	Junior Middle School	1825
7	Little Duck	Junior Middle School	1719

4. group by + multiple fields + multiple aggregations

The method of using the agg function is: agg(new column name=(“original column name”, “statistical function”))

df.groupby(["students","grade"]).agg(total_money=("money", "sum"),mean_money=("money", "mean"),total_score=("score" , "sum")).reset_index()

	students	grade	total_money	mean_money	total_score
0	小Rabbit	Junior Middle School	1820	910.0	192
1	Puppy	Junior Middle School	843	843.0	88
2	Puppy	Primary School	868	868.0	93
3	Kitten	Primary School	1670	835.0	178
4	小花	Junior Middle School	910	910.0	95
5	小花	Primary School	951	951.0	98
6	小草	Junior Middle School	1825	912.5	184
7	Little Duck	Junior Middle School	1719	859.5	173

5. transform function

5.1 Method 1 (using groupby + merge)

df_1 = df.groupby("grade")["score"].mean().reset_index()
df_1.columns = ["grade", "average_score"]
df_1

	grade	average_score
0	Junior School	85.00
1	Primary School	88.25

df_new1 = pd.merge(df, df_1, on="grade")
df_new1

	students	grade	sex	score	money	average_score
0	Puppy	Primary School	female	95	844	88.25
1	Kitten	Primary School	male	93	836	88.25
2	小Rabbit	Primary School	female	90	931	88.25
3	小花	Primary School	male	81	853	88.25
4	小草	Primary School	male	80	991	88.25
5	Kitten	Primary School	male	93	886	88.25
6	Little Duck	Primary School	male	88	983	88.25
7	Little Rabbit	Primary School	male	86	891	88.25
8	Little Duck	Junior Middle School	male	83	854	85.00
9	Puppy	Junior High School	female	81	854	85.00
10	小花	Junior Middle School	male	92	830	85.00
11	小草	Junior Middle School	male	84	948	85.00

5.2 Method 2 (using groupby + map)

dic = df.groupby("grade")["score"].mean().to_dict()
dic

{'Junior School': 85.0, 'Primary School': 88.25}

df_new1["average_map_score"] = df["grade"].map(dic)
df_new1

	students	grade	sex	score	money	average_score	average_map_score
0	Puppy	Primary School	female	95	844	88.25	88.25
1	Kitten	Primary School	male	93	836	88.25	88.25
2	Little Rabbit	Primary School	female	90	931	88.25	85.00
3	小花	Primary School	male	81	853	88.25	88.25
4	小草	Primary School	male	80	991	88.25	88.25
5	Kitten	Primary School	male	93	886	88.25	88.25
6	Little Duck	Primary School	male	88	983	88.25	85.00
7	小Rabbit	Primary School	male	86	891	88.25	88.25
8	Little Duck	Junior Middle School	male	83	854	85.00	88.25
9	Puppy	Junior Middle School	female	81	854	85.00	88.25
10	小花	Junior Middle School	male	92	830	85.00	85.00
11	小草	Junior Middle School	male	84	948	85.00	85.00

5.3 Method 3 (use transform in one step)

df_new1["average_trans_score"] = df.groupby("grade")["score"].transform("mean")
df_new1

	students	grade	sex	score	money	average_score	average_map_score	average_trans_score
0	Puppy	Primary School	female	95	844	88.25	88.25	88.25
1	Kitten	Primary School	male	93	836	88.25	88.25	88.25
2	Little Rabbit	Primary School	female	90	931	88.25	85.00	85.00
3	小花	Primary School	male	81	853	88.25	88.25	88.25
4	小草	Primary School	male	80	991	88.25	88.25	88.25
5	Kitten	Primary School	male	93	886	88.25	88.25	88.25
6	Little Duck	Primary School	male	88	983	88.25	85.00	85.00
7	小Rabbit	Primary School	male	86	891	88.25	88.25	88.25
8	Little Duck	Junior Middle School	male	83	854	85.00	88.25	88.25
9	Puppy	Junior Middle School	female	81	854	85.00	88.25	88.25
10	小花	Junior Middle School	male	92	830	85.00	85.00	85.00
11	小草	Junior Middle School	male	84	948	85.00	85.00	85.00

The knowledge points of the article match the official knowledge archives, and you can further learn relevant knowledge. Python introductory skill treeStructured data analysis tool PandasPandas overview 384574 people are learning the system