
python - Updating a column in a pandas DataFrame based on another column and a separate dictionary

Reposted. Author: 行者123. Updated: 2023-11-30 22:47:06

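I'm trying to make a piece of my code run faster.

I have two DataFrames of different sizes, A and B. I also have a dictionary of ages called age_dict.

A has 100 rows and B has 200 rows. Both use an index starting at 0, and both have two columns, "Name" and "Age".

The dictionary keys are names and the values are ages. All keys are unique; there are no duplicates.

{'John':20,'Max':25,'Jack':30}

I want to look up the names in each DataFrame and assign them the age from the dictionary. I do this with the following code (I want to return a new DataFrame rather than modify the old one):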

def age(df):
    new_df = df.copy(deep=True)
    i = 0
    while i < len(new_df['Name']):
        name = new_df['Name'][i]
        age = age_dict[name]
        new_df['Age'][i] = age
        i += 1
    return new_df

new_A = age(A)
new_B = age(B)

This code takes longer than I'd like, so I'm wondering whether pandas has an easier way to do this than looping over every row?

Thanks!

Best answer

I think you need map:

import pandas as pd

A = pd.DataFrame({'Name':['John','Max','Joe']})
print (A)
   Name
0  John
1   Max
2   Joe

d = {'John':20,'Max':25,'Jack':30}

A1 = A.copy(deep=True)
A1['Age'] = A.Name.map(d)
print (A1)
   Name   Age
0  John  20.0
1   Max  25.0
2   Joe   NaN
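
Names that are not keys of the dictionary (Joe above) come back as NaN, which also turns the column into floats. If the DataFrame already has an Age column, as in the question, one option is to fall back on the existing value. A minimal sketch; the existing ages below are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: the existing ages are invented for this example.
A = pd.DataFrame({'Name': ['John', 'Max', 'Joe'], 'Age': [0, 0, 41]})
d = {'John': 20, 'Max': 25, 'Jack': 30}

A1 = A.copy(deep=True)
# map() yields NaN for names that are not keys of d;
# fillna() then falls back to the age already stored in A.
A1['Age'] = A['Name'].map(d).fillna(A['Age'])
print(A1)
```

Whether keeping the old value is the right fallback depends on the data; dropping the unmatched rows or raising an error may suit other cases better.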

If you need a function:

d = {'John':20,'Max':25,'Jack':30}

def age(df):
    new_df = df.copy(deep=True)
    new_df['Age'] = new_df.Name.map(d)
    return new_df

new_A = age(A)
print (new_A)
   Name   Age
0  John  20.0
1   Max  25.0
2   Joe   NaN
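
Since the question applies the same lookup to both A and B, it may be convenient to pass the dictionary in as an argument instead of reading a global. A possible variant, where the name with_ages and the contents of B are assumptions made for this sketch:

```python
import pandas as pd

def with_ages(df, mapping):
    # Hypothetical helper (not from the original answer).
    # assign() returns a new DataFrame, so df itself is left unchanged.
    return df.assign(Age=df['Name'].map(mapping))

d = {'John': 20, 'Max': 25, 'Jack': 30}
A = pd.DataFrame({'Name': ['John', 'Max', 'Joe']})
B = pd.DataFrame({'Name': ['Jack', 'Max']})  # made-up sample data

new_A = with_ages(A, d)
new_B = with_ages(B, d)
```

Because assign() already builds a new DataFrame, the explicit copy(deep=True) is not needed in this version.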

Timings:

In [191]: %timeit (age(A))
10 loops, best of 3: 21.8 ms per loop

In [192]: %timeit (jul(A))
10 loops, best of 3: 47.6 ms per loop

Code used for timing:

A = pd.DataFrame({'Name':['John','Max','Joe']})
#[300000 rows x 2 columns]
A = pd.concat([A]*100000).reset_index(drop=True)
print (A)

d = {'John':20,'Max':25,'Jack':30}

def age(df):
    new_df = df.copy(deep=True)
    new_df['Age'] = new_df.Name.map(d)
    return new_df

def jul(A):
    df = pd.DataFrame({'Name': list(d.keys()), 'Age': list(d.values())})
    A1 = pd.merge(A, df, how='left')
    return A1

The same test on a smaller DataFrame:

A = pd.DataFrame({'Name':['John','Max','Joe']})
#[300 rows x 2 columns]
A = pd.concat([A]*100).reset_index(drop=True)


In [194]: %timeit (age(A))
The slowest run took 5.22 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 742 µs per loop

In [195]: %timeit (jul(A))
The slowest run took 4.51 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.87 ms per loop
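
The %timeit lines above only work in IPython. To repeat the comparison in a plain script, something like the following should do, using the standard timeit module. It is a rough sketch that reuses age and jul from the answer; the repeat counts are arbitrary, and the numbers will differ by machine and pandas version:

```python
import timeit
import pandas as pd

d = {'John': 20, 'Max': 25, 'Jack': 30}
A = pd.DataFrame({'Name': ['John', 'Max', 'Joe']})
A = pd.concat([A] * 100).reset_index(drop=True)  # 300 rows

def age(df):
    new_df = df.copy(deep=True)
    new_df['Age'] = new_df.Name.map(d)
    return new_df

def jul(A):
    df = pd.DataFrame({'Name': list(d.keys()), 'Age': list(d.values())})
    return pd.merge(A, df, how='left')

# Check that both approaches agree before comparing their speed.
pd.testing.assert_frame_equal(age(A), jul(A))

for func in (age, jul):
    # Best of 3 repeats of 100 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(A), number=100, repeat=3)) / 100
    print(f'{func.__name__}: {best * 1e6:.0f} µs per call')
```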

On the topic of python - Updating a column in a pandas DataFrame based on another column and a separate dictionary, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40628264/
