python - Efficient way to replace values in one dataset with values from another dataset

Reposted. Author: 行者123. Last updated: 2023-12-01 09:32:36

I have this code:

for index, row in df.iterrows():
    for index1, row1 in df1.iterrows():
        if df['budget'].iloc[index] == 0:
            if (df['production_companies'].iloc[index] == df1['production_companies'].iloc[index1]
                    and df['release_date'].iloc[index].year == df1['release_year'].iloc[index1]):
                df['budget'].iloc[index] = df1['mean'].iloc[index1]

It works, but it takes a very long time to finish. How can I make it run faster? I also tried:

df.where((df['budget'] != 0 and df['production_companies'] != df1['production_companies']
and df['release_date'] != df1['release_year']),
other = pd.replace(to_replace = df['budget'],
value = df1['mean'], inplace = True))

It should be faster, but it doesn't work. How can I achieve this? Thanks!

df looks like this:

budget; production_companies; release_date; title
0; Villealfa Filmproduction Oy; 10/21/1988; Ariel
0; Villealfa Filmproduction Oy; 10/16/1986; Shadows in Paradise
4000000; Miramax Films; 12/25/1995; Four Rooms
0; Universal Pictures; 10/15/1993; Judgment Night
42000; inLoops; 1/1/2006; Life in Loops (A Megacities RMX)
...

df1:

production_companies; release_year; mean
Metro-Goldwyn-Mayer (MGM); 1998; 17500000
Metro-Goldwyn-Mayer (MGM); 1999; 12500000
Metro-Goldwyn-Mayer (MGM); 2000; 12000000
Metro-Goldwyn-Mayer (MGM); 2001; 43500000
Metro-Goldwyn-Mayer (MGM); 2002; 12000000
Metro-Goldwyn-Mayer (MGM); 2003; 36000000
Metro-Goldwyn-Mayer (MGM); 2004; 27500000
...

If the year and the production company are the same, I want to replace the 0 values in df with the "mean" value from df1.

Best answer

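Don't use loops for this task.

The main benefit of pandas is its vectorised functionality.

One way to vectorise the calculation is to align the indices and then use pd.Index.map (called on the DataFrame's index). To extract the year, you first need to convert the date column to datetime.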

Data from @ALollz.

import pandas as pd

# convert release_date to datetime and calculate year
df['release_date'] = pd.to_datetime(df['release_date'])
df['year'] = df['release_date'].dt.year

# create mapping from df1: (company, year) -> mean budget
s = df1.set_index(['production_companies', 'release_year'])['mean']

# use map on selected condition (only rows whose budget is 0)
# note: this sample df names the column 'production_company';
# the question's df calls it 'production_companies'
mask = df['budget'] == 0
df.loc[mask, 'budget'] = df[mask].set_index(['production_company', 'year']).index.map(s.get)

print(df)

# budget production_company release_date title year
# 0 1000000 Villealfa Filmproduction Oy 1988-10-21 AAA 1988
# 1 100 Villealfa Filmproduction Oy 1986-10-18 BBB 1986
# 2 30000000 Villealfa Filmproduction Oy 1955-12-25 CCC 1955
# 3 1000 Miramax Films 2006-01-01 DDD 2006
# 4 5000000 Miramax Films 2017-04-13 EEE 2017
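
As an alternative to the index mapping above, the same lookup can be sketched as a left merge. This is not the answer's method, just a minimal self-contained illustration. The rows are adapted from the question's sample data, and the second df1 row (Villealfa Filmproduction Oy, 1988, 250000) is a made-up value added so that at least one zero budget finds a match. It also assumes that each (production_companies, release_year) pair appears only once in df1; otherwise the merge would duplicate rows.

```python
import pandas as pd

# Sample rows adapted from the question.
df = pd.DataFrame({
    "budget": [0, 0, 4000000, 0],
    "production_companies": [
        "Villealfa Filmproduction Oy",
        "Villealfa Filmproduction Oy",
        "Miramax Films",
        "Universal Pictures",
    ],
    "release_date": ["10/21/1988", "10/16/1986", "12/25/1995", "10/15/1993"],
    "title": ["Ariel", "Shadows in Paradise", "Four Rooms", "Judgment Night"],
})
df1 = pd.DataFrame({
    "production_companies": ["Metro-Goldwyn-Mayer (MGM)", "Villealfa Filmproduction Oy"],
    "release_year": [1998, 1988],
    "mean": [17500000, 250000],  # 250000 is a hypothetical value for illustration
})

# Parse the dates and derive the year that serves as the second join key.
df["release_date"] = pd.to_datetime(df["release_date"], format="%m/%d/%Y")
df["release_year"] = df["release_date"].dt.year

# A left merge keeps every row of df in order; rows without a match get NaN in "mean".
merged = df.merge(df1, on=["production_companies", "release_year"], how="left")

# Replace only zero budgets that found a match; all other budgets stay as they are.
fill = (merged["budget"] == 0) & merged["mean"].notna()
merged["budget"] = merged["budget"].where(~fill, merged["mean"]).astype("int64")

result = merged.drop(columns="mean")
print(result[["budget", "production_companies", "title"]])
```

With this sample data only the first row (Ariel, 1988) is filled in; zero budgets without a matching company and year remain 0 rather than becoming missing values.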

Regarding python - Efficient way to replace values in one dataset with values from another dataset, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49833144/
