
python - How to modify column values in a pd.DataFrame

Reposted. Author: 行者123. Updated: 2023-11-28 17:04:45

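Background: I want to modify the values in a DataFrame so that only the top 20 sports are kept and every other sport is shown as "Others". The new column starts as a copy of the existing column, as follows: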

athlete_events['Sport_modified'] = athlete_events['Sport']

The filter containing the names of the top 20 sports is generated like this:

top20_sport = athlete_events['Sport'].value_counts().head(20).index

My attempts at the modification are as follows. Method 1:

def classify_sports(cols, filters):
    for i in cols:
        if i in filters:
            pass
        else:
            i = 'Others'

classify_sports(athlete_events.Sport_modified, top20_sport)

Method 2:

athlete_events.Sport_modified.apply(lambda x : x if x in top20_sport else 'Others')

However, neither of the two methods above worked. The only way I could get the result was with code like this:

athlete_events.loc[
(athlete_events['Sport'] !='Athletics')&
(athlete_events['Sport'] !='Gymnastics')&
(athlete_events['Sport'] !='Swimming')&
(athlete_events['Sport'] !='Shooting')&
(athlete_events['Sport'] !='Cycling')&
(athlete_events['Sport'] !='Fencing')&
(athlete_events['Sport'] !='Rowing')&
(athlete_events['Sport'] !='Cross Country Skiing')&
(athlete_events['Sport'] !='Alpine Skiing')&
(athlete_events['Sport'] !='Wrestling')&
(athlete_events['Sport'] !='Football')&
(athlete_events['Sport'] !='Sailing')&
(athlete_events['Sport'] !='Equestrianism')&
(athlete_events['Sport'] !='Canoeing')&
(athlete_events['Sport'] !='Boxing')&
(athlete_events['Sport'] !='Speed Skating')&
(athlete_events['Sport'] !='Ice Hockey')&
(athlete_events['Sport'] !='Hockey')&
(athlete_events['Sport'] !='Biathlon')&
(athlete_events['Sport'] !='Basketball')
,'Sport_modified'] = 'Others'

What is wrong with the two methods above? Thanks for any help.

Best answer

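Your first method can never work, because your function neither returns a Series nor performs any row-wise assignment. Inside the loop, i = 'Others' only rebinds the local variable i; nothing is written back to the column.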
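To illustrate the point, here is a minimal sketch on toy data. The Series sports, the list keep, and both function names are hypothetical stand-ins, not part of the original question. The first function mirrors the loop from Method 1 and leaves the data untouched; the second builds and returns a new Series, which is one possible way to make a loop-based version work.

```python
import pandas as pd

# Hypothetical toy data standing in for athlete_events['Sport'].
sports = pd.Series(['Swimming', 'Curling', 'Swimming', 'Luge'])
keep = ['Swimming']  # hypothetical stand-in for top20_sport


def classify_in_loop(col, filters):
    # Mirrors Method 1: assigning to i rebinds the local name only,
    # so the Series itself is never modified and nothing is returned.
    for i in col:
        if i not in filters:
            i = 'Others'


def classify_returning(col, filters):
    # Builds a new Series and returns it, so the caller can assign it back.
    values = [v if v in filters else 'Others' for v in col]
    return pd.Series(values, index=col.index)


before = sports.tolist()
classify_in_loop(sports, keep)
after_loop = sports.tolist()        # identical to `before`

fixed = classify_returning(sports, keep)
```

The caller would then assign the returned Series to the target column, for example df['sport_modified'] = classify_returning(df['sport'], keep).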

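Your second method does not operate in place: apply returns a new Series, so you need to assign the result back to a column. For example: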

df['sport_modified'] = df['sport'].apply(lambda x : x if x in top20_sport else 'Others')
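The difference between calling apply and assigning its result can be checked on a small example. This is a sketch under assumptions: the DataFrame contents and the name top_sports are made up for illustration, and head(2) is used instead of head(20) to keep the data small.

```python
import pandas as pd

# Hypothetical toy data; the real question uses athlete_events.
df = pd.DataFrame(
    {'sport': ['Swimming', 'Curling', 'Swimming', 'Luge', 'Swimming', 'Luge']}
)
# Two most frequent sports (stand-in for the top 20 in the question).
top_sports = df['sport'].value_counts().head(2).index

df['sport_modified'] = df['sport']

# apply returns a new Series; the column is not changed by this call.
result = df['sport_modified'].apply(
    lambda x: x if x in top_sports else 'Others'
)
column_after_apply = df['sport_modified'].tolist()

# Assigning the result back is what actually updates the column.
df['sport_modified'] = result
column_after_assign = df['sport_modified'].tolist()
```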

Your final solution can be expressed more efficiently with pd.Series.isin, negating the mask with ~:

L = ['Athletics', 'Gymnastics', ...]

df.loc[~df['sport'].isin(L), 'sport_modified'] = 'Others'
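As a sketch of how this could be combined with the value_counts filter from the question, the Index of top sports can be passed to isin directly, so the list does not have to be typed by hand. The toy DataFrame and the name top_sports are assumptions for illustration. Series.where is shown as an alternative that is not part of the original answer; it builds the column in one step.

```python
import pandas as pd

# Hypothetical toy data; the real question uses athlete_events.
df = pd.DataFrame(
    {'sport': ['Swimming', 'Curling', 'Swimming', 'Luge', 'Swimming', 'Luge']}
)
# Two most frequent sports (stand-in for the top 20 in the question).
top_sports = df['sport'].value_counts().head(2).index

# Copy the column, then overwrite rows whose sport is not in the top list.
df['sport_modified'] = df['sport']
df.loc[~df['sport'].isin(top_sports), 'sport_modified'] = 'Others'

# Alternative: keep values where the mask is True, use 'Others' elsewhere.
alt = df['sport'].where(df['sport'].isin(top_sports), 'Others')
```

Both forms are vectorized, so they avoid calling a Python function once per row as apply does.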

Regarding "python - How to modify column values in a pd.DataFrame", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51760639/
