python - Generating columns of top-ranked values in Pandas

Reposted. Author: 行者123. Updated: 2023-12-01 04:22:16

I have a dataframe, topic_data, that contains the output of an LDA topic model:

topic_data.head(15)

topic word score
0 0 Automobile 0.063986
1 0 Vehicle 0.017457
2 0 Horsepower 0.015675
3 0 Engine 0.014857
4 0 Bicycle 0.013919
5 1 Sport 0.032938
6 1 Association_football 0.025324
7 1 Basketball 0.020949
8 1 Baseball 0.016935
9 1 National_Football_League 0.016597
10 2 Japan 0.051454
11 2 Beer 0.032839
12 2 Alcohol 0.027909
13 2 Drink 0.019494
14 2 Vodka 0.017908

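This shows the top 5 terms for each topic along with each term's score (weight). What I want to do is reformat it so that the index is the rank of the term, the columns are the topic IDs, and the values are formatted strings generated from the word and score columns (something like "%s (%.02f)" % (word, score)). That means the new dataframe should look like this: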

Topic  0                1                            ...
Rank
0 Automobile (0.06) Sport (0.03) ...
1 Vehicle (0.02) Association_football (0.03) ...
... ... ... ...

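What is the right way to do this? I think it involves some combination of setting the index, unstacking, and ranking, but I am not sure of the correct approach.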

Best answer

It would be something like this. Note that Rank has to be generated first:

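In [140]:
# df is the question's topic_data. rank() is used here instead of np.argsort,
# which returns sort positions rather than ranks. Rank 0 = highest score in the topic.
df['Rank'] = df.groupby('topic')['score'].rank(method='first', ascending=False).astype(int) - 1
# Build the "word (score)" display string
df['New_str'] = df['word'] + df['score'].apply(' ({0:.2f})'.format)
# DataFrame.sort was removed from pandas; sort_values replaces it
df2 = df.sort_values(['Rank', 'topic'])[['New_str', 'topic', 'Rank']]
# One row per rank, one column per topic
print(df2.pivot(index='Rank', columns='topic', values='New_str'))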

topic 0 1 2
Rank
0 Automobile (0.06) Sport (0.03) Japan (0.05)
1 Vehicle (0.02) Association_football (0.03) Beer (0.03)
2 Horsepower (0.02) Basketball (0.02) Alcohol (0.03)
3 Engine (0.01) Baseball (0.02) Drink (0.02)
4 Bicycle (0.01) National_Football_League (0.02) Vodka (0.02)
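
Since the question mentions index setting and unstacking, here is a minimal self-contained sketch of that route, offered as one possible alternative rather than the answer's own method. It assumes a reasonably recent pandas; the helper name top_terms_table and the column name label are hypothetical, and the sample rows are a subset copied from the question's table.

```python
import pandas as pd

# Sample rows copied from the question's topic_data (two topics, three terms each)
topic_data = pd.DataFrame({
    "topic": [0, 0, 0, 1, 1, 1],
    "word": ["Automobile", "Vehicle", "Horsepower",
             "Sport", "Association_football", "Basketball"],
    "score": [0.063986, 0.017457, 0.015675,
              0.032938, 0.025324, 0.020949],
})


def top_terms_table(frame):
    """Return a Rank x topic table of 'word (score)' strings (hypothetical helper)."""
    # Sort so the highest score comes first within each topic
    ordered = frame.sort_values(["topic", "score"], ascending=[True, False])
    ordered = ordered.assign(
        # cumcount numbers the rows 0, 1, 2, ... inside each topic group
        Rank=ordered.groupby("topic").cumcount(),
        # Same formatting as the answer: word followed by the score to 2 decimals
        label=ordered["word"] + ordered["score"].map(" ({:.2f})".format),
    )
    # Move (Rank, topic) into the index, then unstack topic into columns
    return ordered.set_index(["Rank", "topic"])["label"].unstack("topic")


table = top_terms_table(topic_data)
print(table)
```

Sorting explicitly before cumcount means the result does not depend on the input rows already being ordered by score, which the sample data happens to be.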

Regarding "python - Generating columns of top-ranked values in Pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33576229/
