
python - Get the most frequent words in each row's list

Reposted. Author: 行者123. Updated: 2023-12-04 07:59:54

I have a DataFrame df with more than 50,000 rows:

>>> df
message words wordCount uniqueWordCount
0 my name is [my, name, is] 3 3
1 happy birthday to you [happy, birthday, to, you] 4 4
2 la la la la la [la, la, la, la, la] 5 1
3 you are you that is it [you, are, you, that, is, it] 6 5
...
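To experiment with the answers below, the sample frame can be rebuilt with a few lines of pandas. This is a sketch under the assumption that words is a plain whitespace split of message and that the two count columns are derived from it; the question does not say how those columns were actually built.

```python
import pandas as pd

# Hypothetical reconstruction of the first four rows shown above.
df = pd.DataFrame({"message": [
    "my name is",
    "happy birthday to you",
    "la la la la la",
    "you are you that is it",
]})

# Assumption: "words" is a whitespace split of "message".
df["words"] = df["message"].str.split()
# Number of tokens per row.
df["wordCount"] = df["words"].map(len)
# Number of distinct tokens per row.
df["uniqueWordCount"] = df["words"].map(lambda ws: len(set(ws)))
print(df)
```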
I want to create a new column holding the 3 most frequent words in message.
What I have so far works, but it takes quite a long time:
>>> import textblob
>>> df["mostFrequent"] = df["message"].apply(
...     lambda x: sorted(
...         textblob.TextBlob(x).word_counts,
...         key=textblob.TextBlob(x).word_counts.get,
...         reverse=True)[:3])

>>> df["mostFrequent"]
0 [my, name, is]
1 [happy, birthday, to]
2 [la]
3 [you, are, that]
...
Is there a more efficient way to do this?

Best answer

Use a custom lambda function with collections.Counter on the already tokenized words column, so message does not have to be parsed again:

from collections import Counter
# Count each row's tokens and keep only the 3 most common words, dropping their counts
f = lambda x: [word for word, word_count in Counter(x).most_common(3)]
df["mostFrequent"] = df["words"].apply(f)
print(df)
message words wordCount \
0 my name is [my, name, is] 3
1 happy birthday to you [happy, birthday, to, you] 4
2 la la la la la [la, la, la, la, la] 5
3 you are you that is it [you, are, you, that, is, it] 6

uniqueWordCount mostFrequent
0 3 [my, name, is]
1 4 [happy, birthday, to]
2 1 [la]
3 5 [you, are, that]
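Series.apply calls the Python function once per row, so on a 50,000-row frame the per-row pandas overhead can add up. A minimal sketch, assuming the words column holds plain Python lists, runs the same Counter logic in a list comprehension instead; the helper name top_n_words is hypothetical, and any speedup should be measured on real data.

```python
from collections import Counter


def top_n_words(word_lists, n=3):
    """Return the n most frequent words of each token list.

    Hypothetical helper: word_lists is any iterable of token lists,
    for example the df["words"] column.
    """
    return [[word for word, _ in Counter(words).most_common(n)]
            for words in word_lists]


rows = [
    ["my", "name", "is"],
    ["happy", "birthday", "to", "you"],
    ["la", "la", "la", "la", "la"],
    ["you", "are", "you", "that", "is", "it"],
]
result = top_n_words(rows)
print(result)

# With pandas, the same helper can fill the new column directly:
# df["mostFrequent"] = top_n_words(df["words"])
```

Counter.most_common lists words with equal counts in the order they were first encountered, which is why row 3 yields [you, are, that]. One behavioral difference from the question's code: TextBlob's word_counts lowercases words and strips punctuation before counting, whereas Counter counts the tokens in words exactly as stored, so normalize them first (for example with word.lower()) if that matters.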

Regarding "python - Get the most frequent words in each row's list", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/66529091/
