
python - Create a term frequency dictionary from a DataFrame column of lists

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:39:58

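I have a DataFrame in which one column holds lists of strings, and I want to build a term frequency dictionary with collections.Counter. The DataFrame looks like this: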

>>> job_title['title']
0 [responsible, caring, trustworthy, babysitter]
1 [compassionate, trustworthy, babysitter]
2 [family, looking, kindergarten, preschool, chi...
3 [babysitter, needed, 2, children, bee, cave, n...
4 [fun, patient, nonjudgemental, babysitter]
5 [responsible, interactive, intelligent, babysi...
6 [responsible, friendly, babysitter]
7 [family, looking, kindergarten, preschool, chi...
8 [family, looking, kindergarten, preschool, chi...
9 [reliable, clean, friendly, nanny]

What is the most efficient way to do this?

Best answer

I think you can flatten the lists with chain.from_iterable and then use Counter:

from itertools import chain
from collections import Counter

print(Counter(chain.from_iterable(job_title.title)))

Example:

import pandas as pd

# Two-row sample with the same structure as the question's data
job_title = pd.DataFrame({'title': [['responsible', 'caring', 'trustworthy', 'babysitter'],
                                    ['compassionate', 'trustworthy', 'babysitter']]})

print(job_title)
title
0 [responsible, caring, trustworthy, babysitter]
1 [compassionate, trustworthy, babysitter]


print(Counter(chain.from_iterable(job_title.title)))
Counter({'babysitter': 2, 'trustworthy': 2,
'compassionate': 1, 'responsible': 1, 'caring': 1})
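
As a self-contained variation on the answer, the flatten-and-count step can be wrapped in a small helper. This is only a sketch: the name term_frequencies is hypothetical and not part of the original answer, and the plain list of lists below stands in for job_title['title'] so that the snippet runs without pandas. A pandas Series of lists iterates the same way, so it should be possible to pass it in directly.

```python
from collections import Counter
from itertools import chain


def term_frequencies(token_lists):
    """Count how often each term occurs across an iterable of token lists."""
    # chain.from_iterable flattens lazily, so no intermediate list is built.
    return Counter(chain.from_iterable(token_lists))


# Stand-in for job_title['title'] (assumption: each element is a list of strings).
titles = [
    ['responsible', 'caring', 'trustworthy', 'babysitter'],
    ['compassionate', 'trustworthy', 'babysitter'],
]

freq = term_frequencies(titles)
print(dict(freq))           # plain dict, if a Counter is not wanted
print(freq.most_common(2))  # the two most frequent terms with their counts
```

A Counter is already a dict subclass, so the dict() conversion is optional. If pandas 0.25 or later is available, job_title['title'].explode().value_counts().to_dict() should give the same counts without leaving pandas, although the Counter route avoids building an intermediate Series.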

Regarding python - creating a term frequency dictionary from a DataFrame column of lists, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/42228448/
