
python - How to get the average of the same word repeated consecutively more than X times per group?

Reposted. Author: 行者123. Updated: 2023-12-04 08:56:48

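My question is similar to this one:
How to get average of same word more than X time per group?
But here I want, for each group (group = name), the average number of sentences in which the same word appears at least 4 times in a row.
Example: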

id | name | sentences
---------------------
1 | aa | david hi david david david
2 | aa | david david is at home
3 | bb | I'm king
4 | cc | where r u going
5 | dd | lol lol lol lol lol lol
6 | ee | abc abc cc abc abc abc abc cc
7 | ee | dd dd dd ee dd dd dd
I want to get the following result:
name | avg
----------
aa | 0.0 (0 sentences contain the word 'david' 4 times in a row; the 'aa' group has 2 sentences in total)
bb | 0.0 (0 sentences contain the same word 4 times in a row)
cc | 0.0 (0 sentences contain the same word 4 times in a row)
dd | 1.0 (1 sentence contains the same word 'lol' 4 times in a row; the 'dd' group has 1 sentence in total)
ee | 0.5 (1 sentence contains the same word 'abc' 4 times in a row; the 'ee' group has 2 sentences in total)


I'm using Python 3.6.8.

Best answer

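You can count the runs in which a word appears 4 or more times in a row with Series.str.count, then use Series.groupby to group the resulting series cnt by name and aggregate with mean to get the per-group average. In the regex, (\w+) captures a word and (\s\1){3,} requires that same word to follow at least 3 more times, each preceded by a single whitespace character.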

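# Count, per sentence, the runs of one word repeated 4 or more times in a row
cnt = df['sentences'].str.count(r'(\w+)(\s\1){3,}')
# Average those per-sentence counts within each name group
avg = cnt.groupby(df['name']).mean().reset_index(name='avg')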
Details:
print(cnt)
0 0
1 0
2 0
3 0
4 1
5 1
6 0
Name: sentences, dtype: int64

print(avg)
name avg
0 aa 0.0
1 bb 0.0
2 cc 0.0
3 dd 1.0
4 ee 0.5
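For reference, here is a self-contained sketch that rebuilds the example table and reproduces the averages above. It is a variant of the answer rather than the answer's own code, and it makes two assumptions: each sentence should count at most once even if it contains several qualifying runs (so the average stays between 0 and 1), and repeats must be whole words, which is why the pattern adds \b word boundaries and allows \s+ between repeats. The name MIN_RUN is a hypothetical parameter introduced here for illustration.

```python
import re

import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "name": ["aa", "aa", "bb", "cc", "dd", "ee", "ee"],
    "sentences": [
        "david hi david david david",
        "david david is at home",
        "I'm king",
        "where r u going",
        "lol lol lol lol lol lol",
        "abc abc cc abc abc abc abc cc",
        "dd dd dd ee dd dd dd",
    ],
})

# Hypothetical parameter: minimum number of consecutive repeats of one word.
MIN_RUN = 4

# A whole word, followed by at least MIN_RUN - 1 further copies of itself,
# each separated by whitespace. Expands to \b(\w+)(?:\s+\1\b){3,} here.
run_re = re.compile(rf"\b(\w+)(?:\s+\1\b){{{MIN_RUN - 1},}}")

# True for each sentence that contains at least one qualifying run.
has_run = df["sentences"].map(lambda s: bool(run_re.search(s)))

# The mean of a boolean series is the fraction of True values per group.
avg = has_run.groupby(df["name"]).mean().reset_index(name="avg")
print(avg)
```

On this data the result matches the answer's output, because no sentence contains more than one qualifying run. The two approaches would diverge on a sentence such as "a a a a b b b b b", where str.count returns 2 and this sketch still counts the sentence once.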

Regarding "python - How to get the average of the same word repeated consecutively more than X times per group?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63778786/
