
python - How do I assign points/scores based on words detected in a DataFrame?

Reposted. Author: 行者123. Updated: 2023-12-03 20:51:18

I'm new to Python and trying to learn word detection. I have a DataFrame containing words:

sharina['transcript']
Out[25]:
0 thank you for calling my name is Tiffany and we want to let you know this call is recorded...
1 Maggie
2 through the time
3 that you can find I have a question about a claim and our contact is..
4 three to like even your box box and thank you for your help...
I wrote a function that detects words in the file:
def search_multiple_strings_in_file(file_name, list_of_strings):
    """Get line from the file along with line numbers, which contains any string from the list"""
    line_number = 0
    list_of_results = []
    # Open the file in read only mode (use the file_name argument, not a hard-coded path)
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            line_number += 1
            # For each line, check if line contains any string from the list of strings
            for string_to_search in list_of_strings:
                if string_to_search in line:
                    # If any string is found in line, then append that line along with line number in list
                    list_of_results.append((string_to_search, line_number, line.rstrip()))

    # Return list of tuples containing matched string, line numbers and lines where string is found
    return list_of_results

# search for given strings in the file 'sharina.csv'

matched_lines = search_multiple_strings_in_file('sharina.csv', ['recorded', 'thank'])

print('Total Matched lines : ', len(matched_lines))
for elem in matched_lines:
    print('Word = ', elem[0], ' :: Line Number = ', elem[1], ' :: Line = ', elem[2])
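For example, I want to assign a score when certain words are detected in the DataFrame:
If the word "recorded" is mentioned = 7 points
If the word "thank" is mentioned = 5 points
In this case the output should give a total score of 12, the sum of the two. How can I do this?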

Best answer

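Since you mention you already have a DataFrame:
This can be done relatively simply with Series.str.extractall. First we create the capturing group, which is a '|'.join of all the words, wrapped in parentheses. This gives you all of the desired words in a single column, with an index that indicates the row each word came from. The result also has a "match" index level that numbers the matches found within each row; it is not really important in this case.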

words = ['recorded', 'thank']
pat = '(' + '|'.join(words) + ')'
# '(recorded|thank)'

df['transcript'].str.extractall(pat)
#                 0
#   match
# 0 0         thank    # 'thank' on line 0
#   1      recorded
# 4 0         thank    # 'thank' also on line 4
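One caveat with a joined pattern is that it matches substrings, so 'thank' would also match inside 'thankful', and a word containing regex metacharacters would be misread. The following is a minimal sketch, assuming whole-word matching is what you want; it wraps the group in `\b` word boundaries and passes each word through `re.escape`. The sample sentences and the names `found` and `matched_words` are illustrative and not part of the original answer.

```python
import re
import pandas as pd

# Illustrative data (an assumption, not the original transcripts)
df = pd.DataFrame({'transcript': [
    'thank you for calling, this call is recorded',
    'Maggie',
    'thankful and thanks',
]})

words = ['recorded', 'thank']

# Escape each word, then require a word boundary on both sides of the group
pat = r'\b(' + '|'.join(re.escape(w) for w in words) + r')\b'

found = df['transcript'].str.extractall(pat)

# Column 0 holds the captured word for each match
matched_words = found[0].tolist()
print(matched_words)
```

With this pattern, 'thankful' and 'thanks' in the last row are not counted; whether that is desirable depends on how strictly you want to match.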
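If you want to assign scores, a dictionary is a good way to organize them: the keys are the words and the values are the points. You can then build the pattern by joining the keys, and get the points by mapping the values:
d = {'thank': 5, 'recorded': 7}
pat = '(' + '|'.join(d.keys()) + ')'

df1 = df['transcript'].str.extractall(pat).rename(columns={0: 'word'})
df1['points'] = df1['word'].map(d)
#              word  points
#   match
# 0 0         thank       5
#   1      recorded       7
# 4 0         thank       5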
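If you want a score per transcript row rather than one grand total, the points can be summed over the first index level of the extractall result. This is a rough sketch, assuming rows with no match should score 0; the name `row_scores` is illustrative. Note that a word repeated within one row is counted each time it appears.

```python
import pandas as pd

df = pd.DataFrame({'transcript': [
    'thank you for calling my name is Tiffany and we want to let you know this call is recorded',
    'Maggie',
    'through the time',
    'that you can find I have a question about a claim and our contact is',
    'three to like even your box box and thank you for your help',
]})

d = {'thank': 5, 'recorded': 7}
pat = '(' + '|'.join(d.keys()) + ')'

df1 = df['transcript'].str.extractall(pat).rename(columns={0: 'word'})
df1['points'] = df1['word'].map(d)

# Sum points per original row (index level 0), then restore rows with no match as 0
row_scores = (
    df1.groupby(level=0)['points'].sum()
       .reindex(df.index, fill_value=0)
)
print(row_scores.tolist())
```

The result can be assigned back with `df['score'] = row_scores` because it shares the index of `df`.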
If you only want to count each word once, use drop_duplicates:
df1.drop_duplicates('word').points.sum()
#12
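As a cross-check, the same "count each word once" total can be computed without extractall by testing each word with Series.str.contains. This is only a sketch of an alternative, assuming plain substring matching (`regex=False`); the name `total` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'transcript': [
    'thank you for calling my name is Tiffany and we want to let you know this call is recorded',
    'Maggie',
    'through the time',
    'that you can find I have a question about a claim and our contact is',
    'three to like even your box box and thank you for your help',
]})

d = {'thank': 5, 'recorded': 7}

# Add a word's points once if it appears anywhere in the column
total = sum(
    points
    for word, points in d.items()
    if df['transcript'].str.contains(word, regex=False).any()
)
print(total)
```

This scans the column once per word, so for a long word list the single-pass extractall approach is likely the better fit.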

Data setup
import pandas as pd

df = pd.DataFrame({'transcript':
    ['thank you for calling my name is Tiffany and we want to let you know this call is recorded',
     'Maggie',
     'through the time',
     'that you can find I have a question about a claim and our contact is',
     'three to like even your box box and thank you for your help']})

Regarding "python - How do I assign points/scores based on words detected in a DataFrame?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62645923/
