Python: matching n-grams from a dictionary to a text string

Reposted. Author: 行者123. Updated: 2023-11-28 22:53:38

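I have a dictionary of two- and three-word phrases that I want to search for in RSS feeds. I grab the RSS feeds and process them, and they end up as strings in a list named "documents". I want to check each string against the dictionary below and, if any phrase in the dictionary matches part of the text, return the value for that key. I am not sure of the best way to approach this. Any suggestions would be greatly appreciated.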

ngramList = {"cash outflows":-1, "pull out":-1,"winding down":-1,"most traded":-1,"steep gains":-1,"military strike":-1,
"resumed operations":+1,"state aid":+1,"bail out":-1,"cut costs":-1,"alleged violations":-1,"under perform":-1,"more than expected":+1,
"pay more taxes":-1,"not for sale":+1,"struck a deal":+1,"cash flow problems":-2}

Best answer

I assume the numbers in that dictionary (-2, -1, +1) are weights, so you need a count of each phrase in each document for them to be useful.

The pseudocode for this would be:

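  1. Split the document into a list of lines, then split each line into a list of words.
  2. Iterate over each word in a line, looping forward and backward within the line to generate the candidate phrases.
  3. As each phrase is generated, update a global dictionary that maps each phrase to its number of occurrences.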
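The steps above can be sketched fairly literally, as a rough illustration rather than a definitive implementation. The function name `count_phrases`, the `max_n` parameter, and the sample text are assumptions made up for this example. The sketch only looks forward from each word, which already covers every phrase, and it will miss phrases that span a line break.

```python
import re
from collections import Counter

def count_phrases(document, phrases, max_n=3):
    """Count how often each phrase in `phrases` occurs in `document`."""
    counts = Counter()
    for line in document.splitlines():
        # Step 1: split the line into lowercase words, dropping punctuation.
        words = re.findall(r"\w+", line.lower())
        for i in range(len(words)):
            # Step 2: look forward from word i to build 1..max_n word phrases.
            for n in range(1, max_n + 1):
                if i + n > len(words):
                    break
                candidate = " ".join(words[i:i + n])
                # Step 3: record only the phrases we are looking for.
                if candidate in phrases:
                    counts[candidate] += 1
    return counts

sample = "Shares rose after the firm struck a deal.\nIt will cut costs and cut costs again."
print(count_phrases(sample, {"struck a deal", "cut costs"}))
```

Passing the phrases as a set (or the keys of the weight dictionary) keeps each lookup cheap, so the cost grows with the number of words in the document rather than with the number of phrases.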

Here is some simple code that finds the count of each phrase in a document, which seems to be what you want to do:

text = """
I have a dictionary of 2 and 3 word phrases that I want to search in rss feeds for a match.

I grab the rss feeds, process them and they end up as a string IN a list entitled "documents".
I want to check the dictionary below and if any of the phrases in the dictionary match part of a string of text I want to return the values for the key.
I am not sure about the best way to approach this problem. Any suggestions would be greatly appreciated.
"""

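import re

ngrams = ["grab the rss", "approach this", "in"]

counts = {}
for ngram in ngrams:
    # Allow any run of whitespace between the words of the phrase.
    words = ngram.split()
    pattern = re.compile(r"\s+".join(map(re.escape, words)), re.IGNORECASE)
    # No word boundaries are used, so "in" also matches inside "string".
    counts[ngram] = len(pattern.findall(text))

print(counts)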

Output:

{'grab the rss': 1, 'approach this': 1, 'in': 5}
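Note that the count of 5 for "in" includes the two hits inside the word "string", because the pattern matches substrings. If the dictionary values are indeed weights, the counts can be combined into one score per document. The following is a minimal sketch under that assumption; the function name `score_document` and the sample sentence are hypothetical, the weights are a subset of the dictionary in the question, and `\b` is added so that only whole words match.

```python
import re

def score_document(document, weights):
    """Sum weight * occurrence count over all phrases found in `document`."""
    total = 0
    for phrase, weight in weights.items():
        # \b anchors the match at word boundaries, so "in" would not
        # match inside "string"; \s+ tolerates extra whitespace.
        body = r"\s+".join(map(re.escape, phrase.split()))
        pattern = re.compile(r"\b" + body + r"\b", re.IGNORECASE)
        total += weight * len(pattern.findall(document))
    return total

# A subset of the weights from the question's dictionary.
weights = {"cut costs": -1, "struck a deal": +1, "cash flow problems": -2}
doc = "The firm struck a deal, but cash flow problems remain. Cash flow problems persist."
print(score_document(doc, weights))  # -3: one +1 match and two -2 matches
```

To score every feed, the same function could be applied to each string in the "documents" list from the question.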

Regarding "Python: matching n-grams from a dictionary to a text string", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19212556/
