python - Given a list of words and a sentence, find all words that appear in the sentence either in whole or as a substring

Reposted by: 太空狗 · Updated: 2023-10-29 20:28:48

Question

Given a list of strings, find the strings from the list that appear in a given text.

Example

list = ['red', 'hello', 'how are you', 'hey', 'deployed']
text = 'hello, This is shared right? how are you doing tonight'
result = ['red', 'how are you', 'hello']

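'red' is included because 'shared' contains 'red' as a substring.

This is very similar to this question, except that the words we are looking for may also occur as substrings.
The list is very large and keeps growing as users are added, whereas the texts all stay roughly the same length.
I am looking for a solution whose time complexity depends on the length of the text rather than on the word list, so that it still scales when a large number of users are added.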
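
For comparison, a brute-force baseline makes the scaling concern concrete. This is only a sketch; the function name `find_words_naive` is made up for illustration. It scans the text once per word, so its cost grows with the size of the word list, which is exactly what the question wants to avoid.

```python
def find_words_naive(words, text):
    # One substring scan of the text per word, so the total cost
    # grows with the number of words in the list.
    return [word for word in words if word in text]


words = ['red', 'hello', 'how are you', 'hey', 'deployed']
text = 'hello, This is shared right? how are you doing tonight'
print(find_words_naive(words, text))
```

The result follows the order of the word list rather than the order of appearance in the text.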

Solution

I build a trie from the given list of words.
I then run a DFS over the text and check the current word against the trie.

Pseudocode

def FindWord(trie, text, word_so_far, index, result):
    # Check if word_so_far is a key; if yes, add it to the result and look further
    if trie.has_key(word_so_far):
        result.add(word_so_far)
    # Stop once the whole text has been consumed
    if index >= len(text):
        return
    # Check if word_so_far is a prefix of a longer key; if not, return
    if not trie.has_subtrie(word_so_far):
        return
    # Extend the current word we are searching for
    FindWord(trie, text, word_so_far + text[index], index + 1, result)
    # Start a new word from the next index
    FindWord(trie, text, "", index + 1, result)
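
The pseudocode above relies on a trie with `has_key` and `has_subtrie`, but the question does not say which trie implementation is used. The following is a minimal stand-in, assuming `has_subtrie(prefix)` means "at least one stored word strictly extends this prefix". The class name `SimpleTrie` and its internals are hypothetical and only meant for experimenting with the idea.

```python
class SimpleTrie:
    """Hypothetical dict-based trie exposing has_key and has_subtrie."""

    def __init__(self, words=()):
        self._root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self._root
        for ch in word:
            node = node.setdefault(ch, {})
        # End-of-word marker; None can never clash with a character key.
        node[None] = True

    def _walk(self, prefix):
        # Follow the prefix from the root; return None if the path breaks.
        node = self._root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def has_key(self, word):
        node = self._walk(word)
        return node is not None and None in node

    def has_subtrie(self, prefix):
        # Assumed semantics: some stored word is strictly longer than prefix.
        node = self._walk(prefix)
        return node is not None and any(ch is not None for ch in node)


trie = SimpleTrie(['red', 'hello', 'how are you', 'hey', 'deployed'])
print(trie.has_key('red'), trie.has_subtrie('he'), trie.has_subtrie('red'))
```

Under this assumption a word such as 'red' is a key but has no subtrie, so a search that tests `has_subtrie` before `has_key` would miss it.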

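The problem is that, although the running time now depends on len(text), it is O(2^n). Building the trie is a one-time cost shared across many texts, so that part is fine.

I don't see any overlapping subproblems that I could memoize to improve the running time.

Can you suggest an approach whose running time depends on the given text rather than on the word list (which can be preprocessed and cached), and which is faster than this?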

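Best answer

The theoretically sound version of what you are trying to do is called Aho-Corasick. Implementing the suffix links is somewhat complicated IIRC, so here is an algorithm that uses only the trie.

We consume the text one letter at a time. At every point we maintain a set of trie nodes from which a match in progress could still be extended. Initially this set contains only the root node. For each letter, we go through the nodes in the set and descend via the new letter where possible. If the resulting node is a match, great, report it. Either way, put it in the next set. The next set also contains the root node, since we can start a new match at any time.

Here is my quick attempt at an implementation in Python (untested, no guarantees, etc.).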

class Trie:
    def __init__(self):
        self.is_needle = False
        self._children = {}

    def find(self, text):
        node = self
        for c in text:
            node = node._children.get(c)
            if node is None:
                break
        return node

    def insert(self, needle):
        node = self
        for c in needle:
            node = node._children.setdefault(c, Trie())
        node.is_needle = True


def count_matches(needles, text):
    root = Trie()
    for needle in needles:
        root.insert(needle)
    nodes = [root]
    count = 0
    for c in text:
        next_nodes = [root]
        for node in nodes:
            next_node = node.find(c)
            if next_node is not None:
                count += next_node.is_needle
                next_nodes.append(next_node)
        nodes = next_nodes
    return count


print(
    count_matches(['red', 'hello', 'how are you', 'hey', 'deployed'],
                  'hello, This is shared right? how are you doing tonight'))
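
For readers curious about the Aho-Corasick approach mentioned at the start of the answer, here is a compact sketch with failure (suffix) links. It is not the answer author's code: the names `build_automaton` and `find_needles` are made up for illustration, empty needles are skipped by assumption, and it returns the set of matched words rather than a count.

```python
from collections import deque


def build_automaton(needles):
    # States are list indices; goto[s] maps a character to the next state.
    goto = [{}]
    fail = [0]      # failure link of each state (longest proper suffix state)
    out = [set()]   # needles that end at each state
    for needle in needles:
        if not needle:
            continue  # assumption: empty needles are ignored
        state = 0
        for ch in needle:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(needle)
    # Breadth-first pass so that shallower states are finished first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit needles that end at the suffix state.
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_needles(needles, text):
    goto, fail, out = build_automaton(needles)
    found = set()
    state = 0
    for ch in text:
        # Fall back along failure links until ch can be consumed.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return found


print(find_needles(['red', 'hello', 'how are you', 'hey', 'deployed'],
                   'hello, This is shared right? how are you doing tonight'))
```

The automaton only needs to be built once per word list; in a setting with many texts, `build_automaton` could be called up front and its result reused.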

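Regarding "python - Given a list of words and a sentence, find all words that appear in the sentence either in whole or as a substring", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54203241/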
