python - Chunking sentences at the word 'but' using RegEx

Reposted. Author: 行者123. Updated: 2023-11-28 17:04:24
I am trying to chunk a sentence at the word "but" (or any other coordinating conjunction) using RegEx. It is not working...

import nltk
from nltk import word_tokenize

sentence = nltk.pos_tag(word_tokenize("There are no large collections present but there is spinal canal stenosis."))
# Note: 'grammar' is not defined anywhere in the code as posted.
result = nltk.RegexpParser(grammar).parse(sentence)
DigDug = nltk.RegexpParser(r'CHUNK: {.*<CC>.*}')
for subtree in DigDug.parse(sentence).subtrees():
    if subtree.label() == 'CHUNK':
        print(subtree.node())

I need to split the sentence "There are no large collections present but there is spinal canal stenosis." into two parts:

1. "There are no large collections present"
2. "there is spinal canal stenosis."

I would also like the same code to split sentences at "and" and other coordinating conjunction (CC) words. But my code is not working. Please help.

Best answer

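I think you can simply do the following, where sentence is the raw sentence string rather than the POS-tagged token list: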

import re
result = re.split(r"\s+(?:but|and)\s+", sentence)

where

`\s` Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.)
`+` Between one and unlimited times, as many times as possible, giving back as needed (greedy)
`(?:` Match the regular expression below, do not capture
  Match either the regular expression below (attempting the next alternative only if this one fails)
`but` Match the characters "but" literally
`|` Or match regular expression number 2 below (the entire group fails if this one fails to match)
`and` Match the characters "and" literally
`)` End of the non-capturing group
`\s` Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.)
`+` Between one and unlimited times, as many times as possible, giving back as needed (greedy)
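
To make the pattern concrete, here is a minimal sketch that applies it to the sentence from the question. It assumes the sentence is a plain string; the variable name parts is only illustrative.

```python
import re

# The sentence from the question, as a plain string (not POS-tagged).
sentence = "There are no large collections present but there is spinal canal stenosis."

# Split wherever "but" or "and" appears surrounded by whitespace.
# The conjunction and the whitespace around it are dropped from the result.
parts = re.split(r"\s+(?:but|and)\s+", sentence)

for number, part in enumerate(parts, start=1):
    print(f'{number}. "{part}"')
# 1. "There are no large collections present"
# 2. "there is spinal canal stenosis."
```

Because the pattern requires whitespace on both sides, it does not split inside words that merely contain "and" or "but", such as "band" or "button".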

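You can add more conjunctions there, separated by the pipe character |. Make sure these words do not contain characters that have a special meaning in a regular expression. If in doubt, escape them first with re.escape(word).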
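
One possible way to build the pattern from a list of conjunctions, escaping each word with re.escape, is sketched below. The helper name split_on_conjunctions, the word list, and the use of re.IGNORECASE are assumptions made for illustration and are not part of the original answer.

```python
import re

def split_on_conjunctions(sentence, conjunctions):
    """Split a sentence at any of the given conjunctions (hypothetical helper)."""
    if not conjunctions:
        # Nothing to split on; return the sentence unchanged.
        return [sentence]
    # Escape every word so regex metacharacters are matched literally.
    alternation = "|".join(re.escape(word) for word in conjunctions)
    # Assumption: matching should ignore case, so "But" splits as well.
    pattern = re.compile(rf"\s+(?:{alternation})\s+", flags=re.IGNORECASE)
    return pattern.split(sentence)

words = ["and", "but", "or", "yet"]
print(split_on_conjunctions("It was late and we were tired, yet nobody left.", words))
# ['It was late', 'we were tired,', 'nobody left.']
```

Punctuation next to a conjunction stays attached to the neighbouring chunk, as the trailing comma in the second chunk shows.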

Regarding python - Chunking sentences at the word 'but' using RegEx, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52014482/
