
Python RegEx findall does not respond

Reposted. Author: 行者123. Updated: 2023-11-30 21:53:28

I just ran into something strange. I am prototyping a text crawler and using the Open ANC as the corpus.

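On some texts, the re module simply does not respond. If someone can confirm whether the re module can handle the complexity of this regex, that is all I need.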

The regex is preceding(?:[^A-Za-z0-9\n\r]*\w+[^A-Za-z0-9\n\r]*)+acquired

The text that triggers the problem is:

My claim is that Lincoln’s address expresses the same idea that was then current in Europe. Each people of common history and language constitutes a nation, and the natural form for the nation’s survival was in a state structure. The idea that Americans constituted an organic national unit explained, implicitly, why the eleven Southern states could not go their own way. As he assumed the presidency, Lincoln still spoke of the Union rather than a nation; but in the course of the debates in the decades immediately preceding, the notion of union had acquired the metaphysical qualities of nationhood. In his first inaugural address, Lincoln invoked the “bonds of affection,” and even before shots were fired on Fort Sumter in Charleston Harbor, he stressed the unbreakable ties of historical struggle:

The Python code that reproduces the problem:

import re

txt = "post text here"  # placeholder: paste the text quoted above
regex = r"preceding(?:[^A-Za-z0-9\n\r]*\w+[^A-Za-z0-9\n\r]*)+acquired"
re.findall(regex, txt)

Best answer

Your pattern suffers from catastrophic backtracking.
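To make the diagnosis concrete, here is a minimal sketch that times the pattern from the question on inputs that cannot match. The subject string ("preceding " followed by n copies of "a") and the helper name time_failed_search are assumptions made for illustration and are not part of the original post. Because every quantifier inside the group can match in many overlapping ways, the engine tries every way of splitting the run of word characters before giving up, so the time should grow roughly exponentially with n. The values of n are kept small on purpose so the script finishes quickly.

```python
import re
import time

# The pattern from the question, unchanged.
SLOW = re.compile(r"preceding(?:[^A-Za-z0-9\n\r]*\w+[^A-Za-z0-9\n\r]*)+acquired")


def time_failed_search(n):
    """Time a search that cannot succeed: n word characters, no 'acquired'."""
    subject = "preceding " + "a" * n  # hypothetical test input
    start = time.perf_counter()
    result = SLOW.findall(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Small n only: larger values can take a very long time.
for n in (8, 12, 16):
    result, elapsed = time_failed_search(n)
    print(f"n={n:2d}  matches={result}  seconds={elapsed:.4f}")
```

The exact timings depend on the machine and the Python version, but each step up in n should cost noticeably more than the last. The text in the question has several hundred characters after "preceding", which is why findall appears to hang.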

Here is an alternative pattern that works for your input:

regex = r"preceding[^A-Za-z0-9\n\r]+(?:\w+[^A-Za-z0-9\n\r]+)+?acquired"

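This assumes that words are always separated by at least one non-alphanumeric character (otherwise it would just match one long, unbroken word).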
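As a quick check, the alternative pattern can be run against the sentence from the question that contains both keywords. This is only a sketch: the excerpt is shortened from the quoted text, and the variable names are illustrative.

```python
import re

# The alternative pattern from the answer, unchanged.
FAST = re.compile(r"preceding[^A-Za-z0-9\n\r]+(?:\w+[^A-Za-z0-9\n\r]+)+?acquired")

# Shortened excerpt of the text quoted in the question.
excerpt = (
    "but in the course of the debates in the decades immediately "
    "preceding, the notion of union had acquired the metaphysical "
    "qualities of nationhood."
)

matches = FAST.findall(excerpt)
print(matches)  # ['preceding, the notion of union had acquired']

# The separator is mandatory, so two keywords glued together do not match.
no_separator = FAST.findall("precedingacquired")
print(no_separator)  # []
```

Requiring a separator after every word means each word can be consumed in only one way on ordinary ASCII text, which removes most of the ambiguity that made the original pattern backtrack.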

(See also: How can I recognize an evil regex?)

This question, "Python RegEx findall does not respond", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/59668935/
