
python - How to match exact words in text using re.compile

Reposted. Author: 行者123. Updated: 2023-12-01 00:53:07

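I want to find a pattern (with re.compile) that matches exact words like these.

Imagine words such as [aether, altitude, aphelion, west].

The pattern should capture each word either on its own or with a trailing punctuation mark, in a form I can use with spaCy. I tried the following, but it does not work: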






regex_patterns = [

re.compile(r'aether?,|altitude?,|aphelion?,|apside?,|apsis?,|ascension?,|autumnal equinox?,|east?.|eastward?,|eclipse?,|ecliptic?,|elliptical?,|epicycle?,|equinoctical?,|exquinox?,|fixed star?,|latitude?,|longitude?s|mean ecliptic?,|meridian?,|mobile star?,|node?,|nodes?,|north?,|octant?,|orbit?,|\borbital?,|\bparallax?,|\brays?,|\bretrograde?,|rise?,|sidereal?,|sidereal position?,|solstice?,|south?,|star?,|vernal equinox?,|west?,')
]

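It would be great if the regex could capture both "word" and "word," (the word plus punctuation), as in this sentence:

"west, can be seen"

The result should be

west,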

Best answer

If we want to match specific words, we might start with an expression similar to the following:

(aether|altitude|aphelion|apside|apsis|ascension|autumnal equinox|east|eastward|eclipse|ecliptic|elliptical|epicycle|equinoctical|exquinox|fixed star|latitude|longitudes?|mean ecliptic|meridian|mobile star|nodes?|north|octant|orbit|\borbital\b|\bparallax\b|\brays\b|\bretrograde\b|rise|sidereal|sidereal position|solstice|south|star|vernal equinox|west),?
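To see why the position of the `?` matters, here is a minimal sketch comparing a fragment of the question's pattern with the grouped form above. The two-word list is an assumption made for brevity; it is not the full pattern.

```python
import re

# Fragment of the question's pattern: `?` applies only to the character
# before it, so the last letter is optional and the comma is mandatory.
original = re.compile(r"north?,|west?,")

# Grouped form, shortened to two words for illustration: the whole word
# is required and the trailing comma is optional.
grouped = re.compile(r"(north|west),?")

text = "west, can be seen; north is ahead"

original_hits = [m.group() for m in original.finditer(text)]
grouped_hits = [m.group() for m in grouped.finditer(text)]

print(original_hits)  # ['west,']
print(grouped_hits)   # ['west,', 'north']
```

The question's form misses "north" because no comma follows it, while the grouped form returns the word with or without the comma.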


Then modify it by adding the punctuation marks we want to a character class:

[,:;\.]?

Our expression might then become:

(aether|altitude|aphelion|apside|apsis|ascension|autumnal equinox|east|eastward|eclipse|ecliptic|elliptical|epicycle|equinoctical|exquinox|fixed star|latitude|longitudes?|mean ecliptic|meridian|mobile star|nodes?|north|octant|orbit|\borbital\b|\bparallax\b|\brays\b|\bretrograde\b|rise|sidereal|sidereal position|solstice|south|star|vernal equinox|west)[,:;\.]?
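Since the question asks for re.compile specifically, the following sketch shows one way the expression might be compiled and applied. The word list is shortened to four compass terms for readability (an assumption), and the sample sentence is modelled on the one in the question.

```python
import re

# Shortened word list (an assumption for brevity); the full alternation
# from the expression above can be substituted here.
pattern = re.compile(r"(east|west|north|south)[,:;\.]?")

sample = "west, can be seen"
match = pattern.search(sample)

# group() is the word plus any trailing punctuation; group(1) is the word.
with_punctuation = match.group()
word_only = match.group(1)

print(with_punctuation)  # west,
print(word_only)         # west
```

Keeping the word in a capture group makes both forms available from one match, which may be convenient when only the bare term is needed later.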


Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(aether|altitude|aphelion|apside|apsis|ascension|autumnal equinox|east|eastward|eclipse|ecliptic|elliptical|epicycle|equinoctical|exquinox|fixed star|latitude|longitudes?|mean ecliptic|meridian|mobile star|nodes?|north|octant|orbit|\borbital\b|\bparallax\b|\brays\b|\bretrograde\b|rise|sidereal|sidereal position|solstice|south|star|vernal equinox|west),?"

test_str = ("aether\n"
"altitude\n"
"aphelion\n"
"apside\n"
"apsis\n"
"ascension\n"
"autumnal equinox\n"
"east?.\n"
"eastward\n"
"eclipse\n"
"ecliptic\n"
"elliptical\n"
"epicycle\n"
"equinoctical\n"
"exquinox\n"
"fixed star\n"
"latitude\n"
"longitude\n"
"longitudes\n"
"mean ecliptic\n"
"meridian\n"
"mobile star\n"
"node\n"
"nodes\n"
"north\n"
"octant\n"
"orbit\n"
"orbital\n"
"parallax\n"
"rays\n"
"retrograde\n"
"rise\n"
"sidereal\n"
"sidereal position\n"
"solstice\n"
"south\n"
"star\n"
"vernal equinox\n"
"west")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    # Report each capture group (1-based) within the current match.
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
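
One caveat when reusing the expression: Python tries alternatives from left to right, so a shorter term listed first (such as east before eastward, or orbit before orbital) can win over the longer one. The sketch below is one possible way to guard against that, not part of the original answer. It sorts the terms longest first and wraps the group in word boundaries. The word list and sample text are hypothetical and chosen only to show the overlap.

```python
import re

# Hypothetical subset of the word list, chosen to include overlapping terms.
words = ["east", "eastward", "orbit", "orbital", "star", "fixed star"]

# Alternatives are tried left to right, so "east" wins over "eastward".
naive = re.compile("(" + "|".join(words) + ")[,:;.]?")

# Sort longest first, escape each term, and anchor with word boundaries.
ordered = sorted(words, key=len, reverse=True)
strict = re.compile(r"\b(" + "|".join(map(re.escape, ordered)) + r")\b[,:;.]?")

text = "eastward, the fixed star; an orbital path, then restart"

naive_hits = [m.group() for m in naive.finditer(text)]
strict_hits = [m.group() for m in strict.finditer(text)]

print(naive_hits)   # ['east', 'fixed star;', 'orbit', 'star']
print(strict_hits)  # ['eastward,', 'fixed star;', 'orbital']
```

The word boundaries also stop "star" from matching inside "restart", which is closer to the exact-word behaviour the question asks for.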

Regarding "python - How to match exact words in text using re.compile", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56431229/
