
Python regex named groups for multi-line matching

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:08:05

I have text like this:

Alabama[STATE]
Auburn (Auburn University)[14]
Florence (University of North Alabama)
Huntsville (University of Alabama, Huntsville)
Jacksonville (Jacksonville State University)[15]
Livingston (University of West Alabama)[15]
Montevallo (University of Montevallo)[15]
Troy (Troy University)[15]
Tuskegee (Tuskegee University)[18]
Alaska[STATE]
Fairbanks (University of Alaska Fairbanks)[15]
Arizona[STATE]
Flagstaff (Northern Arizona University)[19]
Prescott (Embry–Riddle Aeronautical University)
Tempe (Arizona State University)

I am trying to use a Python regular expression to read the states and their lists of universities into two named groups. My code is:

import re

UNIV_LIST = r"(?P<state>(\w)+)\[.*\n(?P<region>(.*?).*)"
RE_COMMIT = re.compile(UNIV_LIST)
text = open(UFILE).read()  # UFILE: path to the input file (not shown in the post)
each_group = RE_COMMIT.finditer(text)
for rc in each_group:
    state = rc.groups()[0]
    regions = rc.groups()[1]
    print('State is %s' % state)
    print('regions are %s' % regions)

The expected output is:

State is : Alabama
Regions are : Auburn (Auburn University)[14]
Florence (University of North Alabama)
Huntsville (University of Alabama, Huntsville)
Jacksonville (Jacksonville State University)[15]
Troy (Troy University)[15]
Tuskegee (Tuskegee University)[18]
State is : Alaska
Regions are : Fairbanks (University of Alaska Fairbanks)[15]
State is : Arizona
Regions are : Flagstaff (Northern Arizona University)[19]
Prescott (Embry–Riddle Aeronautical University)
Tempe (Arizona State University)

But the current output is:

UNIV_LIST = r"(?P<state>(\w+))\[edit\]\n(?P<region>(.*))\n+"

State is Alabama
regions are Auburn (Auburn University)[1]
State is Alaska
regions are Fairbanks (University of Alaska Fairbanks)[2]
State is Arizona
regions are Flagstaff (Northern Arizona University)[6]

Any suggestions on how to capture the region named group correctly?
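
One likely reason only the first region line is captured: by default `.` does not match a newline, so `.*` inside the region group stops at the end of the line. The sketch below reproduces the effect on a shortened, made-up sample; `SAMPLE` and `first_try` are illustrative names, not from the original post.

```python
import re

# Hypothetical sample, shortened from the text in the question.
SAMPLE = (
    "Alabama[STATE]\n"
    "Auburn (Auburn University)[14]\n"
    "Florence (University of North Alabama)\n"
    "Alaska[STATE]\n"
    "Fairbanks (University of Alaska Fairbanks)[15]\n"
)

# Same shape as the pattern in the question: `.` stops at the newline,
# so the region group can never span more than one line.
first_try = re.compile(r"(?P<state>\w+)\[.*\n(?P<region>.*)")

matches = [(m.group("state"), m.group("region")) for m in first_try.finditer(SAMPLE)]
for state, region in matches:
    print("State is %s" % state)
    print("regions are %s" % region)
```

The Florence line never appears in any match, which mirrors the truncated output shown above.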

[Edit] The actual text is:

Alabama[STATE]
Auburn (Auburn University)[14]
Florence (University of North Alabama)
Huntsville (University of Alabama, Huntsville)
Jacksonville (Jacksonville State University)[15]
Livingston (University of West Alabama)[15]
Montevallo (University of Montevallo)[15]
Montgomery (Alabama State University, Huntingdon College, Auburn University at Montgomery, H. Councill Trenholm State Technical College, Faulkner University)
Troy (Troy University)[15]
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[6][17]
Tuskegee (Tuskegee University)[18]
Alaska[STATE]
Fairbanks (University of Alaska Fairbanks)[15]
Arizona[STATE]
Flagstaff (Northern Arizona University)[19]
Prescott (Embry–Riddle Aeronautical University)
Tempe (Arizona State University)
Tucson (University of Arizona)
Arkansas
Arkadelphia (Henderson State University, Ouachita Baptist University)[15]
Conway (Central Baptist College, Hendrix College, University of Central Arkansas)[15]
Fayetteville (University of Arkansas)[20]
Jonesboro (Arkansas State University)[21]
Magnolia (Southern Arkansas University)[15]
Monticello (University of Arkansas at Monticello)[15]
Russellville (Arkansas Tech University)[15]
Searcy (Harding University)[18]
California[STATE]

The following regular expression:

UNIV_LIST = r"(?P<state>^(\w+\[STATE\]))\r?\n?(?P<region>((^[^[]+)(\[\d+\])?(?!\[STATE\])$\r?\n?)+)"

gives most of the expected results, but some regions are missing. The expected output is:

State is : Alabama
Regions are : Auburn (Auburn University)[14]
Florence (University of North Alabama)
Huntsville (University of Alabama, Huntsville)
Jacksonville (Jacksonville State University)[15]
Livingston (University of West Alabama)[15]
Montevallo (University of Montevallo)[15]
Montgomery (Alabama State University, Huntingdon College, Auburn University at Montgomery, H. Councill Trenholm State Technical College, Faulkner University)
Troy (Troy University)[15]
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[6][17]
Tuskegee (Tuskegee University)[18]
State is : Alaska
Regions are : Fairbanks (University of Alaska Fairbanks)[15]
State is : Arizona
Regions are : Flagstaff (Northern Arizona University)[19]
Prescott (Embry–Riddle Aeronautical University)
Tempe (Arizona State University)

I get the results, but

Montgomery (Alabama State University, Huntingdon College, Auburn University at Montgomery, H. Councill Trenholm State Technical College,     Faulkner University)
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[6][17]
Tuskegee (Tuskegee University)[18]

are missing. Any suggestions on what went wrong?
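
A possible cause, at least for the Tuscaloosa line: `(\[\d+\])?` allows at most one bracketed reference, and that line ends with two (`[6][17]`). Once one line fails to match, the repeated group stops, so every later line of that state is dropped as well. The sketch below isolates the per-line part of the pattern; `one_ref` and `many_refs` are illustrative names, not from the original post.

```python
import re

line = "Tuscaloosa (University of Alabama, Stillman College, Shelton State)[6][17]"

# At most one "[n]" reference before the end of the line.
one_ref = re.compile(r"^[^\[]+(\[\d+\])?$")
# Any number of "[n]" references before the end of the line.
many_refs = re.compile(r"^[^\[]+(\[\d+\])*$")

one_ok = bool(one_ref.match(line))
many_ok = bool(many_refs.match(line))
print(one_ok)   # False
print(many_ok)  # True
```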

[Edit]

UNIV_LIST = r"(?P<state>^(\w+\s*\w*\[edit\]))\r?\n?(?P<region>((^[^[]+)(\[\d+\]){0,}?(?!\[edit\])$\r?\n?)+)"

This handles states with two-word names, such as New Mexico. But one case still fails:

Pomona (Cal Poly Pomona, WesternU)[9][10][11] and formerly Pomona College

Best answer

The following regular expression works.

import re

UNIV_LIST = r"(?P<state>^(\w+\[STATE\]))\r?\n?(?P<region>((^[^[]+)(\[\d+\]){0,}?(?!\[STATE\])$\r?\n?)+)"
RE_COMMIT = re.compile(UNIV_LIST, re.IGNORECASE | re.MULTILINE)
each_group = RE_COMMIT.finditer(text)  # text: the input shown in the question
for rc in each_group:
    print('State is : %s' % rc.group('state'))
    print('Region are : %s' % rc.group('region'))
    print('-' * 40)

Output

State is : Alabama[STATE]
Region are : Auburn (Auburn University)[14]
Florence (University of North Alabama)
Huntsville (University of Alabama, Huntsville)
Jacksonville (Jacksonville State University)[15]
Livingston (University of West Alabama)[15]
Montevallo (University of Montevallo)[15]
Troy (Troy University)[15]
Tuskegee (Tuskegee University)[18]

----------------------------------------
State is : Alaska[STATE]
Region are : Fairbanks (University of Alaska Fairbanks)[15]

----------------------------------------
State is : Arizona[STATE]
Region are : Flagstaff (Northern Arizona University)[19]
Prescott (Embry–Riddle Aeronautical University)
Tempe (Arizona State University)
----------------------------------------
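
As an alternative to one large pattern, a line-by-line pass may be easier to extend to cases like the Pomona line, where text follows the references. This is only a sketch, assuming every state header is a whole line ending in `[STATE]`; `parse_states` and `STATE_LINE` are hypothetical names, not part of the original answer.

```python
import re

# A state header is a whole line such as "Alabama[STATE]" or "New Mexico[STATE]".
STATE_LINE = re.compile(r"^(?P<state>[\w ]+)\[STATE\]\s*$")


def parse_states(text):
    """Group region lines under the most recent state header."""
    states = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        header = STATE_LINE.match(line)
        if header:
            current = header.group("state").strip()
            states[current] = []
        elif current is not None:
            # Any other line, including one with text after "[9][10]", is a region.
            states[current].append(line)
    return states


sample = """Alabama[STATE]
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[6][17]
Tuskegee (Tuskegee University)[18]
California[STATE]
Pomona (Cal Poly Pomona, WesternU)[9][10][11] and formerly Pomona College
"""

result = parse_states(sample)
for state, regions in result.items():
    print("State is : %s" % state)
    print("Regions are : %s" % "\n".join(regions))
```

Note that a header without the `[STATE]` marker, such as the bare `Arkansas` line in the actual text, would be treated as a region of the preceding state, so the input would need to be consistent for this to work.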

Regarding Python regex named groups for multi-line matching, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44469094/
