
python - Reverse search with a regular expression

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:05:01

I have the following string:

{"$deletedFields":["day"],"month":8,"year":2003,"$type":"com.linkedin.common.Date","$id":"urn:li:fs_position:(ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768),timePeriod,startDate"},

I want to get the month and year by searching backwards from this key:

key = 'ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768'

In practice I am reading this data from a file, so the key is the only way I have to tell the records apart.

I have already written the forward regex, but I want to search in the opposite direction. For example:

re.findall(key + r'(.*?),\$deletedFields', page_html)

I want something like the reverse of this, so that it grabs the data going backwards until it reaches $deletedFields.

I don't want to do this by reversing the string, because that would change the whole file.

Desired output:

Year: 2003, Month: 8
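One way to get this "reverse search" without reversing the file is to find the key first and then look backwards for the nearest "$deletedFields" with str.rfind, which accepts an end bound. The sketch below assumes every record starts with "$deletedFields" and that month and year appear between that marker and the key; the helper name date_for_key is hypothetical, not part of the question or answer.

```python
import re


def date_for_key(text, key):
    """Return (year, month) for the record that contains key, or None.

    Hypothetical helper: assumes each record starts with "$deletedFields".
    """
    end = text.find(key)
    if end == -1:
        return None
    # Search backwards from the key for the start of its own record.
    start = text.rfind('"$deletedFields"', 0, end)
    if start == -1:
        return None
    record = text[start:end]
    month = re.search(r'"month":(\d+)', record)
    year = re.search(r'"year":(\d+)', record)
    if not (month and year):
        return None
    return int(year.group(1)), int(month.group(1))


line = ('{"$deletedFields":["day"],"month":8,"year":2003,'
        '"$type":"com.linkedin.common.Date",'
        '"$id":"urn:li:fs_position:(ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768),'
        'timePeriod,startDate"},')
key = 'ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768'

year, month = date_for_key(line, key)
print(f"Year: {year}, Month: {month}")  # Year: 2003, Month: 8
```

Because rfind is bounded by the key's position, it never looks past the key, so the nearest preceding "$deletedFields" is the start of the key's own record.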

Best answer

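Edit:
"The keys are in a different order, so I just want to search in the opposite direction until $deletedFields."
After re-reading your question, it seems you don't know where the record begins.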

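For example, if you have an overall start and a definite end, it is not a good idea to specify a common record start and then match anything up to the key. That match would run from the first record start all the way to the key, possibly capturing other keys along the way.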
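A small sketch of that pitfall, using two simplified, hypothetical records whose keys "AAA,1" and "BBB,2" stand in for the real ones. A plain lazy .*? between "$deletedFields" and the key does not start at the nearest record: the regex engine takes the leftmost possible start, so the match begins at the first record.

```python
import re

# Two hypothetical records; the key we want is in the SECOND one.
data = (
    '{"$deletedFields":["day"],"month":8,"year":2003,"$id":"(AAA,1)"},\n'
    '{"$deletedFields":["day"],"month":5,"year":2010,"$id":"(BBB,2)"},\n'
)
key = 'BBB,2'

# Naive pattern: lazy .*? only shortens the END of a match; the START is
# still the leftmost "$deletedFields" from which the key can be reached.
naive = re.search(r'"\$deletedFields".*?' + re.escape(key), data, re.S)

records_spanned = naive.group().count('"$deletedFields"')
naive_month = re.search(r'"month":(\d+)', naive.group()).group(1)
print(records_spanned)  # 2 -> the match covers both records
print(naive_month)      # 8 -> the month of the WRONG record
```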

However, you can still search forward if you reset the start every time a new record begins.
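Resetting the start can be sketched with a "tempered" dot: each character is consumed only if a new "$deletedFields" does not begin there. The records and keys below are simplified, hypothetical stand-ins. An attempt that starts at an earlier record can no longer reach the key, so it fails and the engine retries from the next record start.

```python
import re

# Two hypothetical records; the key we want is in the SECOND one.
data = (
    '{"$deletedFields":["day"],"month":8,"year":2003,"$id":"(AAA,1)"},\n'
    '{"$deletedFields":["day"],"month":5,"year":2010,"$id":"(BBB,2)"},\n'
)
key = 'BBB,2'

# (?:(?!"\$deletedFields").)*? matches any character that does not start a
# new record, so a match can never cross a record boundary.
tempered = re.compile(
    r'"\$deletedFields"(?:(?!"\$deletedFields").)*?' + re.escape(key), re.S)
m = tempered.search(data)

records_spanned = m.group().count('"$deletedFields"')
month = re.search(r'"month":(\d+)', m.group()).group(1)
print(records_spanned)  # 1 -> only the record that holds the key
print(month)            # 5
```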

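The pattern below allows the date parts to appear in any order and makes each of them optional. It also captures the key, if you need it.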

Another advantage is that you can put all the keys into the alternation and collect every key together with its date as an array of records.

So the regex model is: $deletedFields + date parts + any one of these keys, while making sure the match never crosses a record boundary.

(?s)"\$deletedFields":(?:"day":(?P<day>\d+),|"month":(?P<month>\d+),|"year":(?P<year>\d+),|(?!"\$deletedFields":).)*?(?P<key>ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768|BCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,264599768|CCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,364599768|DCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,464599768)

Expanded:

 (?s)                                # Dot-All modifier
 "\$deletedFields":                  # Beginning of record
 (?:
       "day":
       (?P<day> \d+ )                # (1), day
       ,
    |                                # or,
       "month":
       (?P<month> \d+ )              # (2), month
       ,
    |                                # or,
       "year":
       (?P<year> \d+ )               # (3), year
       ,
    |                                # or,
       (?! "\$deletedFields": )      # any character, but not the beginning of record
       .
 )*?

 (?P<key>                            # (4 start), Keys to find
       ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768
    |  BCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,264599768
    |  CCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,364599768
    |  DCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,464599768
 )                                   # (4 end)
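The expanded form can also be used directly in Python by compiling it with re.VERBOSE (re.X), which ignores whitespace and treats "#" as a comment. A sketch under these assumptions: only one key is kept for brevity, and re.S is passed as a flag instead of the inline (?s). It also checks the claim that the date parts are unordered and optional, using a record where "year" comes before "month" and "day" is absent.

```python
import re

# The answer's expanded pattern, one key only, compiled in verbose mode.
RECORD = re.compile(r'''
    "\$deletedFields":                # beginning of a record
    (?:
        "day":   (?P<day>\d+)   ,     # day (optional)
      | "month": (?P<month>\d+) ,     # month (optional)
      | "year":  (?P<year>\d+)  ,     # year (optional)
      | (?! "\$deletedFields": ) .    # any char that does not start a record
    )*?
    (?P<key> ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768 )
''', re.S | re.X)

# Hypothetical record with the fields in a different order and no "day".
reordered = ('{"$deletedFields":"year":2003,"month":8,'
             '"$id":"(ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768)"},')
# The record exactly as it appears in the question.
original = ('{"$deletedFields":["day"],"month":8,"year":2003,'
            '"$type":"com.linkedin.common.Date",'
            '"$id":"urn:li:fs_position:(ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768),'
            'timePeriod,startDate"},')

for record in (reordered, original):
    m = RECORD.search(record)
    print(m.group('day', 'month', 'year'))  # (None, '8', '2003') both times
```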

Python
http://rextester.com/XXH80293

import re

# Sample input: one record per line. The first three records have no "day".
text = "\n".join([
    r'{"$deletedFields":"month":2,"year":2003,"$type":"com.linkedin.common.Date","$id":"urn:li:fs_position:(ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768),timePeriod,startDate"},',
    r'{"$deletedFields":"month":12,"year":2001,"$type":"com.linkedin.common.Date","$id":"urn:li:fs_position:(DCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,464599768),timePeriod,startDate"},',
    r'{"$deletedFields":"month":6,"year":2012,"$type":"com.linkedin.common.Date","$id":"urn:li:fs_position:(BCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,264599768),timePeriod,startDate"},',
    r'{"$deletedFields":"day":30,"month":8,"year":2009,"$type":"com.linkedin.common.Date","$id":"urn:li:fs_position:(CCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,364599768),timePeriod,startDate"},',
]) + "\n"

keys = ['ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768',
        'BCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,264599768',
        'CCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,364599768',
        'DCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,464599768']

# Alternation of all keys; re.escape guards against regex metacharacters.
rx_keys = '(' + '|'.join(re.escape(k) for k in keys) + ')'

# "$deletedFields", then day/month/year fields or single characters that do
# not start a new record (lazily), up to the first key.
Rx = r'(?s)"\$deletedFields":(?:"day":(?P<day>\d+),|"month":(?P<month>\d+),|"year":(?P<year>\d+),|(?!"\$deletedFields":).)*?' + rx_keys

# Each match yields (day, month, year, key); a missing part is ''.
print(re.findall(Rx, text))

Output

[('', '2', '2003', 'ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768'), ('', '12', '2001', 'DCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,464599768'), ('', '6', '2012', 'BCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,264599768'), ('30', '8', '2009', 'CCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,364599768')]
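The tuples returned by findall can be turned into the question's desired "Year: 2003, Month: 8" style of output by indexing them by key. A minimal sketch that reuses the output list above verbatim; the dictionary layout and the helper name to_int are illustrative choices, not part of the answer.

```python
# Output of re.findall above: one (day, month, year, key) tuple per record.
results = [
    ('', '2', '2003', 'ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768'),
    ('', '12', '2001', 'DCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,464599768'),
    ('', '6', '2012', 'BCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,264599768'),
    ('30', '8', '2009', 'CCoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,364599768'),
]


def to_int(s):
    """Convert a captured group to int; '' (part not present) becomes None."""
    return int(s) if s else None


# Index the records by key so each one can be looked up directly.
dates = {key: {'day': to_int(day), 'month': to_int(month), 'year': to_int(year)}
         for day, month, year, key in results}

key = 'ACoAAAGiKv0BjXc8aE9HZLXpUnNcxQD4CoB1mKg,164599768'
d = dates[key]
print(f"Year: {d['year']}, Month: {d['month']}")  # Year: 2003, Month: 2
```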

A similar question, "python - Reverse search with a regular expression", can be found on Stack Overflow: https://stackoverflow.com/questions/44741826/
