
python - Identifying all instances of problematic quotes

Reposted. Author: 行者123. Updated: 2023-12-01 00:26:29

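I have a large (correctly formed) string variable that I am converting into a list of dictionaries. I iterate over the large string, split it on newlines, and run list(eval(i)) on each line. This works in most cases, but for every exception thrown I add the "malformed" string to a failed_attempt array. I have been going through the failed cases for an hour now, and I believe they fail whenever there is an extra set of quotes that is not part of a dictionary key. For example,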

eval('''[{"question":"What does "AR" stand for?","category":"DFB","answers":["Assault Rifle","Army Rifle","Automatic Rifle","Armalite Rifle"],"sources":["https://www.npr.org/2018/02/28/588861820/a-brief-history-of-the-ar-15"]}]''')

will fail because of the double quotes around "AR". If those quotes are replaced with single quotes, e.g.

eval('''[{"question":"What does 'AR' stand for?","category":"DFB","answers":["Assault Rifle","Army Rifle","Automatic Rifle","Armalite Rifle"],"sources":["https://www.npr.org/2018/02/28/588861820/a-brief-history-of-the-ar-15"]}]''')

it now succeeds.

Similarly:

eval('''[{"question":"Test Question, Test Question?","category":"DFB","answers":["2004","1930","1981","This has never occurred"],"sources":[""SOWELL: Exploding myths""]}]''')

fails because of the double quotes around "SOWELL", but succeeds again if they are replaced with single quotes.
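The parsing loop described in the question can be sketched roughly as follows. This is only an illustration under a few assumptions: it swaps `eval` for `ast.literal_eval` (which accepts only Python literals and raises the same `SyntaxError` on the stray quotes), `parse_lines` is a hypothetical name, and the two sample lines are shortened versions of the examples above.

```python
import ast


def parse_lines(blob):
    """Parse each non-empty line; collect the lines that fail to parse."""
    parsed, failed_attempt = [], []
    for line in blob.splitlines():
        if not line.strip():
            continue
        try:
            parsed.append(ast.literal_eval(line))
        except (SyntaxError, ValueError):
            # Stray double quotes inside a value end the string literal early.
            failed_attempt.append(line)
    return parsed, failed_attempt


good = '[{"question":"What does \'AR\' stand for?","category":"DFB"}]'
bad = '[{"question":"What does "AR" stand for?","category":"DFB"}]'

parsed, failed_attempt = parse_lines(good + "\n" + bad)
print(len(parsed), len(failed_attempt))
```

Only the line with double quotes around AR ends up in `failed_attempt`; the line with single quotes parses normally.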

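So I need a way to identify quotes that appear anywhere other than around the dictionary keys (question, category, sources) and replace them with single quotes. I'm not sure of the right way to do this.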

@Wiktor's submission almost works, but it fails in the following case:

example = '''[{"question":"Which of the following is NOT considered to be "interstate commerce" by the Supreme Court, and this cannot be regulated by Congress?","category":"DFB","answers":["ANSWER 1","ANSWER 2","ANSWER 3","All of these are considered "Interstate Commerce""],"sources":["SOURCE 1","SOURCE 2","SOURCE 3"]}]'''
re.sub(r'("\w+":[[{]*")(.*?)("(?:,|]*}))', lambda x: "{}{}{}".format(x.group(1),x.group(2).replace('"', "'"),x.group(3)), example)


Out[170]: '[{"question":"Which of the following is NOT considered to be \'interstate commerce\' by the Supreme Court, and this cannot be regulated by Congress?","category":"DFB","answers":["ANSWER 1","ANSWER 2","ANSWER 3","All of these are considered "Interstate Commerce""],"sources":["SOURCE 1","SOURCE 2","SOURCE 3"]}]'

Note that the second set of double quotes, around "Interstate Commerce" in the answers, is not replaced.
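One way to see why the later list elements are skipped is to print what the pattern actually captures. The sketch below uses a shortened, made-up input with the same shape as the failing example, and writes the character class as `[\[{]` (an escaped bracket that matches the same text as `[[{]`). Because every match has to start at a `"key":` prefix, only the first element of a list is ever captured, and the remaining elements never reach the replacement function.

```python
import re

# The pattern from the attempt above, with the bracket escaped inside the class.
pattern = re.compile(r'("\w+":[\[{]*")(.*?)("(?:,|]*}))')

# Shortened, made-up input with the same shape as the failing example.
example = '[{"question":"Is "A" ok?","answers":["ANSWER 1","All "B""]}]'

captured = [m.group(2) for m in pattern.finditer(example)]
for m in pattern.finditer(example):
    print(m.group(1), "->", m.group(2))
# "question":" -> Is "A" ok?
# "answers":[" -> ANSWER 1
```

The last list element, `All "B"`, is never matched, so its inner quotes stay untouched.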

Best answer

Try this. I know it works for all values of the question and category keys, and I hope I haven't missed any case for the list values:

import re


def escape_quotes(match):
    """Escape the stray quotes in the value captured by the second group."""
    # Match any double quote except the structural ones: `["`, `","` and `"]`.
    RE_ESCAPE_QUOTES_IN_LIST = re.compile(r'(?<!\[)(?<!",)"(?!,"|\])')

    def escape_quote_in_string(string):
        # Keep the outer pair of quotes and turn every inner one into '.
        return '"{}"'.format(string[1:-1].replace('"', "'"))

    key, value = match.groups()
    # String values: everything between the outer quotes is content.
    if any(e in key for e in ('question', 'category')):
        value = escape_quote_in_string(value)
    # List values: keep only `["`, `","` and `"]`; replace any other quote.
    if any(e in key for e in ('answers', 'sources')):
        value = RE_ESCAPE_QUOTES_IN_LIST.sub("'", value)

    return f'{key}{value}'


# Test cases
exps = ['''[{"question":"What does "AR" stand for?"}]''',
        '''[{"sources":[""SOWE"LL: Ex"ploding myths""]}]''',
        '''[{"question":"Test ", Test" Que"sti"on?","sources":[""SOWELL: Ex""ploding myths""]}]''']

# Extract each key/value pair; this is easy because the set of keys is fixed.
key = '(?:"(?:question|category|answers|sources)":)'
RE_KEY_VALUE = re.compile(rf'({key})(.+?)\s*(?=,\s*{key}|}})', re.S)

# Run all the test cases
for exp in exps:
    # Replace the stray quotes, then evaluate the repaired string
    exp = RE_KEY_VALUE.sub(escape_quotes, exp)
    print(eval(exp))

# [{'question': "What does 'AR' stand for?"}]
# [{'sources': ["'SOWE'LL: Ex'ploding myths'"]}]
# [{'question': "Test ', Test' Que'sti'on?", 'sources': ["'SOWELL: Ex''ploding myths'"]}]
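As a rough check against the kind of input that broke the earlier attempt, the same idea can be condensed into a single helper and run on a list whose last element contains inner quotes. This is a sketch rather than the answer's exact code: `fix_quotes` and `_repair` are hypothetical names, the branch is chosen by whether the value starts with `[` instead of by key name, the input is a shortened version of the question's example, and `json.loads` is swapped in for `eval` on the assumption that the repaired text is valid JSON.

```python
import json
import re

# Assumption: the keys are fixed and known in advance, as in the question.
KEY = '(?:"(?:question|category|answers|sources)":)'
RE_KEY_VALUE = re.compile(rf'({KEY})(.+?)\s*(?=,\s*{KEY}|}})', re.S)
# Quotes that are not part of `["`, `","` or `"]` inside a list value.
RE_STRAY_IN_LIST = re.compile(r'(?<!\[)(?<!",)"(?!,"|\])')


def _repair(match):
    key, value = match.groups()
    if value.startswith('['):
        # List value: replace only the non-structural quotes.
        value = RE_STRAY_IN_LIST.sub("'", value)
    else:
        # String value: keep the outer quotes, replace the inner ones.
        value = '"' + value[1:-1].replace('"', "'") + '"'
    return key + value


def fix_quotes(text):
    """Hypothetical helper: replace stray double quotes with single quotes."""
    return RE_KEY_VALUE.sub(_repair, text)


example = ('[{"question":"Is "interstate commerce" regulated?",'
           '"category":"DFB",'
           '"answers":["ANSWER 1","All of these are "Interstate Commerce""],'
           '"sources":["SOURCE 1","SOURCE 2"]}]')

records = json.loads(fix_quotes(example))
print(records[0]["answers"][-1])
# All of these are 'Interstate Commerce'
```

Parsing with `json.loads` avoids executing the input as code. Note that inside a list, a quote directly next to `[`, `]` or `","` is always treated as structural, so content quotes in those positions would be left alone.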

Regarding "python - Identifying all instances of problematic quotes", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58559827/
