
python - Regular expression for MCQ-type strings

Repost. Author: 行者123. Updated: 2023-12-04 01:21:06

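How can I extract multiple-choice questions and their options from a text document? Each question starts with a number followed by a period. A question can span multiple lines and may or may not end with a period or a question mark. I want to build a dictionary containing the question number along with the corresponding question and options. I am using Python for this.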

17.
If you go on increasing the stretching force on a wire in a
guitar, its frequency.
(a)
increases
(b)
decreases
(c)
remains unchanged
(d)
None of these

some random text between questions
18.
A vibrating body
(a)
will always produce sound
(b)
may or may not produce sound if the amplitude of
vibration is low
(c)
will produce sound which depends upon frequency
(d)
None of these
19.
The wavelength of infrasonics in air is of the order of
(a)
100 m
(b)
101 m
(c)
10–1 m
(d)
10–2 m

Best answer

Solution

Assume your questions are stored in a file named questions.txt.

17.
If you go on increasing the stretching force on a wire in a
guitar, its frequency.
(a)
increases
(b)
decreases
(c)
remains unchanged
(d)
None of these

some random text between questions
18.
A vibrating body
(a)
will always produce sound
(b)
may or may not produce sound if the amplitude of
vibration is low
(c)
will produce sound which depends upon frequency
(d)
None of these
19.
The wavelength of infrasonics in air is of the order of
(a)
100 m
(b)
101 m
(c)
10–1 m
(d)
10–2 m

Python code that parses questions.txt as required:

import re

filename = 'questions.txt'
questions = []

with open(file=filename, mode='r', encoding='utf8') as f:
    lines = f.readlines()

# Flags tracking which part of a question the current line belongs to.
is_label = False  # True when the line is a label: 17. | (a) | (b) | (c) | (d)
is_statement = is_option_a = is_option_b = is_option_c = is_option_d = False
statement = option_a = option_b = option_c = option_d = ''

for line in lines:
    # A label line switches the parser to the matching state.
    if re.match(r'^\d+\.$', line):
        is_statement = is_label = True
        is_option_a = is_option_b = is_option_c = is_option_d = False
    elif re.match(r'^\(a\)$', line):
        is_option_a = is_label = True
        is_statement = is_option_b = is_option_c = is_option_d = False
    elif re.match(r'^\(b\)$', line):
        is_option_b = is_label = True
        is_statement = is_option_a = is_option_c = is_option_d = False
    elif re.match(r'^\(c\)$', line):
        is_option_c = is_label = True
        is_statement = is_option_a = is_option_b = is_option_d = False
    elif re.match(r'^\(d\)$', line):
        is_option_d = is_label = True
        is_statement = is_option_a = is_option_b = is_option_c = False
    else:
        is_label = False

    # Label lines carry no content of their own.
    if is_label:
        continue

    if is_statement:
        statement += line  # statements may span several lines
    elif is_option_a:
        option_a = line.rstrip()
    elif is_option_b:
        option_b = line.rstrip()
    elif is_option_c:
        option_c = line.rstrip()
    elif is_option_d:
        option_d = line.rstrip()

        # Option (d) closes the question: store it and reset the buffers.
        if statement:
            questions.append({
                'statement': statement.rstrip(),
                'options': [option_a, option_b, option_c, option_d]
            })
            statement = option_a = option_b = option_c = option_d = ''

print(questions)

Output (formatted for readability)

[
{
"statement": "If you go on increasing the stretching force on a wire in a\nguitar, its frequency.",
"options": [
"increases",
"decreases",
"remains unchanged",
"None of these"
]
},
{
"statement": "A vibrating body",
"options": [
"will always produce sound",
"vibration is low",
"will produce sound which depends upon frequency",
"None of these"
]
},
{
"statement": "The wavelength of infrasonics in air is of the order of",
"options": [
"100 m",
"101 m",
"10–1 m",
"10–2 m"
]
}
]
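
In the output above, option (b) of question 18 keeps only its last line, and the question numbers are not stored, although the question asked for a dictionary keyed by them. The following is a minimal sketch of one way to cover both points, not part of the original answer. The function name parse_mcq and the rule that a blank line ends the current field are assumptions made for illustration.

```python
import re

# Hypothetical helper (not from the original answer): parse MCQ text into a
# dict keyed by question number, joining multi-line statements and options.
LABEL_QUESTION = re.compile(r'^(\d+)\.$')
LABEL_OPTION = re.compile(r'^\(([a-d])\)$')


def parse_mcq(text):
    questions = {}
    number = None  # number of the question currently being read
    field = None   # 'statement', an option letter, or None outside a question

    for raw in text.splitlines():
        line = raw.strip()
        q = LABEL_QUESTION.match(line)
        o = LABEL_OPTION.match(line)
        if q:
            number = int(q.group(1))
            questions[number] = {'statement': [], 'options': {}}
            field = 'statement'
        elif o and number is not None:
            field = o.group(1)
            questions[number]['options'][field] = []
        elif not line:
            # Assumption: a blank line ends the current field, so any text
            # after it is ignored until the next label appears.
            field = None
        elif field == 'statement':
            questions[number]['statement'].append(line)
        elif field is not None:
            questions[number]['options'][field].append(line)

    # Join the collected lines of each field with single spaces.
    return {
        n: {
            'statement': ' '.join(item['statement']),
            'options': {k: ' '.join(v) for k, v in item['options'].items()},
        }
        for n, item in questions.items()
    }


sample = """18.
A vibrating body
(a)
will always produce sound
(b)
may or may not produce sound if the amplitude of
vibration is low
"""
print(parse_mcq(sample)[18]['options']['b'])
# may or may not produce sound if the amplitude of vibration is low
```

Stray text that directly follows an option without a blank line would be appended to that option, so this sketch relies on the same layout as the sample file.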

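Side notes

  • Text such as "some random text between questions" is ignored.
  • Multi-line question statements are kept as-is (that is, the newlines are deliberately not removed). You can optionally replace each \n with a space character.
  • Multi-line options keep only their last line, because each option line overwrites the previous one; this is why option (b) of question 18 appears as "vibration is low" in the output.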
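
The optional newline replacement mentioned above can be sketched as follows. This is a minimal illustration, assuming the statement string has the shape shown in the output; the variable names are hypothetical.

```python
# A statement as produced by the parser, with an embedded newline.
statement = (
    "If you go on increasing the stretching force on a wire in a\n"
    "guitar, its frequency."
)

# str.split() with no arguments splits on any run of whitespace (newlines
# included), so joining the pieces with ' ' yields a single-line string.
flattened = ' '.join(statement.split())
print(flattened)
```

Compared with statement.replace('\n', ' '), this also collapses repeated spaces, which may or may not be wanted.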

Regarding "python - Regular expression for MCQ-type strings", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62646625/
