python - BeautifulSoup web scraping find_all(): custom function not working

Reposted. Author: 行者123. Updated: 2023-12-01 23:27:47

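So I'm scraping MCQs (multiple-choice questions) from this website, and I want the correct option to come last. All options share the same class='radio-button-click-target', but the correct option has class='radio-button-click-target correctquestions'. I tried the BeautifulSoup webscraping find_all(): finding exact match solution and a custom function, but now none of the options show up.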

import requests
from bs4 import BeautifulSoup
address = 'https://www.ilmkidunya.com/online-test/5th-class-science-english-meduim-mcqs-with-answers?startfrom=0&last=92'
response = requests.get(address)
soup = BeautifulSoup(response.text, 'lxml')
ques_id = soup.find_all('div', class_='q-title')
ques_det = soup.find_all('div', class_='q-desc')
optn_det = soup.find_all('div', class_='choose-answer-block')
for i in range(0, len(ques_id)):
    print((ques_id[i].text))
    print(str(ques_det[i].text).strip())
    options = optn_det[i].find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['radio-button-click-target correctquestions'])
    for opn in options:
        print(str(opn.text).strip())
    print('<----->')
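Why the lambda matches nothing: BeautifulSoup treats `class` as a multi-valued attribute, so `tag.get('class')` returns a list of separate class names such as `['radio-button-click-target', 'correctquestions']`, never one space-joined string. The minimal sketch below runs offline on a small hypothetical snippet (only the class names and option texts come from the question; the markup itself is assumed, and it uses `label` elements because the answer below matches `label` rather than `div`). It compares the question's equality test with a membership test and the equivalent CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class names come from the question,
# the surrounding structure is an assumption for illustration.
HTML = """
<div class="choose-answer-block">
  <label class="radio-button-click-target">amphibians</label>
  <label class="radio-button-click-target correctquestions">Worms</label>
  <label class="radio-button-click-target">Reptiles</label>
  <label class="radio-button-click-target">Mammals</label>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "class" is multi-valued, so BeautifulSoup splits it into a list of names.
correct_tag = soup.find("label", class_="correctquestions")
print(correct_tag.get("class"))  # ['radio-button-click-target', 'correctquestions']

# The question's comparison against one space-joined string never matches.
exact = soup.find_all(
    lambda tag: tag.get("class") == ["radio-button-click-target correctquestions"]
)
print(len(exact))  # 0

# A membership test on the list works; .get(..., []) covers tags with no class.
by_lambda = soup.find_all(lambda tag: "correctquestions" in tag.get("class", []))

# The equivalent CSS selector requires both classes on the same element.
by_css = soup.select(".radio-button-click-target.correctquestions")

print([t.get_text(strip=True) for t in by_lambda])  # ['Worms']
print([t.get_text(strip=True) for t in by_css])     # ['Worms']
```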

Current output

Question #  1
The group which belong to invertebrates is.
amphibians
Worms
Reptiles
Mammals
<----->
Question # 2
The main cause of cholera is:
land polllution
noise pollution
air pollution
water pollution
<----->

Expected output

Question #  1
The group which belong to invertebrates is.
amphibians
Reptiles
Mammals
Worms
<----->
Question # 2
The main cause of cholera is:
land polllution
noise pollution
air pollution
water pollution
<----->

The correct option should be displayed last.
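One way to get that order, sketched on hypothetical markup modeled on the question's class names and output: collect every option in each `choose-answer-block`, then sort with a key that is `True` only for the `correctquestions` option. `False` sorts before `True` and Python's `sorted()` is stable, so the wrong options keep their page order and the correct one moves to the end. The helper name `options_correct_last` is made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for two questions; only the class names come from the question.
HTML = """
<div class="choose-answer-block">
  <label class="radio-button-click-target">amphibians</label>
  <label class="radio-button-click-target correctquestions">Worms</label>
  <label class="radio-button-click-target">Reptiles</label>
  <label class="radio-button-click-target">Mammals</label>
</div>
<div class="choose-answer-block">
  <label class="radio-button-click-target">land pollution</label>
  <label class="radio-button-click-target">noise pollution</label>
  <label class="radio-button-click-target">air pollution</label>
  <label class="radio-button-click-target correctquestions">water pollution</label>
</div>
"""


def options_correct_last(block):
    """Return the option texts of one answer block with the correct option last."""
    # class_ matches if any single class in the tag's class list equals the value.
    options = block.find_all(class_="radio-button-click-target")
    # Key is False for wrong options and True for the correct one; sorted() is stable.
    ordered = sorted(options, key=lambda tag: "correctquestions" in tag.get("class", []))
    return [tag.get_text(strip=True) for tag in ordered]


soup = BeautifulSoup(HTML, "html.parser")
results = [
    options_correct_last(block)
    for block in soup.find_all("div", class_="choose-answer-block")
]

for opts in results:
    print("\n".join(opts))
    print("<----->")
```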

Best answer

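Everything you need is in the HTML, so you can rebuild the question database by collecting all the questions, the suggested answers, and the correct answers.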

Here's how:

import random
import time

import requests
from bs4 import BeautifulSoup

address = 'https://www.ilmkidunya.com/online-test/5th-class-science-english-meduim-mcqs-with-answers?startfrom=0&last=92'
soup = BeautifulSoup(requests.get(address).text, 'lxml')

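# Question texts, one per question block.
question = [
    q.getText(strip=True) for q
    in soup.select("div.single-question-answer-block div.q-desc")
]

# Option texts: each option label sits right after a disabled radio input.
radio_buttons = [
    o.getText(strip=True) for o in
    soup.select("div.fancy-radio-box .radio-button:disabled + .radio-button-click-target")
]

# "class" is a list in BeautifulSoup, so test membership instead of equality.
# .get() avoids a KeyError on labels that have no class attribute.
correct_answers = [
    a.getText(strip=True) for a in
    soup.find_all(lambda t: t.name == "label" and "correctquestions" in t.get("class", []))
]

# Assumes every question has exactly four options: split the flat list into groups of four.
options = [radio_buttons[i:i + 4] for i in range(0, len(radio_buttons), 4)]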

trivia_base = list(zip(question, correct_answers, options))

question, correct_answer, answers = random.choice(trivia_base)
time_to_answer_in_seconds = 15

print(question.title())
print("\n".join(f"-> {a.title()}" for a in answers))
print("-" * len(question))

time.sleep(time_to_answer_in_seconds)
print(f"Correct answer is: {correct_answer}.")
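The `radio_buttons` selector reads as: inside `div.fancy-radio-box`, take each `.radio-button-click-target` element that immediately follows (`+`, the adjacent-sibling combinator) a disabled `.radio-button`. Below is a small offline check of that selector on hypothetical markup. That the page pairs each disabled `input` with a following `label` is an assumption inferred from the selector, not something shown in the post.

```python
from bs4 import BeautifulSoup

# Hypothetical markup inferred from the answer's selector, not copied from the site.
HTML = """
<div class="fancy-radio-box">
  <input class="radio-button" type="radio" disabled>
  <label class="radio-button-click-target">Earth</label>
  <input class="radio-button" type="radio" disabled>
  <label class="radio-button-click-target correctquestions">Sun</label>
  <input class="radio-button" type="radio">
  <label class="radio-button-click-target">Not disabled</label>
  <span class="radio-button-click-target">No radio button before me</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "+" matches a .radio-button-click-target whose immediately preceding
# element sibling is a .radio-button carrying the disabled attribute.
selector = "div.fancy-radio-box .radio-button:disabled + .radio-button-click-target"
picked = [tag.get_text(strip=True) for tag in soup.select(selector)]
print(picked)  # ['Earth', 'Sun']
```

The third label is skipped because its radio input is enabled, and the `span` is skipped because the element right before it is a label, not a radio button.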

Sample output:

Which Of The Following Objects Emits Light?
-> Earth
-> Sun
-> Moon
-> Pluto
-------------------------------------------
Correct answer is: Sun.

Edit:

If you want to print all the questions at once, replace everything from the `trivia_base` line onward with this:

trivia_base = list(zip(question, correct_answers, options))
horizontal_line = max(len(q) for q in question)

for number, trivia in enumerate(trivia_base, start=1):
    question, correct_answer, answers = trivia
    print(f"{number}. {question.title()}")
    print("\n".join(f"-> {a.title()}" for a in answers))
    print(f"Correct answer is: {correct_answer}.")
    print("-" * horizontal_line)
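Both versions depend on `zip()` lining up three parallel lists, and `zip()` silently stops at the shortest one. A question with a missing correct answer or a fifth option would therefore shift or drop data without any error. Here is a hedged sketch of a length check before zipping; `chunk` and `build_trivia` are hypothetical helper names, and the sample values are taken from the output above.

```python
def chunk(items, size):
    """Split a flat list into consecutive groups of `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_trivia(questions, correct_answers, flat_options, per_question=4):
    """Zip questions, answers and option groups, refusing mismatched lengths."""
    if len(flat_options) % per_question:
        raise ValueError(f"{len(flat_options)} options is not a multiple of {per_question}")
    options = chunk(flat_options, per_question)
    # zip() would quietly truncate to the shortest list, so compare lengths first.
    if not (len(questions) == len(correct_answers) == len(options)):
        raise ValueError(
            f"{len(questions)} questions, {len(correct_answers)} correct answers, "
            f"{len(options)} option groups"
        )
    return list(zip(questions, correct_answers, options))


questions = [
    "Which of the following objects emits light?",
    "The group which belongs to invertebrates is:",
]
correct = ["Sun", "insects"]
flat = ["Earth", "Sun", "Moon", "Pluto", "amphibians", "insects", "reptiles", "birds"]

trivia = build_trivia(questions, correct, flat)
print(trivia[1])

# A missing correct answer is reported instead of silently dropping a question.
try:
    build_trivia(questions, correct[:1], flat)
except ValueError as err:
    print("rejected:", err)
```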

Output:

1. Which Of The Following Objects Emits Light?
-> Earth
-> Sun
-> Moon
-> Pluto
Correct answer is: Sun.
----------------------------------------------------------------------------------------------------
2. The Group Which Belongs To Invertebrates Is:
-> Amphibians
-> Insects
-> Reptiles
-> Birds
Correct answer is: insects.
----------------------------------------------------------------------------------------------------
3. Number Of Petals In A Flower Of Dicot Plant May Be:
-> 3
-> 4
-> 6
-> 7
Correct answer is: 4.
----------------------------------------------------------------------------------------------------

and more...

For python - BeautifulSoup web scraping find_all(): custom function not working, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/66929999/
