
Issue with extracting data from a JavaScript-generated HTML document

Reposted. Author: bug小助手. Updated: 2023-10-22 16:38:50



I'm trying to parse information from this page: https://fem.encar.com/cars/detail/35902422?wtClick_index=187&conType=pctom
The data I need is in the following part of the HTML:



<span class="DetailSummary_num_graph__oN21B">
<span>82%</span>
</span>

I need to get this 82%.



I saved the HTML file with the following function:



import os

import aiohttp


async def discount(folder):
    url = "https://fem.encar.com/cars/detail/35902422?wtClick_index=187&conType=pctom"
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
    }

    # Download the page source as served, before any JavaScript runs
    async with aiohttp.ClientSession() as session:
        async with session.get(url=url, headers=headers) as response:
            data = await response.text()

    # Build the path portably and write the downloaded source to disk
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "html.html"), "w", encoding="utf8") as file:
        file.write(data)

However, the saved HTML document doesn't contain the value I see in the browser. Please help me find this data in JSON or another type of file that this page loads.
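One way to confirm what the saved file actually holds is to pull the text out of the target span offline. The following is a minimal sketch using only the standard library; `SpanTextCollector` and `graph_values` are hypothetical names introduced here, and matching on the class prefix `DetailSummary_num_graph` rests on the assumption that the `__oN21B` suffix is generated and may change between builds.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect text found inside <span> elements whose class starts with a prefix."""

    def __init__(self, class_prefix):
        super().__init__()
        self.class_prefix = class_prefix
        self.depth = 0   # nesting depth inside a matching span (0 means outside)
        self.texts = []  # non-empty text fragments seen inside matching spans

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Count nested spans too, so the matching end tag is tracked correctly.
        if self.depth or any(c.startswith(self.class_prefix) for c in classes):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.texts.append(data.strip())


def graph_values(html, class_prefix="DetailSummary_num_graph"):
    """Return the text fragments inside spans matching the (assumed) class prefix."""
    parser = SpanTextCollector(class_prefix)
    parser.feed(html)
    parser.close()
    return parser.texts
```

Feeding it the contents of the saved `html.html` shows whether the span holds a real percentage or only a placeholder that the page's scripts fill in later.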



More answers

While <span class="DetailSummary_num_graph__oN21B"> does exist in that document, that span does not contain a nested span; it just contains -%. In any case, if the data is retrieved dynamically, you'll need a browser to see it. That site is clearly built to deter web scraping.
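If a browser is needed, a headless one can be driven from the same asyncio code. This is a rough sketch, assuming Playwright for Python is installed along with its Chromium build; `fetch_rendered_percent` and `parse_percent` are hypothetical names, the CSS selector is a guess based on the markup quoted in the question, and the site may still block automated browsers.

```python
import re


def parse_percent(text):
    """Return the integer in a string like '82%', or None for a placeholder like '-%'."""
    match = re.fullmatch(r"\s*(\d+)\s*%\s*", text)
    return int(match.group(1)) if match else None


async def fetch_rendered_percent(url, timeout_ms=15000):
    """Load the page in headless Chromium and read the value after scripts have run."""
    # Imported here so parse_percent stays usable without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.goto(url)
            # Assumption: once rendered, the value sits in a nested <span>.
            value = page.locator('span[class*="DetailSummary_num_graph"] span').first
            await value.wait_for(timeout=timeout_ms)  # raises if it never appears
            return parse_percent(await value.inner_text())
        finally:
            await browser.close()
```

It could be run with `asyncio.run(fetch_rendered_percent(url))`, using the URL from the question. The browser's network tab may also reveal a JSON request that carries the same number, which would be lighter to call directly, but no such endpoint is confirmed here.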
