python - Beautifulsoup for web scraping not working?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 14:53:14
I am trying to scrape some data from a website. The relevant HTML is shown below. I want to extract the text "No description for 632930413867".

HTML code:

<div class="col-xs-6 col-sm-6 col-md-6 col-lg-6">
<table class="table product_info_table">
<tbody>
<tr>
<td>GS1 Address</td>
<td>R.R. 1, Box 2, Malmo, NE 68040</td>
</tr>
<tr>
<td>Description</td>
<td>
<div id="read_desc">
No description for 632930413867
</div>
</td>
</tr>
</tbody>
</table>
</div>

I also want the image src from this HTML:

  <div class="centered_image header_image">
<img src="https://images-na.ssl-images-amazon.com/images/I/416EuOE5kIL._SL160_.jpg" title="UPC 632930413867" alt="UPC 632930413867">

So I used this code:

Baseurl = "https://www.buycott.com/upc/632930413867"
uClient = ''
while uClient == '':
    try:
        uClient = requests.get(Baseurl)
        print("Relax we are getting the data...")

    except:
        print("Connection refused by the server..")
        print("Let me sleep for 7 seconds")
        time.sleep(7)
        print("Was a nice sleep, now let me continue...")
        continue


page_html = uClient.content

uClient.close()
page_soup = soup(page_html, "html.parser")

Productcontainer = page_soup.find_all("div", {"class": "row"})
link = page_soup.find(itemprop="image")

print(Productcontainer)

for item in Productcontainer:
    print(link)
    productdescription = Productcontainer.find("div", {"class": "product_info_table"})
    print(productdescription)

When I run this code, no data is displayed. How can I get the description and the img src?

Best answer

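The page contains only one instance of each element you need (the item image and the product description), so you can go straight to them with find(). There is no need for find_all() in this case: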

import time

import requests
from bs4 import BeautifulSoup as soup

Baseurl = "https://www.buycott.com/upc/632930413867"
uClient = ''
# Keep retrying until the request succeeds.
while uClient == '':
    try:
        uClient = requests.get(Baseurl)
        print("Relax we are getting the data...")

    except requests.exceptions.RequestException:
        # The request failed (for example, connection refused); wait, then retry.
        print("Connection refused by the server..")
        print("Let me sleep for 7 seconds")
        time.sleep(7)
        print("Was a nice sleep, now let me continue...")
        continue

page_html = uClient.content
uClient.close()

page_soup = soup(page_html, "html.parser")
# Each target occurs once on the page, so find() returns it directly.
productdescription = page_soup.find("div", {"id": "read_desc"}).text
link = page_soup.find("div", {"class": "centered_image header_image"}).find("img")['src']
print(productdescription)
print(link)

Output:

Relax we are getting the data...

No description for 632930413867

https://images-na.ssl-images-amazon.com/images/I/416EuOE5kIL._SL160_.jpg
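
To see why the question's approach printed nothing useful, it can help to run the same lookups against a static copy of the markup, with no network access. The sketch below is only an illustration under a few assumptions: the HTML string is a trimmed copy of the snippets quoted in the question, the closing </div> on the image wrapper is assumed (the question's snippet is cut off before it), and names such as resultset_has_find and wrong_tag are hypothetical helpers, not part of the original code.

```python
from bs4 import BeautifulSoup

# Trimmed static copy of the markup quoted in the question.
# Assumption: the image wrapper is closed with </div> (the quoted snippet is truncated).
HTML = """
<div class="centered_image header_image">
<img src="https://images-na.ssl-images-amazon.com/images/I/416EuOE5kIL._SL160_.jpg" title="UPC 632930413867" alt="UPC 632930413867">
</div>
<div class="col-xs-6 col-sm-6 col-md-6 col-lg-6">
<table class="table product_info_table">
<tbody>
<tr>
<td>Description</td>
<td>
<div id="read_desc">
No description for 632930413867
</div>
</td>
</tr>
</tbody>
</table>
</div>
"""

page_soup = BeautifulSoup(HTML, "html.parser")

# find_all() returns a ResultSet (a list of tags); calling find() on the
# list itself, as the question's loop does, raises AttributeError.
containers = page_soup.find_all("div")
try:
    containers.find("div", {"id": "read_desc"})
    resultset_has_find = True
except AttributeError:
    resultset_has_find = False

# product_info_table is a class on the <table>, so looking for a <div>
# with that class matches nothing and find() returns None.
wrong_tag = page_soup.find("div", {"class": "product_info_table"})

# find() returns a single Tag or None, so check before reading from it.
desc_tag = page_soup.find("div", {"id": "read_desc"})
description = desc_tag.get_text(strip=True) if desc_tag else None

image_div = page_soup.find("div", {"class": "centered_image header_image"})
img_tag = image_div.find("img") if image_div else None
img_src = img_tag.get("src") if img_tag else None

print(resultset_has_find)
print(wrong_tag)
print(description)
print(img_src)
```

The None checks are a design choice for pages whose layout may change: without them, a missing element would surface as an AttributeError or TypeError on the following attribute access rather than as a clear "not found" result.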

For "python - Beautifulsoup for web scraping not working?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47487376/
