
python - BeautifulSoup: extracting a value from an element without a class in Python

Reposted. Author: 行者123. Last updated: 2023-12-01 06:42:47

I want to extract data with BeautifulSoup in Python.

My document:

<div class="listing-item" data-id="309531" data-score="0">

<div class="thumb">
<a href="https://res.cloudinary.com/">

<div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>
</a>
</div>
</div>

From this I want to get the background image URL, which is in the style attribute of this element:

<div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>

My code:

from textwrap import shorten
from bs4 import BeautifulSoup
from urllib.parse import parse_qsl, urljoin, urlparse
import requests

url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'

print('{:^50} {:^15} {:^25} '.format('Title', 'Price', 'Date'))

for page in range(0, 40):  # <--- Increase to the number of pages you want
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.text, 'lxml')

    for title, price, date, thumb in zip(soup.select('.listing-item .title'),
                                         soup.select('.listing-item .price'),
                                         soup.select('.listing-item .date'),
                                         soup.select('.listing-item .thumb')):
        # thumb.get_text() comes back empty: the URL is in a style attribute, not in the text
        print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50), price.get_text().strip(), thumb.get_text().strip()))

How can I get the background image URL from the document?

Best answer

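You can get the URL by searching inside your thumb element: the first div within it carries the style attribute, and the URL is the text between url( and the closing );.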
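As a minimal offline sketch of that idea, the snippet below applies the same lookup to the sample markup from the question, without any network request. It assumes the built-in 'html.parser' backend is acceptable in place of 'lxml'; the variable names html, style and image_url are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = '''
<div class="listing-item" data-id="309531" data-score="0">
<div class="thumb">
<a href="https://res.cloudinary.com/">
<div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>
</a>
</div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
thumb = soup.select_one('.listing-item .thumb')

# find() searches descendants, so this is the inner <div> that has the style attribute.
style = thumb.find('div').get('style')

# Keep what follows 'url(' and precedes the closing ');'.
image_url = style.split('url(')[1].split(');')[0]
print(image_url)
```

The element has no class of its own, so it is reached through its parent (.thumb) rather than selected directly.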

You can try this:

Code:

from textwrap import shorten
from bs4 import BeautifulSoup
from urllib.parse import parse_qsl, urljoin, urlparse
import requests

url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'

print('{:^50} {:^15} {:^25} '.format('Title', 'Price', 'Image URL'))

for page in range(0, 1):  # <--- Increase to the number of pages you want
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.text, 'lxml')

    for title, price, date, thumb in zip(soup.select('.listing-item .title'),
                                         soup.select('.listing-item .price'),
                                         soup.select('.listing-item .date'),
                                         soup.select('.listing-item .thumb')):
        # The URL sits in the style attribute of the first <div> inside .thumb
        image_url = thumb.find('div').get('style').split('url(')[1].split(');')[0]
        print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50), price.get_text().strip(), image_url))
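The chained split() calls raise an IndexError when a style has no url(...) part, and an AttributeError when the style attribute is missing. One possible alternative, sketched here with only the standard library, is a small regular-expression helper. The names URL_IN_STYLE and background_image_url are hypothetical and not part of the original answer, and the pattern assumes the URL itself contains no closing parenthesis.

```python
import re

# Matches url(...) with optional whitespace and optional single or double quotes.
URL_IN_STYLE = re.compile(r'''url\(\s*['"]?(.*?)['"]?\s*\)''')


def background_image_url(style):
    """Return the first url(...) value found in an inline style string, or None."""
    if not style:
        return None
    match = URL_IN_STYLE.search(style)
    return match.group(1) if match else None


print(background_image_url('background-image:url(https://example.com/a.png);'))
print(background_image_url('color:red'))
```

Inside the loop above, this could be called as background_image_url(thumb.find('div').get('style')), provided .thumb always contains a div.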

Regarding "python - BeautifulSoup: extracting a value from an element without a class in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59371533/
