
Python - How to use soup with random class characters

Repost. Author: 行者123. Updated: 2023-12-01 06:53:42

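I have been trying to figure out how to scrape a buy/sell website. Everything I need is in the HTML, but the class names contain random-looking characters that differ from element to element, for example: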

<div aria-label="Adidas NMD x Bape" class="styled__Wrapper-sc-1kpvi4z-0 eDiSuB" to="/annons/skane/adidas_nmd_x_bape/87267675">
<article class="styled__Article-sc-1kpvi4z-1 hbWRzz">
<div class="styled__ImageWrapper-sc-1kpvi4z-4 kxhCJn">
<div class="ListImage__Wrapper-sc-1rp77jc-0 cvipJS"><img alt="Adidas NMD x Bape" class="ListImage__StyledImg-sc-1rp77jc-1 iwClwW" sizes="
(min-width: 768px) 180px,
120px
" src="https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big" srcset="
https://cdn.blocket.com/pictures/1692451915.jpg?type=thumb 120w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big 180w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal 240w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=store_presentation 360w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal_retina 540w,
" /></div>
</div>
<div class="styled__Content-sc-1kpvi4z-2 dwtNsH">
<div class="styled__LocationTimeWrapper-sc-1kpvi4z-17 dvvNDw">
<div class="styled__SubjectSymbol-sc-1kpvi4z-11 cbBbUz"></div>
<p class="styled__TopInfoWrapper-sc-1kpvi4z-22 kEcJNb"><a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/hela_sverige/personligt/klader_skor?cg=4080&amp;q=bape&amp;st=s">Kläder &amp; skor</a> · <a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/skane/personligt/klader_skor?cg=4080&amp;q=bape&amp;r=23&amp;st=s">Skåne</a></p>
<p class="styled__Time-sc-1kpvi4z-18 bGSnhf">Idag 14:06</p>
</div>
<div class="styled__SubjectWrapper-sc-1kpvi4z-10 kZyTSM">
<h2 class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq styled__StyledTitle-sc-1kpvi4z-6 bSElwy"><a class="Link-sc-139ww1j-0 styled__StyledTitleLink-sc-1kpvi4z-7 edlhAW" href="/annons/skane/adidas_nmd_x_bape/87267675">Adidas NMD x Bape</a></h2></div>
<div class="styled__ParamsWrapper-sc-1kpvi4z-13 cRZIFG"></div>
<div class="styled__SalesInfo-sc-1kpvi4z-20 bbHjGJ">
<div class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq Price__Wrapper-sc-1v2maoc-0 heunWX"><span>3 000 kr<div class="TextCallout2__TextCallout2Wrapper-sc-19qvftl-0 eERYUj Price__StyledVatPrice-sc-1v2maoc-1 hMWxAJ"></div></span></div>
</div>
</div>
</article>
</div>

I can see all the values I am looking for, namely:

Adidas NMD x Bape
3 000 kr
Skåne
/annons/skane/adidas_nmd_x_bape/87267675
https://cdn.blocket.com/pictures/1692451915.jpg

I know the basics of Beautiful Soup and scraping, but I get lost with anything more advanced. What would you suggest so that I can scrape the values listed above?


Update

# eachPart is not defined in this snippet; it is presumably one listing element from the parsed page
test = eachPart.select_one('h2[class^="TextSubHeading__TextSubHeadingWrapper"] > a').text
print(test)
print(eachPart.select_one('[aria-label="{}"] img[alt="{}"]'.format(test, test))['src'])
print(eachPart.select_one('h2[class^="TextSubHeading__TextSubHeadingWrapper"] > a')['href'])
print(eachPart.select_one('div[class^="TextSubHeading__TextSubHeadingWrapper"] > span').text)
# Skip the first link (the category) and print the remaining ones (the location)
for link in eachPart.select('p[class^="styled__TopInfoWrapper"] a')[1:]:
    print(link.text)

Best answer

First identify the parent tag to locate the main element, then find the child tags inside it. CSS selectors make this more convenient.

from bs4 import BeautifulSoup
html='''<div aria-label="Adidas NMD x Bape" class="styled__Wrapper-sc-1kpvi4z-0 eDiSuB" to="/annons/skane/adidas_nmd_x_bape/87267675">
<article class="styled__Article-sc-1kpvi4z-1 hbWRzz">
<div class="styled__ImageWrapper-sc-1kpvi4z-4 kxhCJn">
<div class="ListImage__Wrapper-sc-1rp77jc-0 cvipJS"><img alt="Adidas NMD x Bape" class="ListImage__StyledImg-sc-1rp77jc-1 iwClwW" sizes="
(min-width: 768px) 180px,
120px
" src="https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big" srcset="
https://cdn.blocket.com/pictures/1692451915.jpg?type=thumb 120w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big 180w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal 240w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=store_presentation 360w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal_retina 540w,
" /></div>
</div>
<div class="styled__Content-sc-1kpvi4z-2 dwtNsH">
<div class="styled__LocationTimeWrapper-sc-1kpvi4z-17 dvvNDw">
<div class="styled__SubjectSymbol-sc-1kpvi4z-11 cbBbUz"></div>
<p class="styled__TopInfoWrapper-sc-1kpvi4z-22 kEcJNb"><a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/hela_sverige/personligt/klader_skor?cg=4080&amp;q=bape&amp;st=s">Kläder &amp; skor</a> · <a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/skane/personligt/klader_skor?cg=4080&amp;q=bape&amp;r=23&amp;st=s">Skåne</a></p>
<p class="styled__Time-sc-1kpvi4z-18 bGSnhf">Idag 14:06</p>
</div>
<div class="styled__SubjectWrapper-sc-1kpvi4z-10 kZyTSM">
<h2 class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq styled__StyledTitle-sc-1kpvi4z-6 bSElwy"><a class="Link-sc-139ww1j-0 styled__StyledTitleLink-sc-1kpvi4z-7 edlhAW" href="/annons/skane/adidas_nmd_x_bape/87267675">Adidas NMD x Bape</a></h2></div>
<div class="styled__ParamsWrapper-sc-1kpvi4z-13 cRZIFG"></div>
<div class="styled__SalesInfo-sc-1kpvi4z-20 bbHjGJ">
<div class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq Price__Wrapper-sc-1v2maoc-0 heunWX"><span>3 000 kr<div class="TextCallout2__TextCallout2Wrapper-sc-19qvftl-0 eERYUj Price__StyledVatPrice-sc-1v2maoc-1 hMWxAJ"></div></span></div>
</div>
</div>
</article>
</div>'''
soup=BeautifulSoup(html,"html.parser")
print(soup.select_one('[aria-label="Adidas NMD x Bape"] img[alt="Adidas NMD x Bape"]')['src'])
print(soup.select_one('[aria-label="Adidas NMD x Bape"] h2[class^="TextSubHeading__TextSubHeadingWrapper"] >a').text)
print(soup.select_one('[aria-label="Adidas NMD x Bape"] h2[class^="TextSubHeading__TextSubHeadingWrapper"] >a')['href'])
print(soup.select_one('[aria-label="Adidas NMD x Bape"] div[class^="TextSubHeading__TextSubHeadingWrapper"] >span').text)

Output:

https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big
Adidas NMD x Bape
/annons/skane/adidas_nmd_x_bape/87267675
3 000 kr
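
A note on why the `class^=` selectors work: an attribute selector is compared against the whole `class` attribute string, so `^=` only matches when the stable name is the first class listed. When the stable name sits in the middle of the attribute (as `Price__Wrapper` does in the price markup above), `*=` (substring match) is the operator to reach for. The following is a minimal sketch on a made-up, trimmed fragment; the shortened class names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed fragment modelled on the price markup above.
snippet = (
    '<div class="TextSubHeading__Wrapper-sc-1 jIvScq Price__Wrapper-sc-2 heunWX">'
    '<span>3 000 kr</span></div>'
)
soup = BeautifulSoup(snippet, "html.parser")

# ^= matches only if the class attribute as a whole starts with the prefix.
starts = soup.select_one('div[class^="Price__Wrapper"]')

# *= matches if the text occurs anywhere inside the class attribute.
contains = soup.select_one('div[class*="Price__Wrapper"] > span')

print(starts)         # None: "Price__Wrapper" is not the first class
print(contains.text)  # 3 000 kr
```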

Edit

from bs4 import BeautifulSoup
html='''<div aria-label="Adidas NMD x Bape" class="styled__Wrapper-sc-1kpvi4z-0 eDiSuB" to="/annons/skane/adidas_nmd_x_bape/87267675">
<article class="styled__Article-sc-1kpvi4z-1 hbWRzz">
<div class="styled__ImageWrapper-sc-1kpvi4z-4 kxhCJn">
<div class="ListImage__Wrapper-sc-1rp77jc-0 cvipJS"><img alt="Adidas NMD x Bape" class="ListImage__StyledImg-sc-1rp77jc-1 iwClwW" sizes="
(min-width: 768px) 180px,
120px
" src="https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big" srcset="
https://cdn.blocket.com/pictures/1692451915.jpg?type=thumb 120w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big 180w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal 240w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=store_presentation 360w,
https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal_retina 540w,
" /></div>
</div>
<div class="styled__Content-sc-1kpvi4z-2 dwtNsH">
<div class="styled__LocationTimeWrapper-sc-1kpvi4z-17 dvvNDw">
<div class="styled__SubjectSymbol-sc-1kpvi4z-11 cbBbUz"></div>
<p class="styled__TopInfoWrapper-sc-1kpvi4z-22 kEcJNb"><a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/hela_sverige/personligt/klader_skor?cg=4080&amp;q=bape&amp;st=s">Kläder &amp; skor</a> · <a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/skane/personligt/klader_skor?cg=4080&amp;q=bape&amp;r=23&amp;st=s">Skåne</a></p>
<p class="styled__Time-sc-1kpvi4z-18 bGSnhf">Idag 14:06</p>
</div>
<div class="styled__SubjectWrapper-sc-1kpvi4z-10 kZyTSM">
<h2 class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq styled__StyledTitle-sc-1kpvi4z-6 bSElwy"><a class="Link-sc-139ww1j-0 styled__StyledTitleLink-sc-1kpvi4z-7 edlhAW" href="/annons/skane/adidas_nmd_x_bape/87267675">Adidas NMD x Bape</a></h2></div>
<div class="styled__ParamsWrapper-sc-1kpvi4z-13 cRZIFG"></div>
<div class="styled__SalesInfo-sc-1kpvi4z-20 bbHjGJ">
<div class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq Price__Wrapper-sc-1v2maoc-0 heunWX"><span>3 000 kr<div class="TextCallout2__TextCallout2Wrapper-sc-19qvftl-0 eERYUj Price__StyledVatPrice-sc-1v2maoc-1 hMWxAJ"></div></span></div>
</div>
</div>
</article>
</div>'''
soup=BeautifulSoup(html,"html.parser")
print(soup.select_one('[class^="styled__Wrapper-sc-"] img[class^="ListImage__StyledImg-sc-"]')['src'])
print(soup.select_one('[class^="styled__Wrapper-sc-"] h2[class^="TextSubHeading__TextSubHeadingWrapper"] >a').text)
print(soup.select_one('[class^="styled__Wrapper-sc-"] h2[class^="TextSubHeading__TextSubHeadingWrapper"] >a')['href'])
print(soup.select_one('[class^="styled__Wrapper-sc-"] div[class^="TextSubHeading__TextSubHeadingWrapper"] >span').text)
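
Since a real results page contains many listings, the same prefix selectors can be applied once per listing wrapper rather than to the whole document. The sketch below is only an illustration under assumptions: the two trimmed listings, the second item's name and URL, and the helper names `text_or_none` and `parse_listing` are hypothetical and not part of the original answer. It also strips the `?type=...` query from the image URL, since the question lists the bare `.jpg` address as the wanted value.

```python
from bs4 import BeautifulSoup

# Two hypothetical, heavily trimmed listings; the second one has no image,
# price, or location so the missing-element handling can be seen.
html = '''
<div aria-label="Adidas NMD x Bape" class="styled__Wrapper-sc-1kpvi4z-0 eDiSuB">
  <img alt="Adidas NMD x Bape" class="ListImage__StyledImg-sc-1rp77jc-1 iwClwW"
       src="https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big"/>
  <p class="styled__TopInfoWrapper-sc-1kpvi4z-22 kEcJNb"><a href="/a">Kläder &amp; skor</a> · <a href="/b">Skåne</a></p>
  <h2 class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq"><a href="/annons/skane/adidas_nmd_x_bape/87267675">Adidas NMD x Bape</a></h2>
  <div class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq Price__Wrapper-sc-1v2maoc-0 heunWX"><span>3 000 kr</span></div>
</div>
<div aria-label="Example item" class="styled__Wrapper-sc-1kpvi4z-0 aBcDeF">
  <h2 class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq"><a href="/annons/example/1">Example item</a></h2>
</div>
'''


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag else None


def parse_listing(card):
    """Extract the fields of one listing wrapper into a dict."""
    link = card.select_one('h2[class^="TextSubHeading__TextSubHeadingWrapper"] > a')
    img = card.select_one('img[class^="ListImage__StyledImg"]')
    price = card.select_one('div[class*="Price__Wrapper"] > span')
    info_links = card.select('p[class^="styled__TopInfoWrapper"] a')
    return {
        "title": text_or_none(link),
        "url": link["href"] if link else None,
        # Drop the "?type=..." query to get the bare image address.
        "image": img["src"].split("?")[0] if img else None,
        "price": text_or_none(price),
        # The last link in the top-info paragraph is the region.
        "location": text_or_none(info_links[-1]) if info_links else None,
    }


soup = BeautifulSoup(html, "html.parser")
listings = [parse_listing(card)
            for card in soup.select('div[class^="styled__Wrapper-sc-"]')]

for item in listings:
    print(item)
```

Checking each result for `None` before indexing avoids the `TypeError` that `select_one(...)['src']` raises when a listing lacks the element.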

Regarding "Python - How to use soup with random class characters", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58894968/
