
python - Retrieving dates with BeautifulSoup

Reposted. Author: 行者123. Updated: 2023-11-30 23:16:31

I am trying to retrieve dates from a product page: http://www.homedepot.com/p/Husky-41-in-16-Drawer-Tool-Chest-and-Cabinet-Set-HOTC4016B1QES/205080371

However, the dates are hidden in meta tags; see the first line below:

<meta itemprop="datePublished" content="2014-11-27" />
</div><div id='80886327' itemprop="review" itemscope itemtype="http://schema.org/Review"><meta itemprop="itemReviewed" content="HUSKY 41 in. 16-Drawer Tool Chest and Cabinet Set" /><span itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">Rated <span itemprop="ratingValue">5</span> out of <span itemprop="bestRating">5</span></span> by <span itemprop="author">Razor</span><span itemprop="name"> solid construction
</span><span itemprop="description"> I spent the last month checking and looking at all tool boxes that I could find. Online and at available stores. In comparison to all, this is by far the best deal for the money. Quality, workmanship and construction of this is by far the best for the money. Some I looked at are twice as much money for the same quality... I have had this approx. a month and filled with tools and shop stuff and with the ball bearing drawers loaded, does not make any difference on drawer operation. Granted we still need the test of time..

Does anyone know how to save these dates to a list?

Best answer

You can use find_all() to get every meta tag that has itemprop="datePublished":

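import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.homedepot.com/p/Husky-41-in-16-Drawer-Tool-Chest-and-Cabinet-Set-HOTC4016B1QES/205080371'
# Download the page and parse it with Python's built-in HTML parser
soup = BeautifulSoup(urllib.request.urlopen(url), 'html.parser')

# Collect the content attribute of every <meta itemprop="datePublished"> tag
print([meta.get('content') for meta in soup.find_all('meta', itemprop='datePublished')])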

This prints:

[
'2014-11-27',
'2014-11-20',
'2014-12-15',
'2014-10-28',
'2014-10-10'
]
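If real date objects are more useful than strings (for sorting or comparing, say), the values can be converted with the standard library. The following is only a minimal sketch: it assumes every content attribute uses the YYYY-MM-DD format seen in the output above, and the names raw_dates, parsed, and newest are illustrative, not part of the original answer.

```python
from datetime import date, datetime

# Strings copied from the output shown above.
raw_dates = [
    '2014-11-27',
    '2014-11-20',
    '2014-12-15',
    '2014-10-28',
    '2014-10-10',
]

# Assumption: every value follows the YYYY-MM-DD format.
parsed = [datetime.strptime(s, '%Y-%m-%d').date() for s in raw_dates]

# date objects compare chronologically, so max() and sorted() work directly.
newest = max(parsed)
print(sorted(parsed))
print(newest)
```

A value in any other format would raise ValueError in strptime, so a real scraper may want to catch that exception.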

Alternatively, use a CSS selector:

print([meta.get('content') for meta in soup.select('meta[itemprop="datePublished"]')])
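Since the live page may change or block automated requests, both approaches can be tried offline against an inline HTML string. This is a rough sketch, not the page's actual markup: SAMPLE_HTML is a made-up fragment modeled on the snippet in the question, and the author name ExampleUser is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modeled on the snippet in the question.
SAMPLE_HTML = """
<div itemprop="review" itemscope itemtype="http://schema.org/Review">
  <meta itemprop="datePublished" content="2014-11-27" />
  <span itemprop="author">Razor</span>
</div>
<div itemprop="review" itemscope itemtype="http://schema.org/Review">
  <meta itemprop="datePublished" content="2014-11-20" />
  <span itemprop="author">ExampleUser</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Approach 1: find_all() with an attribute filter passed as a keyword argument.
dates_find_all = [m.get('content')
                  for m in soup.find_all('meta', itemprop='datePublished')]

# Approach 2: an equivalent CSS attribute selector.
dates_select = [m.get('content')
                for m in soup.select('meta[itemprop="datePublished"]')]

print(dates_find_all)
print(dates_select)
```

Both lists should contain the same strings in document order, so the choice between the two is mostly a matter of taste.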

Regarding "python - Retrieving dates with BeautifulSoup", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27716466/
