
Python web scraping; Beautiful Soup

Reposted. Author: 行者123. Last updated: 2023-11-28 23:04:12

A related case is covered in this post: Python web scraping involving HTML tags with attributes

However, I can't get the same approach to work for this page: http://www.expatistan.com/cost-of-living/comparison/melbourne/auckland

I'm trying to scrape the following values:

  <td class="price city-2">
NZ$15.62
<span style="white-space:nowrap;">(AU$12.10)</span>
</td>
<td class="price city-1">
AU$15.82
</td>

Basically, the prices for city-2 and city-1 (NZ$15.62 and AU$15.82).

This is what I have so far:

import urllib2

from BeautifulSoup import BeautifulSoup

url = "http://www.expatistan.com/cost-of-living/comparison/melbourne/auckland?"
page = urllib2.urlopen(url)

soup = BeautifulSoup(page)

price2 = soup.findAll('td', attrs = {'class':'price city-2'})
price1 = soup.findAll('td', attrs = {'class':'price city-1'})

for price in price2:
    print price

for price in price1:
    print price
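The two loops above print each whole td element, markup included. As a minimal sketch of one way to reduce a cell to the bare price, assuming Python 3 and that each cell's text has already been extracted as a string, a regular expression can pick out the first price and skip the converted amount in parentheses. The pattern and the name `first_price` are illustrative assumptions based only on the sample cells shown here; they are not part of BeautifulSoup.

```python
import re

# Assumed price shape, inferred from the sample cells only: a two-letter
# currency prefix, a dollar sign, digits with optional thousands groups,
# and an optional decimal part (for example "NZ$15.62").
PRICE_RE = re.compile(r"[A-Z]{2}\$\d+(?:,\d{3})*(?:\.\d+)?")


def first_price(cell_text):
    """Return the first price found in a cell's text, or None if there is none."""
    match = PRICE_RE.search(cell_text)
    return match.group(0) if match else None


# Text content of the two sample cells, including the nested span's text.
city2_cell = "\nNZ$15.62\n(AU$12.10)\n"
city1_cell = "\nAU$15.82\n"

print(first_price(city2_cell))
print(first_price(city1_cell))
```

Because the search stops at the first match, the parenthesised AU$12.10 in the city-2 cell is ignored.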

Ideally, I would also like the output as comma-separated values. From:

<th colspan="3" class="clickable">Food</th>, 

to extract "Food", and from:

<td class="item-name">Daily menu in the business district</td>

to extract 'Daily menu in the business district',

followed by the values of price city-2 and price city-1.

So the printout would be:

Food, Daily menu in the business district, NZ$15.62, AU$15.82
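The desired row can be sketched with nothing but the Python 3 standard library. The example below is an illustration under stated assumptions, not a BeautifulSoup solution: it swaps in `html.parser` so that it runs without third-party packages. The cells in `SAMPLE_HTML` are copied from the snippets above, but the table and tr layout around them is assumed, and `PriceTableParser` and `to_csv` are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical fragment: the cells come from the question, while the
# surrounding <table>/<tr> structure is assumed for illustration.
SAMPLE_HTML = """
<table>
<tr><th colspan="3" class="clickable">Food</th></tr>
<tr>
<td class="item-name">Daily menu in the business district</td>
<td class="price city-2">
NZ$15.62
<span style="white-space:nowrap;">(AU$12.10)</span>
</td>
<td class="price city-1">
AU$15.82
</td>
</tr>
</table>
"""

CELL_CLASSES = ("item-name", "price city-2", "price city-1")


class PriceTableParser(HTMLParser):
    """Collect [category, item, city-2 price, city-1 price] rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.category = ""
        self.current = {}    # cell texts for the row in progress
        self.field = None    # field that incoming text belongs to
        self.span_depth = 0  # greater than 0 while inside a nested <span>

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        if tag == "th" and "clickable" in classes.split():
            self.field = "category"
            self.category = ""
        elif tag == "td" and classes in CELL_CLASSES:
            self.field = classes
            self.current[classes] = ""
        elif tag == "span":
            self.span_depth += 1

    def handle_endtag(self, tag):
        if tag == "span":
            self.span_depth = max(0, self.span_depth - 1)
        elif tag in ("th", "td"):
            self.field = None
        elif tag == "tr" and "item-name" in self.current:
            # A data row is complete: emit it with the latest category.
            self.rows.append(
                [self.category.strip()]
                + [self.current.get(name, "").strip() for name in CELL_CLASSES]
            )
            self.current = {}

    def handle_data(self, data):
        # Skip text outside the cells of interest and inside nested spans,
        # which hold the converted price in parentheses.
        if self.field is None or self.span_depth:
            return
        if self.field == "category":
            self.category += data
        else:
            self.current[self.field] += data


def to_csv(html):
    """Parse the HTML and return the collected rows as CSV text."""
    parser = PriceTableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


print(to_csv(SAMPLE_HTML), end="")
```

The category is remembered from the most recent th cell and repeated on every item row that follows it, which is what a flat CSV layout needs. Using `csv.writer` rather than joining with commas means an item name that itself contains a comma would be quoted correctly.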

Thanks!

Best answer

I find BeautifulSoup awkward to use. Here is a version based on the webscraping module:

from webscraping import common, download, xpath

# download html
D = download.Download()
html = D.get('http://www.expatistan.com/cost-of-living/comparison/melbourne/auckland')

# extract data
items = xpath.search(html, '//td[@class="item-name"]')
city1_prices = xpath.search(html, '//td[@class="price city-1"]')
city2_prices = xpath.search(html, '//td[@class="price city-2"]')

# display and format
for item, city1_price, city2_price in zip(items, city1_prices, city2_prices):
    print item.strip(), city1_price.strip(), common.remove_tags(city2_price, False).strip()

Output:

Daily menu in the business district AU$15.82 NZ$15.62

Combo meal in fast food restaurant (Big Mac Meal or similar) AU$7.40 NZ$8.16

1/2 Kg (1 lb.) of chicken breast AU$6.07 NZ$10.25

...
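The output above is space-separated, whereas the question asked for comma-separated values with the city-2 price first. A minimal sketch of that last step, assuming Python 3 and three parallel lists of already cleaned strings such as the results of the three xpath.search calls, might look like this. The sample values are copied from the output above, and `rows_to_csv` is a hypothetical helper name.

```python
import csv
import io

# Sample values copied from the output above; on the real page these lists
# would come from the three xpath.search calls, after tag removal.
items = [
    "Daily menu in the business district",
    "Combo meal in fast food restaurant (Big Mac Meal or similar)",
    "1/2 Kg (1 lb.) of chicken breast",
]
city1_prices = ["AU$15.82", "AU$7.40", "AU$6.07"]
city2_prices = ["NZ$15.62", "NZ$8.16", "NZ$10.25"]


def rows_to_csv(items, city1_prices, city2_prices):
    """Return CSV text with one line per item: item, city-2 price, city-1 price."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for item, price1, price2 in zip(items, city1_prices, city2_prices):
        writer.writerow([item.strip(), price2.strip(), price1.strip()])
    return buffer.getvalue()


print(rows_to_csv(items, city1_prices, city2_prices), end="")
```

Note that `zip` stops at the shortest list, so if any row on the page lacks one of the price cells, the three lists fall out of step without raising an error. Comparing their lengths before writing is a cheap safeguard.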

This post, Python web scraping; Beautiful Soup, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/7884567/
