
python - Web scraping Craigslist apartment prices in Python does not show the highest-priced apartment

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:42:26

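It reports the highest apartment price as $4700, while the highest price I can see on the site is over $1 million. Why doesn't it show that? What am I doing wrong?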

import requests
import re


r = requests.get("http://orlando.craigslist.org/search/apa")
r.raise_for_status()

html = r.text


matches = re.findall(r'<span class="price">\$(\d+)</span>', html)
prices = map(int, matches)


print "Highest price: ${}".format(max(prices))
print "Lowest price: ${}".format(min(prices))
print "Average price: ${}".format(sum(prices)/len(prices))

Best answer

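An HTML parser such as bs4 is very easy to use. You can also sort by price by adding ?sort=pricedsc to the URL, so that the first match is the maximum and the last match is the minimum (for that page):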

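import requests
from bs4 import BeautifulSoup

# pricedsc sorts the listings from most to least expensive
r = requests.get("http://orlando.craigslist.org/search/apa?sort=pricedsc")

html = r.content

soup = BeautifulSoup(html, "html.parser")
# collect every price shown on this page as an int
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")]
print("Highest price: ${}".format(prices[0]))
print("Lowest price: ${}".format(prices[-1]))
print("Average price: ${}".format(sum(prices, 0.0)/len(prices)))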
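Converting price text with int(text.strip("$")) works as long as every price looks like "$4700". As a minimal sketch, assuming some listings might render prices with thousands separators (for example "$1,200") or show no number at all, a small helper can make the conversion more tolerant. The name parse_price is hypothetical and not part of the original answer.

```python
import re


def parse_price(text):
    """Return the price in whole dollars, or None if the text has no number.

    Hypothetical helper: accepts strings such as "$4700" or "$1,200".
    """
    match = re.search(r"\$?\s*(\d[\d,]*)", text)
    if match is None:
        return None
    # Drop thousands separators before converting to int.
    return int(match.group(1).replace(",", ""))
```

With such a helper, entries for which parse_price returns None could be filtered out before computing the highest, lowest, and average price.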

If you want the lowest price, you need to sort in ascending order:

import requests
from bs4 import BeautifulSoup

# priceasc sorts the listings from least to most expensive
r = requests.get("http://orlando.craigslist.org/search/apa?sort=priceasc")

html = r.content

soup = BeautifulSoup(html, "html.parser")
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")]
# ascending order: the last price is the highest on this page
print("Highest price: ${}".format(prices[-1]))
print("Lowest price: ${}".format(prices[0]))
print("Average price: ${}".format(sum(prices, 0.0)/len(prices)))

Now the output is very different:

Highest price: $70
Lowest price: $1
Average price: $34.89
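Reading the first and last elements only works when you know which sort order the page was requested with. A minimal sketch of an order-independent summary follows; the function name summarize and the sample numbers are made up for illustration. It also copies its input into a list, because in Python 3 map returns a one-shot iterator, so calling max and then min directly on it (as the question's code does) would find it already exhausted.

```python
def summarize(prices):
    """Return (highest, lowest, average) for a non-empty iterable of prices."""
    prices = list(prices)  # materialize iterators such as map(int, matches)
    if not prices:
        raise ValueError("no prices to summarize")
    return max(prices), min(prices), sum(prices) / float(len(prices))


# Illustrative values only, not real listing data.
highest, lowest, average = summarize(map(int, ["70", "1", "35", "34"]))
print("Highest price: ${}".format(highest))
print("Lowest price: ${}".format(lowest))
print("Average price: ${:.2f}".format(average))
```

The result still describes only the prices that were collected, so for a single page it is a per-page summary.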

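If you want the average over all listings, you need to add more logic. By default you only see 100 of the 2500 results, which is presumably why the original code never reached the most expensive listings, but we can change that by following the next-page links.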

import requests
from bs4 import BeautifulSoup

r = requests.get("http://orlando.craigslist.org/search/apa")

html = r.content

soup = BeautifulSoup(html, "html.parser")
prices = [int(pr.text.strip("$")) for pr in soup.select("span.price")]

# link to next 100 results (None if the page has no next button)
nxt = soup.select_one("a.button.next")
nxt = nxt["href"] if nxt else None

# keep looping until we find a page with no next button
while nxt:
    url = "http://orlando.craigslist.org{}".format(nxt)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # extend prices to our list
    prices.extend([int(pr.text.strip("$")) for pr in soup.select("span.price")])
    nxt = soup.select_one("a.button.next")
    if nxt:
        nxt = nxt["href"]

This gives you every listing, from 1 through 2500.
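The follow-the-next-link idea can be separated from the network and parsing code, which makes it easy to try offline. The sketch below assumes Python 3 (for urllib.parse). The names collect_prices and fetch_page, the max_pages safety limit, the example.test URLs, and the "?s=100" href format are all hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin


def collect_prices(start_url, fetch_page, max_pages=50):
    """Follow "next" links from start_url and gather every price.

    fetch_page is a caller-supplied function that takes a URL and returns
    (prices, next_href); next_href is None or empty on the last page.
    """
    prices = []
    seen = set()
    url = start_url
    # Stop on the last page, on a repeated URL, or at the page limit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page_prices, next_href = fetch_page(url)
        prices.extend(page_prices)
        # Resolve a relative href against the URL of the current page.
        url = urljoin(url, next_href) if next_href else None
    return prices


# Fake two-page "site" so the loop can be exercised without a network.
fake_site = {
    "http://example.test/search/apa": ([4700, 1200], "/search/apa?s=100"),
    "http://example.test/search/apa?s=100": ([1000000, 650], None),
}
all_prices = collect_prices("http://example.test/search/apa", fake_site.__getitem__)
print(max(all_prices))  # 1000000, found only on the second page
```

In real use, fetch_page would wrap the requests and BeautifulSoup calls from the answer and return the page's prices together with the href of its next button.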

Regarding "python - Web scraping Craigslist apartment prices in Python does not show the highest-priced apartment", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36679285/
