
python - Detect whether a web page has changed

Reposted. Author: 太空狗. Updated: 2023-10-29 21:05:56

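In my Python application I have to read many web pages to collect data. To reduce the number of HTTP calls, I want to fetch only the pages that have changed. My problem is that my code always tells me the page has changed (status code 200), even when it has not.

Here is my code: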

from models import mytab
import re
import urllib2
from wsgiref.handlers import format_date_time
from datetime import datetime
from time import mktime

def url_change():
    urls = mytab.objects.all()
    # these are some of the urls:
    # http://www.venere.com/it/pensioni/venezia/pensione-palazzo-guardi/#reviews
    # http://www.zoover.it/italia/sardegna/cala-gonone/san-francisco/hotel
    # http://www.orbitz.com/hotel/Italy/Venice/Palazzo_Guardi.h161844/#reviews
    # http://it.hotels.com/ho292636/casa-del-miele-susegana-italia/
    # http://www.expedia.it/Venezia-Hotel-Palazzo-Guardi.h1040663.Hotel-Information#reviews
    # ...

    for url in urls:
        request = urllib2.Request(url.url)
        if url.last_date == None:
            now = datetime.now()
            stamp = mktime(now.timetuple())
            url.last_date = format_date_time(stamp)
            url.save()

        request.add_header("If-Modified-Since", url.last_date)

        try:
            response = urllib2.urlopen(request) # Make the request
            # some actions
            now = datetime.now()
            stamp = mktime(now.timetuple())
            url.last_date = format_date_time(stamp)
            url.save()
        except urllib2.HTTPError, err:
            if err.code == 304:
                print "nothing...."
            else:
                print "Error code:", err.code
            pass

I don't understand what is going wrong. Can anyone help me?

Best answer

When you send an "If-Modified-Since" header, the web server is not required to respond with a 304. It is free to send HTTP 200 and the entire page again.
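Because a 200 response does not prove that the page changed, one client-side fallback is to compare the body against a digest saved from the previous fetch. The following is only a sketch of that idea, not something taken from the question: the helper names `body_digest` and `page_changed` are hypothetical, and it assumes the body is available as bytes and that the old digest is stored somewhere (for example next to `last_date`).

```python
import hashlib


def body_digest(body):
    """Return the hex SHA-256 digest of a raw response body (bytes)."""
    return hashlib.sha256(body).hexdigest()


def page_changed(body, previous_digest):
    """Compare a freshly fetched body with the digest of the last fetch.

    Returns a (changed, new_digest) tuple. A previous_digest of None means
    the page was never fetched before, which is reported as changed.
    """
    new_digest = body_digest(body)
    return new_digest != previous_digest, new_digest
```

This does not save the HTTP call itself, only the processing afterwards. Pages that embed timestamps, ads, or session tokens will hash differently on every request, so in practice the relevant part of the page may need to be extracted before hashing.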

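Sending "If-Modified-Since" or "If-None-Match" tells the server that you would like a cached response if one is available. It is like sending an "Accept-Encoding: gzip, deflate" header: you are only telling the server what you are willing to accept, not requiring it.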
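For servers that do honor conditional requests, it usually works better to echo back the validators the server itself sent (its `ETag` and `Last-Modified` response headers) than a timestamp taken from the local clock. Below is a minimal sketch under some assumptions: it uses Python 3's `urllib.request` instead of the `urllib2` module from the question, and the function names `build_conditional_headers` and `fetch_if_changed` are made up for illustration.

```python
import urllib.error
import urllib.request


def build_conditional_headers(etag=None, last_modified=None):
    """Build conditional request headers from validators saved earlier."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Fetch url, sending the saved validators as hints to the server.

    Returns (status, body, etag, last_modified). On a 304 the body is None
    and the saved validators are returned unchanged. Any other HTTP error
    is re-raised to the caller.
    """
    request = urllib.request.Request(
        url, headers=build_conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.status,
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return 304, None, etag, last_modified
        raise
```

The caller would store the returned `etag` and `last_modified` values with each URL and pass them back on the next run. A server that ignores both headers will still answer 200 every time, so this only reduces traffic for sites that implement conditional requests.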

Regarding "python - Detect whether a web page has changed", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15207145/
