
python - Unable to scrape customized property links from zillow using requests

Reposted. Author: 行者123. Updated: 2023-12-02 18:48:19

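I'm trying to parse the different property links that get populated when I select options in two dropdowns on zillow. Once the options are selected, I can see the results in JSON format in dev tools. However, when I try to do the same with the script below, I get some weird text.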

Manual steps:

  1. Navigate to that site
  2. Select an option in the first dropdown
  3. Select an option in the second dropdown

This is how I tried to automate it:

import json
import requests
from pprint import pprint

link = 'https://www.zillow.com/search/GetSearchPageState.htm?'

params = {
    'searchQueryState': {"pagination":{},"usersSearchTerm":"Vista, CA","mapBounds":{"west":-117.44051346728516,"east":-116.99488053271484,"south":33.126944633035116,"north":33.27919773006566},"regionSelection":[{"regionId":41517,"regionType":6}],"isMapVisible":True,"filterState":{"doz":{"value":"6m"},"isForSaleByAgent":{"value":False},"isForSaleByOwner":{"value":False},"isNewConstruction":{"value":False},"isForSaleForeclosure":{"value":False},"isComingSoon":{"value":False},"isAuction":{"value":False},"isPreMarketForeclosure":{"value":False},"isPreMarketPreForeclosure":{"value":False},"isRecentlySold":{"value":True},"isAllHomes":{"value":True},"hasPool":{"value":True},"hasAirConditioning":{"value":True},"isApartmentOrCondo":{"value":False}},"isListVisible":True,"mapZoom":11},
    'wants': {"cat1":["listResults","mapResults"]},
    'requestId': 2
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    res = s.get(link,params=json.dumps(params))
    pprint(res.content)

This is the output it produces:

b'<!-- This page outputs JSON instead of anything written here. -->'

How can I parse customized property links from zillow using requests?

Best answer

You have to encode the query string that appears in the request URL.

For that, you need:

urllib.parse.urlencode()
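
To make the encoding step concrete, here is a minimal sketch of what `urllib.parse.urlencode()` does with nested values. The short `params` dict is a stand-in for the real search parameters. The `as_json` variant is an assumption added for comparison and is not part of the answer; whether the endpoint prefers one form over the other is not something the answer states.

```python
import json
import urllib.parse

# A small nested dict standing in for the real search parameters.
params = {
    "searchQueryState": {"usersSearchTerm": "Vista, CA", "isMapVisible": True},
    "requestId": 2,
}

# urlencode() calls str() on each value and then percent-encodes it,
# so a nested dict is sent as its Python repr (True, single quotes).
as_repr = urllib.parse.urlencode(params)

# Variation (assumption, not from the answer): serialize nested values
# with json.dumps first, so the query carries JSON (true, double quotes).
as_json = urllib.parse.urlencode(
    {k: json.dumps(v) if isinstance(v, (dict, list)) else v
     for k, v in params.items()}
)

print(as_repr)
print(as_json)
```

Either string can be appended to a URL that ends in `?`, which is how the working example builds its request.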

Here is a working example:

import json
import urllib.parse

import requests

link = 'https://www.zillow.com/search/GetSearchPageState.htm?'

params = {
    'searchQueryState': {
        "pagination": {},
        "usersSearchTerm": "Vista, CA",
        "mapBounds": {
            "west": -117.44051346728516,
            "east": -116.99488053271484,
            "south": 33.126944633035116,
            "north": 33.27919773006566
        },
        "regionSelection": [{"regionId": 41517, "regionType": 6}],
        "isMapVisible": True,
        "filterState": {
            "doz": {"value": "6m"}, "isForSaleByAgent": {"value": False},
            "isForSaleByOwner": {"value": False}, "isNewConstruction": {"value": False},
            "isForSaleForeclosure": {"value": False}, "isComingSoon": {"value": False},
            "isAuction": {"value": False}, "isPreMarketForeclosure": {"value": False},
            "isPreMarketPreForeclosure": {"value": False},
            "isRecentlySold": {"value": True}, "isAllHomes": {"value": True},
            "hasPool": {"value": True}, "hasAirConditioning": {"value": True},
            "isApartmentOrCondo": {"value": False}
        },
        "isListVisible": True,
        "mapZoom": 11
    },
    'wants': {"cat1": ["listResults"]},
    'requestId': 2
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    # Send the same session id both as a header and as the JSESSIONID cookie.
    s.headers["x-requested-session"] = "BE6D8DA620E60010D84B55EB18DC9DC8"
    s.headers["cookie"] = f"JSESSIONID={s.headers['x-requested-session']}"
    # URL-encode the parameters, append them to the endpoint,
    # then re-serialize the JSON reply with indentation for printing.
    data = json.dumps(
        json.loads(s.get(f"{link}{urllib.parse.urlencode(params)}").content),
        indent=2
    )
    print(data)

Output:

{
  "user": {
    "isLoggedIn": false,
    "hasHousingConnectorPermission": false,
    "savedSearchCount": 0,
    "savedHomesCount": 0,
    "personalizedSearchGaDataTag": null,
    "personalizedSearchTraceID": "607a9ecb5aabe489c361c1d91f368b37",
    "searchPageRenderedCount": 0,
    "guid": "33b7add3-bfd3-4d85-a88a-d9d99256d2a2",
    "zuid": "",
    "isBot": false,
    "userSpecializedSEORegion": false
  },
  "mapState": {
    "customRegionPolygonWkt": null,
    "schoolPolygonWkt": null,
    "isCurrentLocationSearch": false,
    "userPosition": {
      "lat": null,
      "lon": null
    },
    "regionBounds": {
      "north": 33.275284,
      "east": -117.145153,
      "south": 33.130865,
      "west": -117.290241
    }
  },

and much much more ...
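
The response is deeply nested and only its first part is shown above. One way to pull a particular field out of it without knowing the full layout is a recursive key search. The helper `find_key` below is hypothetical. The key that holds the property links does not appear in the excerpt, so this sketch looks up `regionBounds`, which does.

```python
import json


def find_key(node, key):
    """Yield every value stored under `key` anywhere in nested JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the same key may appear deeper down.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Excerpt of the output shown above, trimmed to a few fields.
sample = json.loads("""
{
  "user": {"isLoggedIn": false, "isBot": false},
  "mapState": {
    "regionBounds": {
      "north": 33.275284, "east": -117.145153,
      "south": 33.130865, "west": -117.290241
    }
  }
}
""")

bounds = next(find_key(sample, "regionBounds"))
print(bounds["north"], bounds["south"])
```

With the real response, the same call would take `json.loads(...)` of the reply body in place of `sample`, and whichever key name the full JSON uses for listing URLs.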

Note: be careful with that site. They have very sensitive anti-bot measures and will throw captchas at you if you keep requesting data too fast.
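
One common way to avoid requesting too fast is to pause for a randomized interval between calls. The sketch below is only an illustration: `polite_delays` and `fetch_all` are hypothetical helpers, and the default `base` and `jitter` values are arbitrary assumptions, not thresholds known to satisfy the site.

```python
import random
import time


def polite_delays(n_requests, base=5.0, jitter=3.0, seed=None):
    """Return the pause (in seconds) to take before each request after the first."""
    rng = random.Random(seed)
    return [base + rng.uniform(0, jitter) for _ in range(max(n_requests - 1, 0))]


def fetch_all(fetch, items, sleep=time.sleep, **delay_kwargs):
    """Call fetch(item) for each item, sleeping between consecutive calls."""
    delays = polite_delays(len(items), **delay_kwargs)
    results = []
    for i, item in enumerate(items):
        if i > 0:
            # No pause before the first request, one before every later one.
            sleep(delays[i - 1])
        results.append(fetch(item))
    return results
```

In the working example this could wrap the session call, for instance `fetch_all(s.get, urls)` with `urls` being a list of encoded request URLs. The `sleep` parameter exists so the pause can be replaced in tests.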

Regarding "python - Unable to scrape customized property links from zillow using requests", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67130173/
