
html - Web scraping: select fields from drop-downs, extract resulting data

Reposted. Author: 行者123. Updated: 2023-12-02 04:12:54

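I'm trying to do some web scraping in R and could use some help.

I want to extract the data in the table on this page: http://droughtmonitor.unl.edu/MapsAndData/DataTables.aspx

But first I want to select County from the leftmost drop-down, then select Alameda County (CA) from the next drop-down, and then scrape the table.

Here is what I have so far. I think I know why it doesn't work: rvest's form functions are meant for filling in basic forms, not for selecting from drop-down menus on an .aspx(?) page. I've looked around for examples of what I'm trying to do and found nothing.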

library(rvest)
url <-"http://droughtmonitor.unl.edu/MapsAndData/DataTables.aspx"
pgsession <-html_session(url)
pgform <-html_form(pgsession)[[1]]

filled_form <- set_values(pgform,
`#atype_chosen span` = "County",
`#asel_chosen span` = "Alameda Count (CA)")
submit_form(pgsession,filled_form)

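Anyway, this gives me the error "Error: Unknown field names: #atype_chosen span, #asel_chosen span". I sort of get it: I'm asking R to type County into the box without opening the drop-down, which doesn't work.

If anyone could point me in the right direction, I'd greatly appreciate it.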
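
The error arises because rvest matches form fields by their `name` attribute, not by CSS selector. The following is a minimal sketch on a made-up page: the HTML and the field name `atype` are invented for illustration and are not taken from the Drought Monitor page.

```r
library(rvest)

# Hypothetical page with one form containing a <select name="atype"> drop-down.
page <- read_html(paste0(
  "<html><body>",
  "<form action='/search' method='post'>",
  "<select name='atype'>",
  "<option value='state'>State</option>",
  "<option value='county'>County</option>",
  "</select>",
  "</form>",
  "</body></html>"
))

form <- html_form(page)[[1]]

# These are the only names rvest accepts when setting form values.
field_names <- vapply(form$fields, function(f) f$name, character(1))
print(unname(field_names))
```

Running the same inspection on the real form shows which names, if any, can be set. If the visible drop-downs are drawn by JavaScript, as the `_chosen` ids suggest, the form may not contain a matching field at all.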

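Best answer

I monitored the requests the browser makes when you select your county and used that information to build this request. It gets you the data, just in a different way than you were going about it... The area parameter in the payload is what changes between counties.

Update: I added code to get the list of counties and their codes, so you can pick whichever county you want data for...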

library("httr")

# Start by getting the counties and their codes.
url <- "http://droughtmonitor.unl.edu/Ajax.aspx/ReturnAOI"
# Headers copied from the browser request. Content-Length is left out
# because httr computes it from the body.
headers <- add_headers(
  "Accept" = "application/json, text/javascript, */*; q=0.01",
  "Accept-Encoding" = "gzip, deflate",
  "Accept-Language" = "en-US,en;q=0.8",
  "Content-Type" = "application/json; charset=UTF-8",
  "Host" = "droughtmonitor.unl.edu",
  "Origin" = "http://droughtmonitor.unl.edu",
  "Proxy-Connection" = "keep-alive",
  "Referer" = "http://droughtmonitor.unl.edu/MapsAndData/DataTables.aspx",
  "User-Agent" = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36",
  "X-Requested-With" = "XMLHttpRequest"
)
a <- POST(url, body = "{'aoi':'county'}", headers, encode = "json")
# The parsed response wraps the list of counties in a single top-level element.
tmp <- content(a)[[1]]
# One row per county: display text and the code used in later requests.
county_df <- data.frame(text = unname(unlist(sapply(tmp, "[", "Text"))),
                        value = unname(unlist(sapply(tmp, "[", "Value"))),
                        stringsAsFactors = FALSE)
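
Once `county_df` exists, finding the code for a given county is a simple lookup. This is a sketch using a stand-in data frame so it runs without a network call. Only the code `06001` comes from the answer; the display texts, the second row, and the helper name `find_county_code` are assumptions made for illustration.

```r
# Stand-in for the county_df built above (illustrative rows only).
county_df <- data.frame(
  text  = c("Alameda County (CA)", "Alpine County (CA)"),
  value = c("06001", "06003"),
  stringsAsFactors = FALSE
)

# Return the code(s) whose display text matches `pattern`.
# `pattern` is treated as a case-insensitive regular expression.
find_county_code <- function(df, pattern) {
  df$value[grepl(pattern, df$text, ignore.case = TRUE)]
}

code <- find_county_code(county_df, "alameda")
print(code)
```

Because the pattern is a regular expression, a search string containing parentheses would need escaping, or `fixed = TRUE` passed to `grepl` (which cannot be combined with `ignore.case = TRUE`).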

# Use the code for whatever county you want in the payload below.

url <- "http://droughtmonitor.unl.edu/Ajax.aspx/ReturnTabularDM"
payload <- "{'area':'06001', 'type':'county', 'statstype':'1'}"
# Same browser headers as before, without the duplicated X-Requested-With
# entry. Content-Length is left out because httr computes it from the body.
headers <- add_headers(
  "Host" = "droughtmonitor.unl.edu",
  "Proxy-Connection" = "keep-alive",
  "Accept" = "application/json, text/javascript, */*; q=0.01",
  "Origin" = "http://droughtmonitor.unl.edu",
  "X-Requested-With" = "XMLHttpRequest",
  "User-Agent" = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36",
  "Content-Type" = "application/json; charset=UTF-8",
  "Referer" = "http://droughtmonitor.unl.edu/MapsAndData/DataTables.aspx",
  "Accept-Encoding" = "gzip, deflate",
  "Accept-Language" = "en-US,en;q=0.8"
)
a <- POST(url, body = payload, headers, encode = "json")
tmp <- content(a)[[1]]
# One row per record, with the date and the D0 to D4 values.
df <- data.frame(date = unname(unlist(sapply(tmp, "[", "Date"))),
                 d0 = unname(unlist(sapply(tmp, "[", "D0"))),
                 d1 = unname(unlist(sapply(tmp, "[", "D1"))),
                 d2 = unname(unlist(sapply(tmp, "[", "D2"))),
                 d3 = unname(unlist(sapply(tmp, "[", "D3"))),
                 d4 = unname(unlist(sapply(tmp, "[", "D4"))),
                 stringsAsFactors = FALSE)
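
To fetch several counties, the payload string and the list-to-data-frame step can be wrapped in small helpers. The sketch below uses base R only and runs offline. The helper names `build_payload` and `records_to_df` are hypothetical, and the sample records are made up, shaped the way the parsed response is assumed to look.

```r
# Build the request body in the same single-quoted style the answer uses.
# `area` is assumed to be a five-digit county code such as "06001".
build_payload <- function(area, type = "county", statstype = "1") {
  stopifnot(is.character(area), grepl("^[0-9]{5}$", area))
  sprintf("{'area':'%s', 'type':'%s', 'statstype':'%s'}", area, type, statstype)
}

# Turn a list of records (such as content(a)[[1]]) into a data frame,
# one row per record, keeping only the requested fields.
records_to_df <- function(records,
                          fields = c("Date", "D0", "D1", "D2", "D3", "D4")) {
  cols <- lapply(fields, function(f) {
    # Missing fields become NA so every column has the same length.
    unlist(lapply(records, function(r) if (is.null(r[[f]])) NA else r[[f]]))
  })
  names(cols) <- tolower(fields)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Made-up records for illustration; real values come from the server.
fake <- list(
  list(Date = "2016-02-23", D0 = 100, D1 = 100, D2 = 50, D3 = 10, D4 = 0),
  list(Date = "2016-02-16", D0 = 100, D1 = 90, D2 = 50, D3 = 10, D4 = 0)
)

payload <- build_payload("06001")
df <- records_to_df(fake)
print(payload)
print(df)
```

In real use, the string from `build_payload()` would be passed as `body` to the `POST()` call shown in the answer, and `records_to_df()` would be applied to `content(a)[[1]]`.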

This post, html - Web scraping: select fields from drop-downs, extract resulting data, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/35633533/
