gpt4 book ai didi

r - 使用R自动下载数据

转载 作者:行者123 更新时间:2023-12-04 09:32:51 24 4
gpt4 key购买 nike

我想以 pdf 或 excel 格式下载所有数据
每个州 X 裁剪年度 X 标准报告组合来自 this website .

我按照本教程做我想做的事。
Download data from URL

但是,我在第二行遇到了错误。

driver <- rsDriver()

Error in subprocess::spawn_process(tfile, ...) :
group termination: could not assign process to a job: Access is denied

有没有其他方法可以用来下载这些数据?

最佳答案

首先,检查网站上的robots.txt是否有。然后阅读条款和条件(如果有)。限制下面的请求总是很重要的。

检查所有条款和条件后,下面的代码应该可以帮助您入门:

library(httr)
library(xml2)

link <- "https://aps.dac.gov.in/LUS/Public/Reports.aspx"
r <- GET(link)
doc <- read_html(content(r, "text"))
#write_html(doc, "temp.html")

states <- sapply(xml_find_all(doc, ".//select[@name='DdlState']/option"), function(x)
setNames(xml_attr(x, "value"), xml_text(x)))
states <- states[!grepl("^Select", names(states))]

years <- sapply(xml_find_all(doc, ".//select[@name='DdlYear']/option"), function(x)
setNames(xml_attr(x, "value"), xml_text(x)))
years <- years[!grepl("^Select", names(years))]

rptfmt <- sapply(xml_find_all(doc, ".//select[@name='DdlFormat']/option"), function(x)
setNames(xml_attr(x, "value"), xml_text(x)))

stdrpts <- unlist(lapply(xml_find_all(doc, ".//td/a"), function(x) {
id <- xml_attr(x, "id")
if (grepl("^TreeView1t", id)) return(setNames(id, xml_text(x)))
}))

get_vs <- function(doc) sapply(xml_find_all(doc, ".//input[@type='hidden']"), function(x)
setNames(xml_attr(x, "value"), xml_attr(x, "name")))

fmt <- rptfmt[2] #Excel format
for (sn in names(states)) {
for (yn in names(years)) {
for (srn in seq_along(stdrpts)) {
s <- states[sn]
y <- years[yn]
sr <- stdrpts[srn]

r <- POST(link,
body=as.list(c("__EVENTTARGET"="DdlState",
"__EVENTARGUMENT"="",
"__LASTFOCUS"="",
"TreeView1_ExpandState"="ennnn",
"TreeView1_SelectedNode"="",
"TreeView1_PopulateLog"="",
get_vs(doc),
DdlState=unname(s),
DdlYear=0,
DdlFormat=1)),
encode="form")
doc <- read_html(content(r, "text"))

treeview <- c("__EVENTTARGET"="TreeView1",
"__EVENTARGUMENT"=paste0("sStandard Reports\\", srn),
"__LASTFOCUS"="",
"TreeView1_ExpandState"="ennnn",
"TreeView1_SelectedNode"=unname(stdrpts[srn]),
"TreeView1_PopulateLog"="")
vs <- get_vs(doc)
ddl <- c(DdlState=unname(s), DdlYear=unname(y), DdlFormat=unname(fmt))
r <- POST(link, body=as.list(c(treeview, vs, ddl)), encode="form")
if (r$headers$`content-type`=="application/vnd.ms-excel")
writeBin(content(r, "raw"), paste0(sn, "_", yn, "_", names(stdrpts)[srn], ".xls"))

Sys.sleep(5)
}
}
}

关于r - 使用R自动下载数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58805740/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com