gpt4 book ai didi

r - 如何使用循环为 R 中的多个网页抓取网站数据?

转载 作者:行者123 更新时间:2023-12-04 09:48:20 24 4
gpt4 key购买 nike

我想应用一个循环来从 R 中的多个网页抓取数据。我能够抓取一个网页的数据,但是当我尝试对多个页面使用循环时,我得到了一个令人沮丧的错误。我花了几个小时修修补补,但无济于事。任何帮助将不胜感激!!!

这有效:

###########################
# GET COUNTRY DATA
###########################

library("rvest")

site <- paste("http://www.countryreports.org/country/","Norway",".htm", sep="")
site <- html(site)

stats<-
data.frame(names =site %>% html_nodes(xpath="//*/td[1]") %>% html_text() ,
facts =site %>% html_nodes(xpath="//*/td[2]") %>% html_text() ,
stringsAsFactors=FALSE)

stats$country <- "Norway"
stats$names <- gsub('[\r\n\t]', '', stats$names)
stats$facts <- gsub('[\r\n\t]', '', stats$facts)
View(stats)

但是,当我尝试在循环中编写它时,我收到一个错误
###########################
# ATTEMPT IN A LOOP
###########################

country<-c("Norway","Sweden","Finland","France","Greece","Italy","Spain")

for(i in country){

site <- paste("http://www.countryreports.org/country/",country,".htm", sep="")
site <- html(site)

stats<-
data.frame(names =site %>% html_nodes(xpath="//*/td[1]") %>% html_text() ,
facts =site %>% html_nodes(xpath="//*/td[2]") %>% html_text() ,
stringsAsFactors=FALSE)

stats$country <- country
stats$names <- gsub('[\r\n\t]', '', stats$names)
stats$facts <- gsub('[\r\n\t]', '', stats$facts)

stats<-rbind(stats,stats)
stats<-stats[!duplicated(stats),]
}

错误:
Error: length(url) == 1 is not TRUE
In addition: Warning message:
In if (grepl("^http", x)) { :
the condition has length > 1 and only the first element will be used

最佳答案

最终工作代码:

###########################
# THIS WORKS!!!!
###########################

country<-c("Norway","Sweden","Finland","France","Greece","Italy","Spain")

for(i in country){

site <- paste("http://www.countryreports.org/country/",i,".htm", sep="")
site <- html(site)

stats<-
data.frame(names =site %>% html_nodes(xpath="//*/td[1]") %>% html_text() ,
facts =site %>% html_nodes(xpath="//*/td[2]") %>% html_text() ,
stringsAsFactors=FALSE)

stats$nm <- i
stats$names <- gsub('[\r\n\t]', '', stats$names)
stats$facts <- gsub('[\r\n\t]', '', stats$facts)
#stats<-stats[!duplicated(stats),]
all<-rbind(all,stats)

}
View(all)

关于r - 如何使用循环为 R 中的多个网页抓取网站数据?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/27832757/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com