
javascript - Scraping a website with a form and JS using R

Repost. Author: 行者123. Updated: 2023-11-30 20:33:40

I am trying to scrape a website that has a form and that (I guess) generates the information I want with JS.

This is the website: https://www.distancecalculator.net/ (it calculates the distance between cities).

For example, I want to calculate the distance between these two cities:

  • Craíbas - AL, Brasil
  • Maceió - AL, Brasil

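It seems that even though I use POST to fill in the form, my scraper still collects the data that is available before the "Calculate" button is clicked. What am I doing wrong?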

Here is my code:

library(httr)
library(rvest)

url <- "https://www.distancecalculator.net/"

fd <- list(
  submit = "Calculate Distance",
  originCity = "Craíbas - AL, Brasil",
  destinationCity = "Maceió - AL, Brasil"
)

resp <- POST(url, body = fd, encode = "form")
conte <- content(resp)
conte

tex <- conte %>% html_nodes(xpath = '//span[@id="driving-distance-km"]/text()') %>% html_text()
tex

Best answer

I agree with the comments that RSelenium is the best fit for this. Here is how to get your desired result with RSelenium.

library(RSelenium)

url <- "https://www.distancecalculator.net/"

#Start Selenium
rD <- rsDriver(port = 4444L, browser = "chrome")
remDr <- rD$client
remDr$navigate(url)

#Type in the information
originCity <- remDr$findElement(using = "css", "#originCity")
originCity$sendKeysToElement(list("Craíbas - AL, Brasil"))
# Give the autocomplete suggestions a moment to load (the 1 s pause is an assumption)
Sys.sleep(1)
# Click the first suggestion
clickFirst <- remDr$findElements(using = "css", ".pac-item")
clickFirst2 <- unlist(lapply(clickFirst, function(x) {
  x$getElementText()
}))
clickFirst2
click <- clickFirst[[1]]
click$clickElement()

destinationCity <- remDr$findElement(using = "css", "#destinationCity")
destinationCity$sendKeysToElement(list("Maceió - AL, Brasil"))
# Give the autocomplete suggestions a moment to load (the 1 s pause is an assumption)
Sys.sleep(1)
# Click the first suggestion
clickFirst <- remDr$findElements(using = "css", ".pac-item")
clickFirst2 <- unlist(lapply(clickFirst, function(x) {
  x$getElementText()
}))
clickFirst2
click <- clickFirst[[1]]
click$clickElement()

# Click the button (no longer necessary)
calculate <- remDr$findElements(using = "xpath",
  '//*[contains(concat(" ", @class, " "), concat(" ", "button", " "))]')
calculate2 <- unlist(lapply(calculate, function(x) {
  x$getElementText()
}))
calculate2
click <- calculate[[1]]
click$clickElement()

# Scrape the result
dist <- remDr$findElements(using = "css", "#driving-distance-km")
dist <- unlist(lapply(dist, function(x) {
  x$getElementText()
}))
dist

# Close the browser and stop the Selenium server
remDr$close()
rD$server$stop()
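
The `.pac-item` suggestions and the distance result appear asynchronously, so looking them up immediately after typing can return an empty list. One way to handle that is to poll until the elements show up. The sketch below is only an illustration: `wait_until` is a hypothetical helper written in base R, not part of RSelenium, and the timeout values are arbitrary.

```r
# Hypothetical helper (not an RSelenium function): call `check()` repeatedly
# until it returns TRUE or `timeout` seconds have passed.
wait_until <- function(check, timeout = 10, interval = 0.25) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat errors inside check() as "not ready yet"
    ok <- tryCatch(isTRUE(check()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Stand-alone demonstration: the condition becomes TRUE on the third call
calls <- 0
ready <- wait_until(function() {
  calls <<- calls + 1
  calls >= 3
}, timeout = 5, interval = 0.01)

# A condition that never holds makes the helper give up after the timeout
timed_out <- wait_until(function() FALSE, timeout = 0.05, interval = 0.01)

# With the Selenium session from the answer it could be used like this (not run here):
# wait_until(function() {
#   length(remDr$findElements(using = "css", ".pac-item")) > 0
# })
```

Polling with a timeout tends to be more robust than a fixed pause, since it returns as soon as the elements exist and fails cleanly if they never do.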

And a link to information about the RSelenium package: https://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-basics.html
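
As for why the POST attempt in the question comes back empty: a likely explanation (an assumption, not verified against the site) is that the server sends the page with the result span empty and a script fills it in inside the browser. httr and rvest do not execute JavaScript, so the XPath from the question finds no text. A minimal sketch, with made-up inline HTML standing in for the real response:

```r
library(rvest)

# Hypothetical stand-in for the HTML the server returns: the result span
# exists but is empty, because a script would fill it in the browser.
static_html <- '<html><body>
  <span id="driving-distance-km"></span>
  <script>/* a script would write the distance here */</script>
</body></html>'

page <- read_html(static_html)

# Same XPath as in the question
tex_static <- page %>%
  html_nodes(xpath = '//span[@id="driving-distance-km"]/text()') %>%
  html_text()

length(tex_static)  # no text nodes are found in the static markup
```

A real browser driven by RSelenium runs the page's scripts, which is why the same selector returns a value there.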

Regarding "javascript - Scraping a website with a form and JS using R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50063801/
