
r - Unable to scrape a website with a form using rvest

Reposted. Author: 行者123. Updated: 2023-12-05 09:34:47

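I am trying to scrape the website listed below. I attempted this with rvest, using the code below.

My approach was to replicate the PUT request I found in Google Chrome for the download button. I am not sure what I am doing wrong. The error is shown in my reprex.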

library(httr)
library(rvest)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union



url <- "https://nfc.shgn.com/adp/baseball"
pgsession <- session(url)

pgform <- html_form(pgsession)[[2]]

filled_form <- html_form_set(pgform,
  team_id = "0", from_date = "2020-10-01", to_date = "2021-02-19", num_teams = "0",
  draft_type = "0", sport = "baseball", position = "",
  league_teams = "0")
#> Warning: Setting value of hidden field 'team_id'.
#> Warning: Setting value of hidden field 'from_date'.
#> Warning: Setting value of hidden field 'to_date'.
#> Warning: Setting value of hidden field 'num_teams'.
#> Warning: Setting value of hidden field 'draft_type'.
#> Warning: Setting value of hidden field 'sport'.
#> Warning: Setting value of hidden field 'position'.
#> Warning: Setting value of hidden field 'league_teams'.

session_submit(x = pgsession, form = filled_form)
#> Error: `form` doesn't contain a `action` attribute

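Best answer

If you only want to scrape the table, you can do it easily with rvest and purrr by using the URL that the "Print" button takes you to.

Although you cannot use html_table here, purrr::map_df lets you extract the cells directly into a data frame: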

library(rvest)
library(dplyr)
library(purrr)
library(stringr)

pgtab <- read_html("https://nfc.shgn.com/adp.data.php") %>% # destination of Print button
  html_nodes("tr") %>%                # returns a list of row nodes
  map_df(~html_nodes(., "td") %>%     # returns a list of cell nodes for each row
           html_text() %>%            # extract text
           str_trim() %>%             # remove whitespace
           set_names("Rank", "Player", "Team", "Position", "ADP", "MinPick",
                     "MaxPick", "Diff", "Picks", "Team2", "PickBid"))

head(pgtab)

# A tibble: 6 x 11
  Rank  Player             Team  Position ADP   MinPick MaxPick Diff  Picks Team2 PickBid
  <chr> <chr>              <chr> <chr>    <chr> <chr>   <chr>   <chr> <chr> <chr> <chr>
1 1     Ronald Acuna Jr.   ATL   OF       1.69  1       6       ""    332   ""    ""
2 2     Fernando Tatis Jr. SD    SS       2.57  1       7       ""    332   ""    ""
3 3     Mookie Betts       LAD   OF       3.53  1       9       ""    332   ""    ""
4 4     Juan Soto          WAS   OF       3.98  1       10      ""    332   ""    ""
5 5     Mike Trout         LAA   OF       6.08  1       11      ""    332   ""    ""
6 6     Gerrit Cole        NYY   P        6.50  1       15      ""    332   ""    ""
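
The row-by-row extraction can also be tried offline. The following is a minimal sketch that uses a small inline table as a stand-in for the print page; the markup, `col_names`, and `demo_tab` are illustrative and not taken from the site. It additionally assumes the page might contain a header row built from th cells, so it drops any row whose td count does not match the expected columns before naming them, and it combines the rows with base R rather than map_df.

```r
library(rvest)
library(purrr)

# Hypothetical stand-in for the print page: one header row made of <th>
# cells and two data rows made of <td> cells
page <- minimal_html('
  <table>
    <tr><th>Rank</th><th>Player</th><th>Team</th></tr>
    <tr><td> 1 </td><td>Ronald Acuna Jr.</td><td>ATL</td></tr>
    <tr><td> 2 </td><td>Fernando Tatis Jr.</td><td>SD</td></tr>
  </table>')

col_names <- c("Rank", "Player", "Team")

# One character vector of trimmed cell text per <tr>
rows <- page %>%
  html_elements("tr") %>%
  map(~ html_elements(.x, "td") %>% html_text(trim = TRUE)) %>%
  keep(~ length(.x) == length(col_names))  # drops rows without matching <td> cells

# Stack the row vectors into a character matrix, then into a data frame
demo_tab <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(demo_tab) <- col_names

print(demo_tab)
```

The length check is a design choice for robustness: a row with no td cells yields an empty vector, which cannot be given the full set of column names.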

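You can also set the form parameters and submit the form, although you will have to check whether that makes any difference to the result. Here is one way to do it...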

# html_session(), set_values() and submit_form() are the pre-1.0 rvest names
# for session(), html_form_set() and session_submit()
url <- "https://nfc.shgn.com/adp/baseball"
pgsession <- html_session(url)

pgform <- html_form(pgsession)[[2]]

filled_form <- set_values(pgform,
  team_id = "0", from_date = "2020-10-01", to_date = "2021-02-19", num_teams = "0",
  draft_type = "0", sport = "baseball", position = "",
  league_teams = "0")

filled_form$url <- "https://nfc.shgn.com/adp.data.php" # error if this is left blank

pgsession <- submit_form(pgsession, filled_form, submit = "printerFriendly")

pgtab <- pgsession %>% read_html() %>% # same extraction code as in the first approach
  html_nodes("tr") %>%
  map_df(~html_nodes(., "td") %>%
           html_text() %>%
           str_trim() %>%
           set_names("Rank", "Player", "Team", "Position", "ADP", "MinPick",
                     "MaxPick", "Diff", "Picks", "Team2", "PickBid"))

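The question's code uses the rvest 1.0 function names, and its error message refers to a missing `action` rather than a `url`. The sketch below illustrates, offline, how the same fix might look with that API. It is an assumption, not something verified against the live site: the inline form markup and the `demo_form` name are hypothetical, and only the field names, the target URL, and the `printerFriendly` button name come from the code above.

```r
library(rvest)

# Hypothetical stand-in for the page's form: hidden fields and a submit
# button, but no action attribute on the <form> tag
page <- minimal_html('
  <form id="adp">
    <input type="hidden" name="team_id" value="">
    <input type="hidden" name="sport" value="">
    <button type="submit" name="printerFriendly">Print</button>
  </form>')

demo_form <- html_form(page)[[1]]

# With no action attribute the form has no target URL, which is the
# situation the error message in the question complains about
has_action <- length(demo_form$action) > 0

# Setting hidden fields only produces warnings, as in the question
demo_form <- suppressWarnings(
  html_form_set(demo_form, team_id = "0", sport = "baseball")
)

# Assumption: in rvest 1.0 the target is stored in `action`, not `url`
demo_form$action <- "https://nfc.shgn.com/adp.data.php"

# Against the live site the remaining step would be (not run here):
# pgsession <- session_submit(pgsession, filled_form, submit = "printerFriendly")
```

If this holds for the real form, the question's original code would only need `filled_form$action` set before calling session_submit.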
Regarding "r - Unable to scrape a website with a form using rvest", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66283185/
