
r - Navigating and scraping with R (rvest)

Reposted. Author: 行者123. Updated: 2023-12-03 23:00:48

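I am trying to log in to stackoverflow, navigate to the search bar, and search for the tidyverse package.
The main problem is that when I set the url, it does not give me the form for filling in my email and password:
So url<-"https://stackoverflow.com" does not work. I also tried url<-"https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f", which is the URL I get when I click the login button at the bottom, but html_form does not find the form for my email and password there either. Here is my code: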

    library(rvest)

    url<-"https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"

    (session <- html_session(url))
    (form <- html_form(read_html(url))[[1]])
    (filled_form <- set_values(form,email="myemail@gmail.com",pass="mypassword"))
    (form_submitted<-submit_form(session,filled_form))
    (submitted_url<-form_submitted$url)

    after_filled_html<-jump_to(session,submitted_url)

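After this, I would like to search for the term [tidyverse] and start scraping the results.
I think that once the login/password/form problem in the code above is fixed, I will be able to manage the second part.
Any help would be appreciated, guys.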
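
One thing worth checking in the code above: `html_form(...)[[1]]` takes the first form on the page, which may be a search box rather than the login form. Below is a minimal sketch of selecting a form by the fields it contains instead of by position. It runs against an inline HTML page, not the live site. The markup, the field names `email` and `password`, the base URL, and the helper `find_form_with_fields` are all illustrative assumptions, not Stack Overflow's actual page. It uses the rvest 1.0 function names (`html_form_set()` in place of the deprecated `set_values()`).

```r
library(rvest)

# Hypothetical page (assumption): a search form first, then a login form.
page <- minimal_html('
  <form id="search" action="/search" method="get">
    <input type="text" name="q">
  </form>
  <form id="login-form" action="/users/login" method="post">
    <input type="email" name="email">
    <input type="password" name="password">
    <input type="submit" name="submit-button" value="Log in">
  </form>
')

# Parse every form on the page; base_url resolves the relative actions.
forms <- html_form(page, base_url = "https://example.com")

# Hypothetical helper: return the first form that contains all named fields.
find_form_with_fields <- function(forms, fields) {
  has_all <- vapply(
    forms,
    function(f) all(fields %in% names(f$fields)),
    logical(1)
  )
  if (!any(has_all)) {
    stop("no form contains fields: ", paste(fields, collapse = ", "))
  }
  forms[[which(has_all)[1]]]
}

login_form <- find_form_with_fields(forms, c("email", "password"))

# Fill in the fields; the argument names must match the form's field names.
filled <- html_form_set(
  login_form,
  email = "myemail@gmail.com",
  password = "mypassword"
)
```

With a live session created by `session(url)`, the filled form would then go to `session_submit()`. Printing `forms` first is a quick way to see which fields each form really has, since the names on the real page may differ from the ones guessed here.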

Best answer

You can set the search term directly in the URL, with no need to log in to Stack Overflow:

    library(rvest)

    # Scrape the newest questions for a given tag; no login is required.
    getStackQuestions <- function(search) {
      # The tag goes straight into the URL
      stackoverflow <- read_html(paste0('https://stackoverflow.com/questions/tagged/', search, '?tab=Newest'))
      # Select the question title links
      questions <- stackoverflow %>% html_nodes(".question-hyperlink:not(.mb0)")
      question.href <- questions %>% html_attr('href')
      question.text <- questions %>% html_text()
      # Prefix each scraped href with the site root to get a full URL
      questions <- data.frame(text = question.text, href = paste0("https://stackoverflow.com", question.href))
      questions
    }

    tidyverse_questions <- getStackQuestions('tidyverse')

    head(tidyverse_questions$text)
    [1] "Python/Pandas equivalent of across and weighted average"
    [2] "Transforming columns based off separate dataframe - R solution"
    [3] "Group by summarize in between dates with dplyr"
    [4] "Transpose complex data.frame with tidyR"
    [5] "Create 1 composite variable derived from different combinations of values of 2nd variable that are separated by specific levels of 3rd variable"
    [6] "extracting a cv.glmnet object from Tune_results"
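
Because the tag is pasted straight into the URL, tags containing reserved characters (for example `c++`) need to be percent-encoded first. The sketch below shows one way to build the listing URL with base R only. The helper name `build_tag_url` and the `page` query parameter are assumptions added for illustration; the answer above only uses `tab=Newest`.

```r
# Hypothetical helper: build the URL of a tag's question listing.
# `page` is an assumed query parameter for paging through results.
build_tag_url <- function(tag, tab = "Newest", page = 1L) {
  stopifnot(is.character(tag), length(tag) == 1L, nzchar(tag))
  # Percent-encode reserved characters such as "+" in "c++"
  encoded_tag <- utils::URLencode(tag, reserved = TRUE)
  paste0(
    "https://stackoverflow.com/questions/tagged/", encoded_tag,
    "?tab=", tab, "&page=", page
  )
}

url_tidyverse <- build_tag_url("tidyverse")
url_cpp <- build_tag_url("c++", page = 2)
```

The resulting string can be passed to `read_html()` in place of the `paste0()` call inside `getStackQuestions`.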

Regarding r - Navigating and scraping with R (rvest), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/65864142/
