gpt4 book ai didi

r - 如何在 R 中使用 httr 对 shibboleth 多主机名网站进行身份验证

转载 作者:行者123 更新时间:2023-12-04 01:11:14 25 4
gpt4 key购买 nike

注意:ipums international 和 ipums usa 可能使用相同的系统。 ipums usa 允许更快的注册。如果您想测试您的代码,请尝试 https://usa.ipums.org/usa-action/users/request_access注册!

我正在尝试以编程方式从 https://international.ipums.org/ 下载文件使用 R 语言和 httr。我需要使用 httr 而不是 RCurl,因为我需要在身份验证后下载大文件,而不是下载到 RAM 中,而是直接下载到磁盘。 this is currently only possible with httr as far as i know

下面的可重现代码记录了我从登录页面 ( https://international.ipums.org/international-action/users/login ) 到主认证后页面的最大努力。任何提示或提示将不胜感激!谢谢!

my_email <- "email@address.com"
my_password <- "password"

tf <- tempfile()

# use httr, because i need to download a large file after authentication
# and only httr supports that with its `write_disk()` option
library(httr)

# turn off ssl verify, otherwise the subsequent GET command will fail
set_config( config( ssl_verifypeer = 0L ) )

GET( "https://international.ipums.org/Shibboleth.sso/Login?target=https%3A%2F%2Finternational.ipums.org%2Finternational-action%2Fmenu" )

# connect to the starting login page of the website
( a <- GET( "https://international.ipums.org/international-action/users/login" , verbose( info = TRUE ) ) )

# which takes me through to a lot of websites, but ultimately (in my browser) lands at
shibboleth_url <- "https://live.identity.popdata.org:443/idp/Authn/UserPassword"

# construct authentication information?
base_values <- list( "j_username" = my_email , "j_password" = my_password )
idp_values <- list( "j_username" = my_email , "j_password" = my_password , "_idp_authn_lc_key"=subset( a$cookies , domain == "live.identity.popdata.org" )$value , "JSESSIONID" = subset( a$cookies , domain == "#HttpOnly_live.identity.popdata.org" )$value )
ipums_values <- list( "j_username" = my_email , "j_password" = my_password , "_idp_authn_lc_key"=subset( a$cookies , domain == "live.identity.popdata.org" )$value , "JSESSIONID" = subset( a$cookies , domain == "international.ipums.org" )$value)

# i believe this is where the main login should happen, but it looks like it's failing
GET( shibboleth_url , query = idp_values )
POST( shibboleth_url , body = base_values )
writeBin( GET( shibboleth_url , query = idp_values )$content , tf )

readLines( tf )
# The MPC account authentication system has encountered an error
# This error can sometimes occur if you did not close your browser after logging out of an application previously. It may also occur for other reasons. Please close your browser and try your action again."

writeBin( GET( "https://live.identity.popdata.org/idp/profile/SAML2/Redirect/SSO" , query = idp_values )$content , tf )
POST( "https://live.identity.popdata.org/idp/profile/SAML2/Redirect/SSO" , body = idp_values )
readLines( tf )
# same error as above

# return to the main login page..
writeBin( GET( "https://international.ipums.org/international-action/menu" , query = ipums_values )$content , tf )
readLines( tf )
# ..not logged in

最佳答案

您必须使用 set_cookies()将您的 cookie 发送到服务器:

library(httr)
library(rvest)
#my_email <- "xxx"
#my_password <- "yyy"
tf <- tempfile()
set_config( config( ssl_verifypeer = 0L ) )

# Get first page
p1 <- GET( "https://international.ipums.org/international-action/users/login" , verbose( info = TRUE ) )

# Post Login credentials
b2 <- list( "j_username" = my_email , "j_password" = my_password )
c2 <- c(JSESSIONID=p1$cookies[p1$cookies$domain=="#HttpOnly_live.identity.popdata.org",]$value,
`_idp_authn_lc_key`=p1$cookies[p1$cookies$domain=="live.identity.popdata.org",]$value)
p2 <- POST(p1$url,body = b2, set_cookies(.cookies = c2), encode="form" )

# Parse hidden fields
h2 <- read_html(p2$content)
form <- h2 %>% html_form()

# Post hidden fields
b3 <- list( "RelayState"=form[[1]]$fields[[1]]$value, "SAMLResponse"=form[[1]]$fields[[2]]$value)
c3 <- c(JSESSIONID=p1$cookies[p1$cookies$domain=="#HttpOnly_live.identity.popdata.org",]$value,
`_idp_session`=p2$cookies[p2$cookies$name=="_idp_session",]$value,
`_idp_authn_lc_key`=p2$cookies[p2$cookies$name=="_idp_authn_lc_key",]$value)
p3 <- POST( form[[1]]$url , body=b3, set_cookies(.cookies = c3), encode = "form")

# Get interesting page
c4 <- c(JSESSIONID=p3$cookies[p1$cookies$domain=="international.ipums.org" && p3$cookies$name=="JSESSIONID",]$value,
`_idp_session`=p3$cookies[p3$cookies$name=="_idp_session",]$value,
`_idp_authn_lc_key`=p3$cookies[p3$cookies$name=="_idp_authn_lc_key",]$value)
p4 <- GET( "https://international.ipums.org/international-action/menu", set_cookies(.cookies = c4) )
writeBin(p4$content , tf )
readLines( tf )[55]

由于结果是
[1] "    <li class=\"lastItem\"><a href=\"/international-action/users/logout\">Logout</a></li>"

我想你已经登录了...

关于r - 如何在 R 中使用 httr 对 shibboleth 多主机名网站进行身份验证,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/34829920/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com