
r - How to download a large binary file with RCurl *after* server authentication

Reposted. Author: 行者123. Updated: 2023-12-04 04:30:56

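I originally asked this question about doing this task with the httr package, but I don't think it is possible with httr. So I rewrote my code to use RCurl instead, but I'm still tripping over something that is probably related to the writefunction, and I really don't understand why.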

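You should be able to reproduce my work with a 32-bit version of R, where you will hit the memory limit if you read anything into RAM. I need a solution that downloads directly to the hard disk.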
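Downloading "directly to disk" means never holding more than one chunk of the response in memory: each chunk is written to the file as soon as it arrives, which is what libcurl does by default when it is given a FILE* as writedata. As a minimal base-R sketch of that principle (the helper copy_in_chunks() and the 1 MB chunk size are illustrative assumptions, and the sketch does not handle the IPUMS login at all), copying between two connections in bounded memory looks like this:

```r
# copy_in_chunks() is a hypothetical helper for illustration only.
# It copies everything from the open connection `from` into `destfile`,
# holding at most `chunk_bytes` bytes in RAM at any time.
copy_in_chunks <- function(from, destfile, chunk_bytes = 1024L * 1024L) {
  out <- file(destfile, open = "wb")
  on.exit(close(out))
  total <- 0
  repeat {
    chunk <- readBin(from, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break  # end of input reached
    writeBin(chunk, out)            # written straight to disk
    total <- total + length(chunk)
  }
  total                             # number of bytes copied
}

# Usage with local files, so no network access is needed:
src <- tempfile(fileext = ".bin")
writeBin(as.raw(sample(0:255, 3e6, replace = TRUE)), src)
dest <- tempfile(fileext = ".bin")
con <- file(src, open = "rb")
n <- copy_in_chunks(con, dest)
close(con)
```

For a public URL, `from` could be a connection opened with url(..., open = "rb"), but such a base-R connection does not carry the cookies set by an RCurl login, which is why the authenticated case needs the curl handle.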

First, this code works: the zipped file is properly saved to disk.

library(RCurl)
filename <- tempfile()
f <- CFILE(filename, "wb")
url <- "http://www2.census.gov/acs2011_5yr/pums/csv_pus.zip"
curlPerform(url = url, writedata = f@ref)
close(f)
# 2.1 GB file successfully written to disk

Now here is some RCurl code that does not work. As mentioned in the previous question, reproducing this exactly requires creating an extract on IPUMS.
your.email <- "email@address.com"
your.password <- "password"
extract.path <- "https://usa.ipums.org/usa-action/downloads/extract_files/some_file.csv.gz"

library(RCurl)

values <-
list(
"login[email]" = your.email ,
"login[password]" = your.password ,
"login[is_for_login]" = 1
)

curl = getCurlHandle()

curlSetOpt(
cookiejar = 'cookies.txt',
followlocation = TRUE,
autoreferer = TRUE,
ssl.verifypeer = FALSE,
curl = curl
)

params <-
list(
"login[email]" = your.email ,
"login[password]" = your.password ,
"login[is_for_login]" = 1
)

html <- postForm("https://usa.ipums.org/usa-action/users/validate_login", .params = params, curl = curl)
dl <- getURL( "https://usa.ipums.org/usa-action/extract_requests/download" , curl = curl)

Now that I'm logged in, I try the same commands as above, but pass the curl handle so that the cookies are kept.
filename <- tempfile()
f <- CFILE(filename, mode = "wb")

This line breaks:
curlPerform(url = extract.path, writedata = f@ref, curl = curl)
close(f)

# the error is:
Error in curlPerform(url = extract.path, writedata = f@ref, curl = curl) :
embedded nul in string: [[binary gibberish here]]

The answer to my previous post pointed me to this C-level writefunction answer, but I have no idea how to recreate the curl_writer C program (on Windows?).
dyn.load("curl_writer.so")
writer <- getNativeSymbolInfo("writer", PACKAGE="curl_writer")$address
curlPerform(URL=url, writefunction=writer)

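...or why it is even necessary, since the five lines of code at the top of this question need nothing as crazy as getNativeSymbolInfo. I just don't understand why passing in an extra curl handle that stores the authentication cookies, and telling it not to verify SSL, would break code that otherwise works.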
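The error message itself is a useful clue: R character strings cannot contain NUL (0x00) bytes, and compressed data such as a .csv.gz file is full of them. One plausible explanation of the breakage (an assumption, not something confirmed in this thread) is that the earlier getURL() call on the shared handle installed an R-level write callback that collects the response as text, and the handle keeps that setting, so the later curlPerform() pushes the binary body through that callback instead of libcurl's default file writer. The offline sketch below only reproduces the message; the byte values are a typical gzip header per RFC 1952, not bytes taken from the IPUMS file.

```r
# First 10 bytes of a typical gzip header (RFC 1952): magic 1f 8b,
# method 08 (deflate), flags 00, four zero mtime bytes, xfl 00, OS 03.
gz_header <- as.raw(c(0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03))

# Converting these bytes to an R string fails because of the embedded NULs,
# which is the same kind of error curlPerform() reported.
msg <- tryCatch(rawToChar(gz_header), error = conditionMessage)
print(msg)

# Writing the same bytes to a binary file works: no string is ever created.
path <- tempfile(fileext = ".gz")
writeBin(gz_header, path)
identical(readBin(path, what = "raw", n = 10L), gz_header)
```

This is why any fix has to keep the body as raw bytes on its way to disk, which the native C writer in the answer below does.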

Best answer

  • From this link, create a file named curl_writer.c and save it to C:\<folder where you save your R files>:
    #include <stdio.h>

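    /**
    * libcurl write callback: append each chunk libcurl receives to the
    * FILE* passed as writedata (the CFILE opened in R).
    * (The original code in the linked answer just sent a message to stderr.)
    * Returning fewer than size * nmemb bytes makes libcurl abort the
    * transfer, so a failed write (e.g. a full disk) is reported as an error.
    */
    size_t writer(void *buffer, size_t size, size_t nmemb, void *stream) {
        size_t written = fwrite(buffer, size, nmemb, (FILE *)stream);
        return written * size;
    }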
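  • Open a command window, go to the folder where you saved curl_writer.c, and compile it into a DLL with R CMD SHLIB (on Windows this requires Rtools):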
    c:> cd "C:\<folder where you save your R files>"
    c:> R CMD SHLIB -o curl_writer.dll curl_writer.c
  • Open R and run your script:
    C:> R

    your.email <- "email@address.com"
    your.password <- "password"
    extract.path <- "https://usa.ipums.org/usa-action/downloads/extract_files/some_file.csv.gz"

    library(RCurl)

    curl = getCurlHandle()

    curlSetOpt(
    cookiejar = 'cookies.txt',
    followlocation = TRUE,
    autoreferer = TRUE,
    ssl.verifypeer = FALSE,
    curl = curl
    )

    params <-
    list(
    "login[email]" = your.email ,
    "login[password]" = your.password ,
    "login[is_for_login]" = 1
    )

    html <- postForm("https://usa.ipums.org/usa-action/users/validate_login", .params = params, curl = curl)
    dl <- getURL( "https://usa.ipums.org/usa-action/extract_requests/download" , curl = curl)

    # Load the DLL you created
    # "writer" is the name of the function
    # "curl_writer" is the name of the dll
    dyn.load("curl_writer.dll")
    writer <- getNativeSymbolInfo("writer", PACKAGE="curl_writer")$address

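    # Note: this call uses an upper-case "URL" argument where the question
    # used lower-case "url"; I'm not sure whether that matters. The other
    # difference is writefunction = writer (the native symbol loaded above),
    # which writes each chunk straight to the FILE* given in writedata.
    # The URL must be extract.path: `url` is not defined in this script.
    filename <- tempfile()
    f <- CFILE(filename, "wb")
    curlPerform(URL = extract.path, writedata = f@ref, writefunction = writer, curl = curl)
    close(f)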
For the original Stack Overflow question, "r - How to download a large binary file with RCurl *after* server authentication", see: https://stackoverflow.com/questions/17329288/
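If the login does not take effect, the server may answer the download request with an HTML page instead of the extract, and curlPerform() will write that page to disk just as happily. A cheap sanity check after close(f) is to confirm that the file starts with the gzip magic bytes 1f 8b defined in RFC 1952. The helper below, looks_like_gzip(), is hypothetical and not part of RCurl; it is a sketch that assumes the extract is delivered as a .csv.gz file, as the extract.path in the question suggests.

```r
# looks_like_gzip() is a hypothetical helper, not part of RCurl.
# It returns TRUE if the file at `path` starts with the gzip magic bytes 1f 8b.
looks_like_gzip <- function(path) {
  # readBin() on a file name opens it in "rb" mode, so the bytes are read
  # exactly as stored on disk, without transparent decompression.
  magic <- readBin(path, what = "raw", n = 2L)
  identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Usage after the download in the answer's script:
# if (!looks_like_gzip(filename)) stop("not gzip data; check the login step")
```

Only two bytes are read, so the check costs nothing even on a multi-gigabyte extract.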
