r - Web scraping from B3/BM&F Bovespa

Reposted by: 行者123  Updated: 2023-12-04 08:47:11

I am trying to download some data from the BM&FBOVESPA reference rates page.
Their web page is...
http://www.b3.com.br/en_us/market-data-and-indices/data-services/market-data/reports/derivatives-market/reference-prices/bm-fbovespa-reference-rates/
The frame is...
http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-ptBR.asp
Here is my code, which gives me an error: Error in out[j + k, ] : subscript out of bounds

library(rvest)

# URL that contains the data
url <- 'http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-ptBR.asp'

# Read the HTML from the URL
site <- read_html(url)

# Parse every <table> node into a list of data frames
lista_tabela <- site %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)

# Take the first table from the list as a data frame
CurvaDI <- lista_tabela[[1]]

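I have not been able to fix this error. All I want is to download the table from their website and save it as a data frame.
I am also trying to download several periods in a single script. If anyone can help, I would be glad!
Thanks a lot!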

Best answer

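The HTML in the original page source appears to be deliberately malformed, so you have to restructure it before parsing the table. The following uses a series of regular expressions to obtain a parsable table: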

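library(rvest)
library(httr)
library(stringr)

url <- 'http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-enUS.asp'

# Download the page as raw bytes and convert it to a single string
html <- content(GET(url), as = "raw") %>% rawToChar(.)
# Replace a stray closing </tr> with the opening <tr> that should be there
html <- str_replace_all(html, '(</tr>\r\n\r\n</tr>)', '</tr>\r\n\r\n<tr>')
# Drop the <thead> wrapper tags
html <- str_replace_all(html[[1]], '(<thead>|</thead>)', '')
# Open a new row where a header cell directly follows a closed row
html <- str_replace_all(html[[1]], '(</tr>\r\n\r\n<th)', '</tr><tr>')

# Parse the repaired HTML and extract every table
data <- html[[1]] %>% read_html() %>% html_table(fill = TRUE) 

# Keep the first table, dropping its first row
dataframe <- tail(data[[1]], -1)

print(dataframe)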

This gives:

    Calendar Days ID x fixed rate ID x fixed rate
2               1            1.90            0.00
3               7            1.90            1.55
4               8            1.90            1.70
5               9            1.90            1.81
6              13            1.91            1.67
7              14            1.91            1.75
8              21            1.91            1.81
9              23            1.91            1.89
10             24            1.91            1.93
11             28            1.91            1.75
12             30            1.91            1.82
13             34            1.92            1.77
14             41            1.93            1.82
15             43            1.94            1.87
16             52            1.95            1.93
.................................................
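
To see what the first substitution does, here is a minimal sketch on a made-up fragment. The `raw_html` string is a hypothetical imitation of the malformed markup, not a copy of the real page, and `count_tag` is an illustrative helper. Base R's `gsub()` stands in for `str_replace_all()` so the snippet needs no packages.

```r
# Hypothetical fragment imitating the malformed markup: the second row
# starts with a stray </tr> where an opening <tr> should be.
raw_html <- paste0(
  "<table>\r\n",
  "<tr><td>1</td><td>1.90</td></tr>\r\n\r\n",
  "</tr><td>7</td><td>1.90</td></tr>\r\n",
  "</table>"
)

# Same substitution as the first str_replace_all() call, written with
# base R's gsub(); fixed = TRUE matches the pattern literally.
fixed_html <- gsub("</tr>\r\n\r\n</tr>", "</tr>\r\n\r\n<tr>", raw_html, fixed = TRUE)

# Illustrative helper: count literal occurrences of a tag in a string.
count_tag <- function(text, tag) {
  lengths(regmatches(text, gregexpr(tag, text, fixed = TRUE)))
}

cat("<tr> before:", count_tag(raw_html, "<tr>"), "\n")
cat("<tr> after: ", count_tag(fixed_html, "<tr>"), "\n")
```

After the substitution every row has a matching opening and closing tag, which is what lets the table parser assign cells to rows consistently.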

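To submit the form data, you can build a POST request with a specific option and date format. The following fetches the available options, prompts the user to choose one, and then retrieves the data: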

library(rvest)
library(httr)
library(stringr)

date <- as.Date("2020-10-07")

url <- 'http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-enUS.asp'

html <- content(GET(url), as = "raw") %>% rawToChar(.)

getData <- function(html){
    html <- str_replace_all(html, '(</tr>\r\n\r\n</tr>)', '</tr>\r\n\r\n<tr>')
    html <- str_replace_all(html[[1]], '(<thead>|</thead>)', '')
    html <- str_replace_all(html[[1]], '(</tr>\r\n\r\n<th)', '</tr><tr>')

    body <- html[[1]] %>% read_html()
    table <- body %>% html_table(fill = TRUE) 

    if (length(table) > 0){
        dataframe <- tail(table[[1]], -1)
        return(list(data = dataframe, body = body))
    }
    return(list(data = NULL, body = body))
}

res <- getData(html)
print(res[[1]])

# Collect the <option> entries of the page's selector as key/value pairs
options <- res[[2]] %>% html_nodes("option")
i <- 1
optionList <- list()
for(o in options){
    optionList[[i]] <- c(
        key = o %>% html_attr("value"), 
        value = str_replace_all(o %>% html_text(),'\r\n','')
    )
    print(paste("[",i,"] ", optionList[[i]]["value"], sep=""))
    i <- i + 1
}
# Prompt for a choice; reading from "stdin" works when the script is run with Rscript
cat("Choose option by index : ")
selected <- readLines("stdin",n=1)
selectedOption <- optionList[[as.integer(selected)]]
print(paste("you selected :", selectedOption["value"], sep=" "))

# Put the date (in both formats) and the selected option into the query string
postUrl <- modify_url(url, 
    query = list(
        Data = format(date, format="%m/%d/%Y"), 
        Data1 = format(date, format="%Y%m%d"), 
        slcTaxa = selectedOption["key"]
    )
)
# Send the same fields, plus the remaining form fields, as a form-encoded POST body
html <- content(POST(postUrl, body = list(
    Data = format(date, format="%m/%d/%Y"), 
    Data1 = format(date, format="%Y%m%d"), 
    slcTaxa = selectedOption["key"],
    nomexls = "",
    lQtdTabelas = "",
    IDIOM =  2
), encode = "form"), as = "raw") %>% rawToChar(.)

res <- getData(html)
print(res[[1]])
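
The question also asks about downloading several periods in one script. One way to approach that, sketched here under assumptions, is to build the form fields once per date and reuse the POST call above for each. `build_form` is a hypothetical helper, and the option key `"PRE"` is only a placeholder; use a key that the option list actually returns.

```r
# Build the form fields for one date, using the field names from the
# POST request above. `option_key` is the "value" attribute of the chosen
# <option>; the "PRE" used below is a placeholder, not a confirmed key.
build_form <- function(date, option_key) {
  list(
    Data = format(date, format = "%m/%d/%Y"),
    Data1 = format(date, format = "%Y%m%d"),
    slcTaxa = option_key,
    nomexls = "",
    lQtdTabelas = "",
    IDIOM = 2
  )
}

# Three consecutive days as an example period.
dates <- seq(as.Date("2020-10-05"), as.Date("2020-10-07"), by = "day")

# Index into `dates` rather than looping over it directly: a plain
# for-loop over a Date vector drops the Date class.
forms <- lapply(seq_along(dates), function(i) build_form(dates[i], "PRE"))
names(forms) <- format(dates)

str(forms[["2020-10-07"]])
```

Each element of `forms` can then be passed as `body` to `POST(url, body = form, encode = "form")` and the response handed to `getData()`. Since `getData()` returns `data = NULL` when no table is found, dates without a published table can be dropped before combining the results.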

For this topic (r - Web scraping from B3/BM&F Bovespa), the original question is on Stack Overflow: https://stackoverflow.com/questions/64252559/
