
javascript - How to scrape web data from a javascript pie chart using R?

Reposted. Author: 行者123. Updated: 2023-11-29 19:08:50

I have:

library(XML)

my_URL <- "http://www.velocitysharesetns.com/viix"

tables <- readHTMLTable(my_URL)

[Image: PieChart]

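The above only outputs the table at the top of the page. It looks like the pie chart is ignored, which is explained by the fact that it is drawn by JavaScript. Is there a simple way to extract the two % figures from the chart?

I looked at RSelenium, but I ran into some errors that I could not find any solution for.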

> RSelenium::startServer()
Error in if (file.exists(file) == FALSE) if (!missing(asText) && asText == :
argument is of length zero
In addition: Warning messages:
1: startServer is deprecated.
Users in future can find the function in file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see vignette("RSelenium-docker", package = "RSelenium")
2: running command '"java" -jar "\\med-fs01/Home/Alex.Badoi/R/win-library/3.3/RSelenium/bin/selenium-server-standalone.jar" -log "\\med-fs01/Home/Alex.Badoi/R/win-library/3.3/RSelenium/bin/sellog.txt"' had status 127
3: running command '"wmic" path win32_process get Caption,Processid,Commandline /format:htable' had status 44210
>

Based on Phillip's answer, I came up with the following solution:

library(XML)



# extract HTML

doc.html = htmlTreeParse('http://www.velocitysharesetns.com/viix',
useInternal = TRUE)


# convert to text

htmltxt <- paste(capture.output(doc.html, file=NULL), collapse="\n")

# get location of string

pos = regexpr('CBOE SHORT-TERM VIX FUTURE', htmltxt)

# extract the substring from pos through pos + 98

keep = substr(htmltxt, pos, pos+98)

Output:

> keep
[1] "CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],\n\n ['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],\n"
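The fixed length of 98 characters only fits while the labels keep their current width. A minimal sketch of a less brittle variant, assuming the array in the page source ends with "];" as in the answer below; the htmltxt value here is a hand-made stand-in for the real page text, and the names rest, end and pct are illustrative only:

```r
# Stand-in for the page text (assumption); on the live page this would be
# the htmltxt string built in the code above
htmltxt <- "var data = [\n\n['CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],\n\n['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],\n\n];\nother script code"

# Start at the first label, as in the code above
pos <- regexpr("CBOE SHORT-TERM VIX FUTURE", htmltxt, fixed = TRUE)

# Cut at the "];" that closes the array instead of assuming 98 characters
rest <- substr(htmltxt, pos, nchar(htmltxt))
end <- regexpr("];", rest, fixed = TRUE)
keep <- substr(rest, 1, end - 1)

# Keep only the decimal numbers that sit between "', " and "]"
m <- gregexpr("(?<=', )[0-9.]+(?=\\])", keep, perl = TRUE)
pct <- as.numeric(regmatches(keep, m)[[1]])

pct
```

The result is a numeric vector with one entry per pie slice, in the order the labels appear in the source.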

Best answer

Using RSelenium

This solution works for me using RSelenium (on Windows 7, and after inspecting the page source). Note that I am using chromedriver.exe.

library(RSelenium)
checkForServer(update = TRUE)

#### I use Chromedriver
startServer(args = c("-Dwebdriver.chrome.driver=C:/Stuff/Scripts/chromedriver.exe"))

remDr <- remoteDriver(remoteServerAddr = "localhost", browserName="chrome", port=4444)

### Open Chrome
remDr$open()

remDr$navigate("http://www.velocitysharesetns.com/viix")

b <- remDr$findElements(using="class name", value="jqplot-pie-series")

sapply(b, function(x){x$getElementAttribute("outerHTML")})

The last command returns

[[1]]
[1] "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 100px; top: 106px;\"><div style=\"color:white;font-weight:bold;\">82%</div></div>"

[[2]]
[1] "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 159px; top: 67px;\"><div style=\"color:white;font-weight:bold;\">18%</div></div>"

You can see that the percentage figures appear there and can easily be extracted.
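One way to do that extraction, as a sketch only: the two strings below are copied from the output above into a hypothetical vector named html, and a regular expression keeps the digits that come directly before the "%" sign. In a live session the strings would presumably come from calling unlist() on the sapply() result.

```r
# The two outerHTML strings returned above, copied verbatim
html <- c(
  "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 100px; top: 106px;\"><div style=\"color:white;font-weight:bold;\">82%</div></div>",
  "<div class=\"jqplot-pie-series jqplot-data-label\" style=\"position: absolute; left: 159px; top: 67px;\"><div style=\"color:white;font-weight:bold;\">18%</div></div>"
)

# Capture the digits between ">" and "%<" in the inner div
labels <- sub(".*>([0-9.]+)%<.*", "\\1", html)

# Convert the captured text to numbers
pct_rounded <- as.numeric(labels)

pct_rounded
```

These are the rounded values drawn on the chart, not the exact weights.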

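Using plain HTML only

Alternatively, the data can be obtained by reading the HTML source, because the data is already contained in it. Somewhere in the source you will find: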

<script type="text/javascript" language="javascript">
$(document).ready(function(){
var data = [


['CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],

['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],

];

This is what you are looking for. The numbers are rounded in the chart.
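A minimal sketch of turning that array into a table, assuming the page source is already available as one string. Here src is a hypothetical variable holding a copy of the fragment above, and holdings is an illustrative name. Rounding the exact values gives the labels shown on the chart.

```r
# Copy of the script fragment shown above (assumption: in practice this
# would be the full page source read into a single string)
src <- "var data = [\n\n['CBOE SHORT-TERM VIX FUTURE DEC 2016', 81.64],\n\n['CBOE SHORT-TERM VIX FUTURE JAN 2017', 18.36],\n\n];"

# Find every ['label', value] pair in the source
pairs <- regmatches(src, gregexpr("\\['[^']+', *[0-9.]+\\]", src))[[1]]

# Split each pair into its label and its numeric value
holdings <- data.frame(
  label = sub("^\\['([^']+)'.*$", "\\1", pairs),
  value = as.numeric(sub("^.*', *([0-9.]+)\\]$", "\\1", pairs)),
  stringsAsFactors = FALSE
)

# Rounding to whole percent reproduces the labels drawn on the chart
holdings$rounded <- round(holdings$value)

holdings
```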

Regarding "javascript - How to scrape web data from a javascript pie chart using R?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/40741593/
