
r - Extracting historic analyst opinions from Yahoo Finance in R


Yahoo Finance has data on historic analyst opinions for stocks. I'm interested in pulling this data into R for analysis. Here's what I have so far:

getOpinions <- function(symbol) {
    require(XML)
    require(xts)
    # Fetch the analyst-opinions page and parse every HTML table on it
    yahoo.URL <- "http://finance.yahoo.com/q/ud?"
    tables <- readHTMLTable(paste(yahoo.URL, "s=", symbol, sep = ""), stringsAsFactors = FALSE)
    # The opinions data is currently the 11th table on the page
    Data <- tables[[11]]
    Data$Date <- as.Date(Data$Date, '%d-%b-%y')
    Data <- xts(Data[, -1], order.by = Data[, 1])
    Data
}

getOpinions('AAPL')

I'm worried that this code will break if the position of the table (currently 11) changes, but I can't think of an elegant way to detect which table holds the data I want. I tried the solution posted here, but it doesn't seem to work for this problem.

Is there a better way to scrape this data, one that is less likely to break if Yahoo rearranges its site?

Edit: it looks like there is already a package (fImport) that does this.

library(fImport)
yahooBriefing("AAPL")

Here is their solution, which doesn't return an xts object and could break if the page layout changes (the yahooKeystats function in fImport is already broken):

function (query, file = "tempfile", source = NULL, save = FALSE,
    try = TRUE)
{
    if (is.null(source))
        source = "http://finance.yahoo.com/q/ud?s="
    if (try) {
        z = try(yahooBriefing(query, file, source, save, try = FALSE))
        if (class(z) == "try-error" || class(z) == "Error") {
            return("No Internet Access")
        }
        else {
            return(z)
        }
    }
    else {
        url = paste(source, query, sep = "")
        download.file(url = url, destfile = file)
        x = scan(file, what = "", sep = "\n")
        x = x[grep("Briefing.com", x)]
        x = gsub("</", "<", x, perl = TRUE)
        x = gsub("/", " / ", x, perl = TRUE)
        x = gsub(" class=.yfnc_tabledata1.", "", x, perl = TRUE)
        x = gsub(" align=.center.", "", x, perl = TRUE)
        x = gsub(" cell.......=...", "", x, perl = TRUE)
        x = gsub(" border=...", "", x, perl = TRUE)
        x = gsub(" color=.red.", "", x, perl = TRUE)
        x = gsub(" color=.green.", "", x, perl = TRUE)
        x = gsub("<.>", "", x, perl = TRUE)
        x = gsub("<td>", "@", x, perl = TRUE)
        x = gsub("<..>", "", x, perl = TRUE)
        x = gsub("<...>", "", x, perl = TRUE)
        x = gsub("<....>", "", x, perl = TRUE)
        x = gsub("<table>", "", x, perl = TRUE)
        x = gsub("<td nowrap", "", x, perl = TRUE)
        x = gsub("<td height=....", "", x, perl = TRUE)
        x = gsub("&amp;", "&", x, perl = TRUE)
        x = unlist(strsplit(x, ">"))
        x = x[grep("-...-[90]", x, perl = TRUE)]
        nX = length(x)
        x[nX] = gsub("@$", "", x[nX], perl = TRUE)
        x = unlist(strsplit(x, "@"))
        x[x == ""] = "NA"
        x = matrix(x, byrow = TRUE, ncol = 9)[, -c(2, 4, 6, 8)]
        x[, 1] = as.character(strptime(x[, 1], format = "%d-%b-%y"))
        colnames(x) = c("Date", "ResearchFirm", "Action", "From",
            "To")
        x = x[nrow(x):1, ]
        X = as.data.frame(x)
    }
    X
}
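
Since yahooBriefing returns a plain data.frame (its Date column ends up as "%Y-%m-%d" character strings after the strptime call above), the output can be converted to an xts object by hand. A minimal sketch, assuming the fImport call succeeds and the Date column parses with as.Date's default format:

library(fImport)
library(xts)

briefing <- yahooBriefing("AAPL")
# Dates come back as "%Y-%m-%d" strings, which as.Date parses by default
briefing.xts <- xts(briefing[, -1], order.by = as.Date(briefing$Date))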

Best Answer

Here's a hack you can use. In your function, add the following:

# GET THE POSITION OF TABLE WITH MAX. ROWS
position = which.max(sapply(tables, NROW))
Data = tables[[position]]

This works as long as the longest table on the page is the one you're looking for.

If you want to make it a bit more robust, here is another approach:

# GET POSITION OF TABLE CONTAINING RESEARCH FIRM IN ITS NAMES
position = sapply(tables, function(tab) 'Research Firm' %in% names(tab))
Data = tables[[which(position)]]  # double brackets extract the data frame itself
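
Putting it together, here is one way the original getOpinions function might look with the header-based lookup in place of the hard-coded index (a sketch, assuming the target table keeps a 'Research Firm' column):

getOpinions <- function(symbol) {
    require(XML)
    require(xts)
    yahoo.URL <- "http://finance.yahoo.com/q/ud?"
    tables <- readHTMLTable(paste(yahoo.URL, "s=", symbol, sep = ""),
                            stringsAsFactors = FALSE)
    # Locate the table by its header rather than a fixed position
    position <- which(sapply(tables, function(tab) 'Research Firm' %in% names(tab)))
    Data <- tables[[position[1]]]
    Data$Date <- as.Date(Data$Date, '%d-%b-%y')
    xts(Data[, -1], order.by = Data$Date)
}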

Regarding r - extracting historic analyst opinions from Yahoo Finance in R, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7531238/
