
r - How to implement a support vector machine in R

Reposted. Author: 行者123. Updated: 2023-11-30 09:01:13

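I am new to machine learning (and not a mathematician), and I am teaching myself ML from videos and books. I have a basic understanding of algorithms such as naive Bayes, support vector machines, and decision trees, and I am using machine learning to model the daily returns of the stock market. I want to use a nonlinear regression algorithm, so I chose support vector machine regression because it is popular. I use the trading day and the EMA difference as the feature vector (X) and the price change as the label (Y). Below is my code: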

library("quantmod")
#Adding libraries
library("lubridate")
#Makes it easier to work with the dates
library("e1071")
#Gives us access to the svm
stockData <- new.env()
tickers <- 'AAPL'
startDate = as.Date("2015-11-01")
# The beginning of the date range we want to look at


symbol = getSymbols(tickers,from=startDate, auto.assign=F)
# Retrieving Apple’s daily OHLCV from Yahoo Finance
DayofWeek<-wday(symbol, label=TRUE)
#Find the day of the week
Class<- Cl(symbol) - Op(symbol)
#price change
EMA5<-EMA(Cl(symbol),n = 5)
#We are calculating a 5-period EMA off the close price

EMA10<-EMA(Cl(symbol),n = 10)
#Then the 10-period EMA, also off the close price
EMACross <- EMA5 - EMA10
#Positive values correspond to the 5-period EMA being above the 10-period EMA

EMACross<-round(EMACross,2)


DataSet2<-data.frame(DayofWeek,EMACross, Class)
DataSet2<-DataSet2[-c(1:10),]
#We need to remove the instances where the 10-period moving average is still being calculated
m<-nrow(DataSet2)
n<-round((nrow(DataSet2)*2)/3)
TrainingSet<-DataSet2[1:n,]
#We will use ⅔ of the data to train the model
TestSet<-DataSet2[(n+1):m,]
#And ⅓ to test it on unseen data
EMACrossModel<-svm( Cl(symbol) ~ ., data=TrainingSet)
summary(EMACrossModel)
pred<-predict(EMACrossModel,TestSet[,-3])

When I run the code above, I get this error:

> EMACrossModel<-svm( Cl(symbol) ~ ., data=TrainingSet) 
Error in model.frame.default(formula = Cl(symbol) ~ ., data = TrainingSet, :
variable lengths differ (found for 'DayofWeek')

So my questions are (please forgive me, I have more than one):

1) How do I solve the problem above?

2) Can I use both qualitative (e.g. Mon, Tue, Wed) and quantitative (e.g. 1.0, 0.1, 100) data together in SVM regression?

3) How can I plot the results above with the SVM decision boundaries?

Edited

DataSet2

          DayofWeek   EMA AAPL.Close
2015-11-16 Mon -2.77 2.800003
2015-11-17 Tues -2.51 -1.229996
2015-11-18 Wed -1.67 1.529999
2015-11-19 Thurs -0.89 1.140000
2015-11-20 Fri -0.32 0.100006
2015-11-23 Mon -0.23 -1.519997
2015-11-24 Tues 0.00 1.549995
2015-11-25 Wed 0.00 -1.180000
2015-11-27 Fri -0.03 -0.480003
2015-11-30 Mon 0.02 0.310005
2015-12-01 Tues -0.09 -1.410004
2015-12-02 Wed -0.31 -1.059997
2015-12-03 Thurs -0.57 -1.350006
2015-12-04 Fri -0.10 3.739998
2015-12-07 Mon 0.05 -0.700004
2015-12-08 Tues 0.12 0.710006
2015-12-09 Wed -0.24 -2.019996
2015-12-10 Thurs -0.35 0.129997
2015-12-11 Fri -0.83 -2.010002
2015-12-14 Mon -1.15 0.300003
2015-12-15 Tues -1.56 -1.450004
2015-12-16 Wed -1.56 0.269996
2015-12-17 Thurs -1.82 -3.039994
2015-12-18 Fri -2.30 -2.880005
2015-12-21 Mon -2.23 0.050003
2015-12-22 Tues -2.07 -0.169999
2015-12-23 Wed -1.64 1.340004
2015-12-24 Thurs -1.40 -0.970001
2015-12-28 Mon -1.37 -0.769996
2015-12-29 Tues -0.98 1.779999
2015-12-30 Wed -0.92 -1.260002

The modified code below runs but gives a different answer.

These are the modifications:

EMACrossModel<-ksvm(  Cl(symbol[1:n]) ~ ., data=TrainingSet,kernel="rbfdot",C=10) #kernlab libraries

pred<-predict(EMACrossModel,TestSet)

Results:

> EMACrossModel
Support Vector Machine object of class "ksvm"

SV type: eps-svr (regression)
parameter : epsilon = 0.1 cost C = 10

Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 0.294836572886287

Number of Support Vectors : 17

Objective Function Value : -49.1082
Training error : 0.138329

> pred
[,1]
[1,] 119.7267
[2,] 119.9733
[3,] 120.7236
[4,] 121.8324
[5,] 121.5632
[6,] 121.4652
[7,] 119.6438
[8,] 119.6962
[9,] 119.0775
[10,] 116.4956

I expected the predictions to look like this:

     [,1]
-1.327996
1.229939
-1.130000
0.100006
-1.519997
-0.480003
1.310005
-1.410004
-1.059997
1.350006
-2.739998
1.700004

My guess is that my current code takes the stock price rather than the price change as Y and uses it to fit EMACrossModel. Am I right? If so, how can I fix this?

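Best answer

Regarding question one: you formed the training set by removing some rows. However, you did not restrict symbol in the same way: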

 EMACrossModel<-svm( Cl(symbol[1:n]) ~ ., data=TrainingSet)

I just realized that what you more likely want is:

 EMACrossModel<-svm( AAPL.Close ~ ., data=TrainingSet) 

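In general, the formula Cl(symbol[1:n]) ~ . defines what is learned: the left-hand side is the target. At the moment that target is taken from symbol, i.e. the closing price. However, I assume you want to predict the AAPL.Close column of the training set, which holds the price change. Formulas are a general concept in R ( https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html ). It is worth spending some time to understand them.

Edit: Based on your comment above, this seems to be confirmed. The results are as follows: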

-0.1926745  
0.3578645
0.1830046
0.6362871
-0.3760084
-0.1443156
0.2615674
0.2589130
-0.4779677
-0.5928780

End of edit
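To see why the first formula fails and the second one works, the sketch below calls model.frame() directly, the function named in the error message. The data frames prices and train are made-up stand-ins for symbol and TrainingSet (hypothetical names and random values, not the real data).

```r
# Synthetic stand-ins for the question's objects (random values, assumption).
set.seed(1)
prices <- data.frame(Close = 100 + cumsum(rnorm(30)))      # plays the role of `symbol`
train  <- data.frame(EMA = rnorm(20), Change = rnorm(20))  # plays the role of `TrainingSet`

# `prices$Close` is not a column of `train`, so R looks it up outside the
# data frame: 30 target values are matched against 20 feature rows.
bad <- tryCatch(
  model.frame(prices$Close ~ ., data = train),
  error = function(e) conditionMessage(e)
)
print(bad)

# Naming a column of `train` keeps target and features the same length.
good <- model.frame(Change ~ ., data = train)
print(dim(good))
```

The same lookup rule applies to any modelling function with a formula interface, so the target on the left-hand side is best written as a plain column name of the data frame passed as data.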

Regarding question two: it depends on the implementation (and the kernel), but here it seems to work.
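As a minimal sketch of mixing both kinds of features, the example below fits an e1071 regression SVM on a toy data frame with one factor column and one numeric column. The data frame toy and its values are invented for illustration. The model.matrix() call only shows how base R expands a factor into indicator columns; to my understanding the formula interface of svm() builds its inputs in a similar way, but treat that as an assumption. Note also that lubridate's wday(label = TRUE) returns an ordered factor, which base R encodes with polynomial contrasts by default, whereas the toy data uses an unordered factor.

```r
library(e1071)

# Toy data (made up): a weekday factor, a numeric EMA difference, a numeric target.
set.seed(42)
n <- 60
days <- c("Mon", "Tues", "Wed", "Thurs", "Fri")
toy <- data.frame(
  DayofWeek = factor(sample(days, n, replace = TRUE), levels = days),
  EMA       = round(rnorm(n), 2)
)
toy$Change <- 0.5 * toy$EMA + rnorm(n, sd = 0.3)

# How base R turns the factor into numeric indicator columns.
design <- model.matrix(~ DayofWeek + EMA, data = toy)
print(colnames(design))

# A numeric target makes svm() fit a regression model by default.
fit  <- svm(Change ~ DayofWeek + EMA, data = toy)
pred <- predict(fit, toy[, c("DayofWeek", "EMA")])
print(head(pred))
```

When predicting on new data, the factor column needs the same levels as in the training data, otherwise the indicator columns will not line up.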

Regarding your third question: the e1071 package includes an example:

data(cats, package = "MASS")
m <- svm(Sex~., data = cats)
plot(m, cats)

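Edit: I just realized that this plot function only works for classifiers, not for regression. However, you can easily build your own plot function. For simplicity, I first convert the day of the week to a number.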

  DataSet2$DayofWeek <- as.numeric(DataSet2$DayofWeek)

After rebuilding the model, you can visualize it as follows:

### plot the results of the support vector machine by
# first generating a grid covering the data range

# generate a sequence of 100 numbers between the minimum and maximum of DataSet2$EMA
plot.ema.vec <- seq(min(DataSet2$EMA), max(DataSet2$EMA), length.out = 100)
# generate a "grid" of artificial data points; 1:7 are the weekdays
# can be replaced by c("Mon",...,"Sun")
datagrid <- expand.grid(1:7, plot.ema.vec)
# name the grid columns like the dataset so that the model can use the grid as input
names(datagrid) <- names(DataSet2)[1:2]
# calculate the predictions of the model on the grid
grid.pred <- predict(EMACrossModel, datagrid)
# normalise the predictions to the [0,1] range to use them as colors
cols <- (grid.pred - min(grid.pred)) / (max(grid.pred) - min(grid.pred))
# plot the grid, colored from red (lowest prediction) to blue (highest prediction)
plot(datagrid$DayofWeek, datagrid$EMA, col = rgb(red = 1 - cols, green = 0, blue = cols))
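A note on the numeric conversion used for the plot: as.numeric() on a factor returns the position of each value in the factor's levels, not a value parsed from the label. The small sketch below uses a hand-built ordered factor with levels running from Sun to Sat as an assumption about how the weekday labels are ordered; check levels(DataSet2$DayofWeek) on the real data.

```r
# Hand-built weekday factor (assumed level order, Sunday first).
day_levels <- c("Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat")
days <- factor(c("Mon", "Wed", "Fri", "Mon"), levels = day_levels, ordered = TRUE)

# as.numeric() yields the index of each label within day_levels.
codes <- as.numeric(days)
print(codes)  # 2 4 6 2
```

With this level order, Monday maps to 2 and Friday to 6, so on the grid 1:7 the columns 1 and 7 correspond to weekend days that never occur in the trading data.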

Regarding "r - How to implement a support vector machine in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34529119/
