
r - Why do the probabilities and the response from ksvm in R disagree?

Reprinted · Author: 行者123 · Updated: 2023-12-02 03:14:57

I am using ksvm from the kernlab package in R to predict probabilities, via the type="probabilities" option of predict.ksvm. However, I find that sometimes predict(model, observation, type="r") does not return the class with the highest probability given by predict(model, observation, type="p").

Example:

> predict(model,observation,type="r")
[1] A
Levels: A B
> predict(model,observation,type="p")
        A    B
[1,] 0.21 0.79

Is this correct behavior, or a bug? If it is correct behavior, how can I estimate the most likely class from the probabilities?
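If the goal is simply the most likely class under the fitted probability model, one workaround (a sketch, not part of the original question; `model` and `newdata` stand for any fitted ksvm object and a compatible data frame) is to bypass type="response" and take the row-wise argmax of the probability matrix yourself:

```r
library(kernlab)

# Hypothetical helper: predict the class with the highest fitted probability,
# instead of the class returned by type = "response".
predict_by_prob <- function(model, newdata) {
  probs <- predict(model, newdata, type = "probabilities")
  # which.max over each row picks the index of the most probable class
  factor(colnames(probs)[apply(probs, 1, which.max)],
         levels = colnames(probs))
}
```

With the example above, this helper would return B (probability 0.79) where type="response" returned A.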


An attempt at a reproducible example:

library(kernlab)
set.seed(1000)
# Generate fake data
n <- 1000
x <- rnorm(n)
p <- 1 / (1 + exp(-10*x))
y <- factor(rbinom(n, 1, p))
dat <- data.frame(x, y)
tmp <- split(dat, dat$y)
# Create unequal sizes in the groups (helps illustrate the problem)
newdat <- rbind(tmp[[1]][1:100,], tmp[[2]][1:10,])
# Fit the model using a radial kernel (the default)
out <- ksvm(y ~ x, data = newdat, prob.model = TRUE)
# Create some testing points near the boundary

testdat <- data.frame(x = seq(.09, .12, .01))
# Get predictions using both methods
responsepreds <- predict(out, newdata = testdat, type = "r")
probpreds <- predict(out, testdat, type = "p")

results <- data.frame(x = testdat,
                      response = responsepreds,
                      P.x.0 = probpreds[,1],
                      P.x.1 = probpreds[,2])

The resulting output:

> results
     x response     P.x.0     P.x.1
1 0.09        0 0.7199018 0.2800982
2 0.10        0 0.6988079 0.3011921
3 0.11        1 0.6824685 0.3175315
4 0.12        1 0.6717304 0.3282696
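To make the disagreement explicit, one could compare the response column against the class implied by the probability matrix (hypothetical code, assuming the `results` and `probpreds` objects from the example above):

```r
# Class implied by the probability matrix, row by row
results$prob.class <- colnames(probpreds)[apply(probpreds, 1, which.max)]
# Rows where the two prediction methods disagree
subset(results, response != prob.class)
```

Here every row has P.x.0 > 0.5, so the probability-based class is always 0, yet type="response" assigns rows 3 and 4 to class 1.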

Accepted answer

If you look at the decision values and the votes, they appear more consistent with the response:

> predict(out, newdata = testdat, type = "response")
[1] 0 0 1 1
Levels: 0 1
> predict(out, newdata = testdat, type = "decision")
[,1]
[1,] -0.07077917
[2,] -0.01762016
[3,] 0.02210974
[4,] 0.04762563
> predict(out, newdata = testdat, type = "votes")
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 0 0 1 1
> predict(out, newdata = testdat, type = "prob")
0 1
[1,] 0.7198132 0.2801868
[2,] 0.6987129 0.3012871
[3,] 0.6823679 0.3176321
[4,] 0.6716249 0.3283751

The kernlab help page (?predict.ksvm) links to the paper Probability estimates for Multi-class Classification by Pairwise Coupling by T.F. Wu, C.J. Lin, and R.C. Weng.

Section 7.3 of that paper notes that decision-value-based and probability-based predictions can differ:

...We explain why the results by probability-based and decision-value-based methods can be so distinct. For some problems, the parameters selected by δDV are quite different from those by the other five rules. In waveform, at some parameters all probability-based methods gives much higher cross validation accuracy than δDV . We observe, for example, the decision values of validation sets are in [0.73, 0.97] and [0.93, 1.02] for data in two classes; hence, all data in the validation sets are classified as in one class and the error is high. On the contrary, the probability-based methods fit the decision values by a sigmoid function, which can better separate the two classes by cutting at a decision value around 0.95. This observation shed some light on the difference between probability-based and decision-value based methods...

I am not familiar enough with these methods to fully understand the issue, but perhaps you are. It appears there are distinct methods for predicting with probabilities versus decision values, and that type="response" corresponds to a different method than the one used to compute the probabilities.
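One way to see how the two answers can diverge: with prob.model = TRUE, ksvm fits a sigmoid to the decision values (Platt scaling), so the 50% probability point need not sit at decision value 0. A minimal illustration of the idea (the coefficients A and B below are made up for illustration, not the ones ksvm actually fitted):

```r
# Platt scaling maps a decision value d to a probability via a fitted sigmoid:
#   P(class 1 | d) = 1 / (1 + exp(A*d + B))
# The 0.5 crossing sits at d = -B/A, not at d = 0, so when B != 0 the sign of
# the decision value and the argmax of the probabilities can disagree.
platt <- function(d, A = -3, B = 1.5) 1 / (1 + exp(A * d + B))
platt(0)     # well below 0.5, even though sign(0) would not favor class 0
platt(0.5)   # exactly at the fitted 0.5 crossing (-B/A = 0.5)
```

This matches the output above: decision values 0.022 and 0.048 are positive (hence response 1), but they lie below the sigmoid's 0.5 crossing, so the probability of class 1 stays under 0.5.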

Regarding "r - Why do the probabilities and the response from ksvm in R disagree?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15503027/
