gpt4 book ai didi

广义(极端学生化偏差)ESD异常值测试的R代码

转载 作者:行者123 更新时间:2023-12-02 01:48:53 30 4
gpt4 key购买 nike

这是我感兴趣的测试: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h3.htm

如何将这段代码改编成接受数值向量并返回指定要删除哪些数据点的逻辑向量的函数?

我试图在下面这样做,但我被卡住了,因为当我对要返回的向量进行排序时,它与输入向量数据不一致。

# input data
y = c(-0.25, 0.68, 0.94, 1.15, 1.20, 1.26, 1.26,
1.34, 1.38, 1.43, 1.49, 1.49, 1.55, 1.56,
1.58, 1.65, 1.69, 1.70, 1.76, 1.77, 1.81,
1.91, 1.94, 1.96, 1.99, 2.06, 2.09, 2.10,
2.14, 2.15, 2.23, 2.24, 2.26, 2.35, 2.37,
2.40, 2.47, 2.54, 2.62, 2.64, 2.90, 2.92,
2.92, 2.93, 3.21, 3.26, 3.30, 3.59, 3.68,
4.30, 4.64, 5.34, 5.42, 6.01)

## Generate normal probability plot.
qqnorm(y)

removeoutliers = function(dfinputcol) {

y = as.vector(dfinputcol)

## Create function to compute the test statistic.
rval = function(y){
ares = abs(y - mean(y))/sd(y)
df = data.frame(y, ares)
r = max(df$ares)
list(r, df)}

## Define values and vectors.
n = length(y)
alpha = 0.05
lam = c(1:10)
R = c(1:10)

## Compute test statistic until r=10 values have been
## removed from the sample.
for (i in 1:10){

if(i==1){
rt = rval(y)
R[i] = unlist(rt[1])
df = data.frame(rt[2])
newdf = df[df$ares!=max(df$ares),]}

else if(i!=1){
rt = rval(newdf$y)
R[i] = unlist(rt[1])
df = data.frame(rt[2])
newdf = df[df$ares!=max(df$ares),]}

## Compute critical value.
p = 1 - alpha/(2*(n-i+1))
t = qt(p,(n-i-1))
lam[i] = t*(n-i) / sqrt((n-i-1+t**2)*(n-i+1))

}
## Print results.
newdf = data.frame(c(1:10),R,lam)
names(newdf)=c("Outliers","TestStat.", "CriticalVal.")

# determine how many outliers to remove
toremove = max(newdf$Outliers[newdf$TestStat. > newdf$CriticalVal.])

# create vector of same size as input vector
logicalvectorTifshouldremove = logical(length=length(y))

# but how to determine which outliers to remove?
# set largest data points as outliers to remove.. but could be the smallest in some data sets..
logicalvectorTifshouldremove = replace(logicalvectorTifshouldremove, tail(sort(y), toremove), TRUE)

return (logicalvectorTifshouldremove)
}

# this should have 3 data points set to TRUE .. but it has 2 and they aren't the correct ones
output = removeoutliers(y)
length(output[output==T])

最佳答案

我认为它写在你给的页面上(不完全是,但用两句话):

Remove the r observations that maximizes |x_i - mean(x)|

因此,您只需删除超出差异的 r 个即可获得没有异常值的数据,使用:

y[abs(y-mean(y)) < sort(abs(y-mean(y)),decreasing=TRUE)[toremove]]

您不需要最后两行代码。顺便说一句,你可以直接计算:

toremove = which(newdf$TestStat > newdf$CriticalVal)

为了简化一点,最终的功能可能是:

# Compute the critical value for ESD Test
esd.critical <- function(alpha, n, i) {
p = 1 - alpha/(2*(n-i+1))
t = qt(p,(n-i-1))
return(t*(n-i) / sqrt((n-i-1+t**2)*(n-i+1)))
}

removeoutliers = function(y) {

## Define values and vectors.
y2 = y
n = length(y)
alpha = 0.05
toremove = 0

## Compute test statistic until r=10 values have been
## removed from the sample.
for (i in 1:10){
if(sd(y2)==0) break
ares = abs(y2 - mean(y2))/sd(y2)
Ri = max(ares)
y2 = y2[ares!=Ri]

## Compute critical value.
if(Ri>esd.critical(alpha,n,i))
toremove = i
}

# Values to keep
if(toremove>0)
y = y[abs(y-mean(y)) < sort(abs(y-mean(y)),decreasing=TRUE)[toremove]]

return (y)
}

返回:

> removeoutliers(y)
[1] -0.25 0.68 0.94 1.15 1.20 1.26 1.26 1.34 1.38 1.43 1.49
[12] 1.49 1.55 1.56 1.58 1.65 1.69 1.70 1.76 1.77 1.81 1.91
[23] 1.94 1.96 1.99 2.06 2.09 2.10 2.14 2.15 2.23 2.24 2.26
[34] 2.35 2.37 2.40 2.47 2.54 2.62 2.64 2.90 2.92 2.92 2.93
[45] 3.21 3.26 3.30 3.59 3.68 4.30 4.64

关于广义(极端学生化偏差)ESD异常值测试的R代码,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/23986457/

30 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com