R function works on some data frames, but not others?

Reposted by: 行者123 · Updated: 2023-12-04 05:43:47

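I have a function that summarises the number of missing and non-missing observations in a data frame passed to it [1]. I was then asked to test for differences between the two treatment groups in my data (personally I disagree with the need for or utility of doing so, but it is what I have been asked to do). So I wrote a small function to do this...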

quick.test <- function(x, y){
  chisq   <- chisq.test(x = x,  y = y)
  fisher  <- fisher.test(x = x, y = y)
  results <- cbind(chisq  = chisq$statistic,
                   df     = chisq$parameter,
                   p      = chisq$p.value,
                   fisher = fisher$p.value)
  results
}

I then use apply() to pass the relevant columns to this function, as follows...

apply(miss.t1, 1, function(x) quick.test(x[2:3], x[4:5]))

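This works fine for the miss.t1 data frame specified above, but I am working with time-series data and have three time points that I want to summarise, so I also have miss.t2 and miss.t3 (each summarises the number of present/missing observations at one time point and was created in the same way, using the function described in [1]).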

miss.t2 fails with the following error...

apply(miss.t2, 1, function(x) quick.test(x[2:3], x[4:5]))
Error in chisq.test(x = x, y = y) : 
  'x' and 'y' must have at least 2 levels

My first thought was that one of the columns was missing values for some reason, but that does not appear to be the case...

> describe(miss.t2)
miss.t2 

 5  Variables      171  Observations
--------------------------------------------------------------------------------
variable 
      n missing  unique 
    171       0     171 

lowest : Abtotal   Abyn      agg_ment  agg_phys  All.score
highest: z_pf      z_re      z_rp      z_sf      z_vt      
--------------------------------------------------------------------------------
nmiss.1 
      n missing  unique    Mean 
    171       0       4   8.649 

0 (6, 4%), 8 (9, 5%), 9 (153, 89%), 10 (3, 2%) 
--------------------------------------------------------------------------------
npresent.1 
      n missing  unique    Mean 
    171       0       4   9.351 

8 (3, 2%), 9 (153, 89%), 10 (9, 5%), 18 (6, 4%) 
--------------------------------------------------------------------------------
nmiss.2 
      n missing  unique    Mean 
    171       0       4   10.65 

0 (6, 4%), 11 (160, 94%), 12 (4, 2%), 13 (1, 1%) 
--------------------------------------------------------------------------------
npresent.2 
      n missing  unique    Mean 
    171       0       4   14.35 

12 (1, 1%), 13 (4, 2%), 14 (160, 94%), 25 (6, 4%) 
--------------------------------------------------------------------------------

The next thing I tried was taking subsets of miss.t2 with head(miss.t2, n=XX). This works fine for the first 53 rows and fails once row 54 is included...

> apply(head(miss.t2, n=53), 1, function(x) quick.test(x[2:3], x[4:5]))
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
[1,] 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
[2,] 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[3,] 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[4,] 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
     29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
[1,]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
[2,]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[3,]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[4,]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
There were 50 or more warnings (use warnings() to see the first 50)
> apply(head(miss.t2, n=54), 1, function(x) quick.test(x[2:3], x[4:5]))
Error in chisq.test(x = x, y = y) : 
  'x' and 'y' must have at least 2 levels
> miss.t2[54,]
   variable nmiss.1 npresent.1 nmiss.2 npresent.2
54      psq      10          8      11         14
> traceback()
5: stop("'x' and 'y' must have at least 2 levels") at #2
4: chisq.test(x = x, y = y) at #2
3: quick.test(x[2:3], x[4:5])
2: FUN(newX[, i], ...)
1: apply(head(miss.t2, n = 54), 1, function(x) quick.test(x[2:3], 
       x[4:5]))

Similarly at the "bottom" of the data frame, the last 26 rows are processed fine, but not once the 27th row from the end is included...

> apply(tail(miss.t2, n=26), 1, function(x) quick.test(x[2:3], x[4:5]))
     146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
[1,]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[2,]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
[3,]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
[4,]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
     164 165 166 167 168 169 170 171
[1,]   0   0   0   0   0   0   0   0
[2,]   1   1   1   1   1   1   1   1
[3,]   1   1   1   1   1   1   1   1
[4,]   1   1   1   1   1   1   1   1
There were 26 warnings (use warnings() to see them)
> apply(tail(miss.t2, n=27), 1, function(x) quick.test(x[2:3], x[4:5]))
Error in chisq.test(x = x, y = y) : 
  'x' and 'y' must have at least 2 levels
In addition: Warning message:
In chisq.test(x = x, y = y) : Chi-squared approximation may be incorrect

> miss.t2[118,]
    variable nmiss.1 npresent.1 nmiss.2 npresent.2
118     sf16       9          9      11         14

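I can't see anything wrong with these two rows that would explain why they should fail, and the traceback() shown above does not reveal anything useful (to my eyes).

Can anyone offer any advice on why or where this is going wrong?

Many thanks in advance,

Neil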

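EDIT: Formatted reply to Vincent Zoonekynd...

I had opted for the chisq.test(x = x, y = y) form described in ?chisq.test. Building a matrix with cbind() as you suggested results in
Error in sum(x) : invalid 'type' (character) of argument

Adding print statements that show x, y and their lengths gives the same error as before, but displays the values and lengths as...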

> miss.t2.res <- data.frame(t(apply(miss.t2, 1, function(x) quick.test(x[2:3], x[4:5])))) 
[1] "Your x is : 9" "Your x is : 9" 
[1] 2    ### < Length of x
[1] "Your y is : 11" "Your y is : 14"
[1] 2    ### < Length of y
Error in chisq.test(x = x, y = y) : 'x' and 'y' must have at least 2 levels

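EDIT 2: Thanks to Vincent Zoonekynd's pointers, the problem was that two of the cells had the same count, so the call to chisq.test() treated the values as factors and collapsed them into a single level. The solution was to modify quick.test() so that the arguments are coerced to numeric and passed in as a matrix. The function that now works is...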

quick.test <- function(x, y){
  chisq   <- chisq.test(rbind(as.numeric(x), as.numeric(y)))
  fisher  <- fisher.test(rbind(as.numeric(x), as.numeric(y)))
  results <- cbind(chisq  = chisq$statistic,
                   df     = chisq$parameter,
                   p      = chisq$p.value,
                   fisher = fisher$p.value)
  results
}
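
The mechanism described in EDIT 2 can be reproduced on a small scale. The sketch below is not from the original post: the data frames toy and toy2 and their counts are hypothetical, chosen only to mimic the layout of miss.t2. It assumes the character `variable` column is what makes apply() hand each row over as character strings, which matches the "invalid 'type' (character)" error quoted above. The last part shows a plausible, unconfirmed reason why the original call succeeded on some subsets: numeric columns are padded to a common width when the data frame is converted, so the same count can end up as different strings.

```r
# Hypothetical summary rows in the same layout as miss.t2
# (names and counts are made up for illustration).
toy <- data.frame(variable   = c("a", "b"),
                  nmiss.1    = c(9, 9),
                  npresent.1 = c(9, 9),
                  nmiss.2    = c(11, 11),
                  npresent.2 = c(14, 14),
                  stringsAsFactors = FALSE)

# apply() converts the data frame to a matrix first. Because 'variable'
# is character, every cell in a row arrives as a character string.
cell.type <- apply(toy, 1, function(x) class(x))
print(cell.type)

# The first row as apply() would hand it to quick.test()
first.row <- as.matrix(toy)[1, ]
x <- first.row[2:3]   # both cells hold the string "9"
y <- first.row[4:5]   # "11" and "14"

# chisq.test(x = x, y = y) converts both vectors to factors, so two
# identical strings collapse to a single level.
print(nlevels(factor(x)))
print(nlevels(factor(y)))

# Capture the error message instead of stopping the script
msg <- tryCatch({ chisq.test(x = x, y = y); "no error" },
                error = function(e) conditionMessage(e))
print(msg)

# Numeric columns are padded to a common width during the conversion,
# so adding a two-digit count to one column changes how 9 is written there.
toy2 <- toy
toy2$npresent.1 <- c(9, 18)
x2 <- as.matrix(toy2)[1, 2:3]
print(unname(x2))
print(nlevels(factor(x2)))
```

In the second case the test runs without error, but only because the padded and unpadded strings happen to differ, so the result is not a meaningful test of the counts.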

Many thanks to Vincent for his help and pointers, much appreciated.

[1] http://gettinggeneticsdone.blogspot.co.uk/2011/02/summarize-missing-data-for-all.html

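Best answer

The solution suggested by Vincent Zoonekynd in the comments above is to modify the quick.test() function so that the arguments are coerced to numeric and passed in as a matrix. The function that now works is...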

quick.test <- function(x, y){
  chisq   <- chisq.test(rbind(as.numeric(x), as.numeric(y)))
  fisher  <- fisher.test(rbind(as.numeric(x), as.numeric(y)))
  results <- cbind(chisq  = chisq$statistic,
                   df     = chisq$parameter,
                   p      = chisq$p.value,
                   fisher = fisher$p.value)
  results
}
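
As a usage sketch, the corrected function can also be applied to the count columns only, so that the rows never pass through a character matrix in the first place. This is one possible arrangement rather than the approach taken in the original post: the data frame toy, its counts, and the result names counts and res are hypothetical.

```r
# The corrected function from the answer above
quick.test <- function(x, y){
  chisq   <- chisq.test(rbind(as.numeric(x), as.numeric(y)))
  fisher  <- fisher.test(rbind(as.numeric(x), as.numeric(y)))
  results <- cbind(chisq  = chisq$statistic,
                   df     = chisq$parameter,
                   p      = chisq$p.value,
                   fisher = fisher$p.value)
  results
}

# Hypothetical counts in the same layout as miss.t2
toy <- data.frame(variable   = c("a", "b", "c"),
                  nmiss.1    = c(9, 10, 3),
                  npresent.1 = c(9, 8, 15),
                  nmiss.2    = c(11, 11, 12),
                  npresent.2 = c(14, 14, 13),
                  stringsAsFactors = FALSE)

# Drop the character column so the matrix given to apply() is numeric
counts <- as.matrix(toy[, -1])

# One 2 x 2 test per row: columns 1:2 are group 1, columns 3:4 are group 2
res <- t(apply(counts, 1, function(r) quick.test(r[1:2], r[3:4])))
colnames(res) <- c("chisq", "df", "p", "fisher")

# Re-attach the variable names to the results
res <- data.frame(variable = toy$variable, res)
print(res)
```

The as.numeric() calls inside quick.test() are then redundant but harmless, and they keep the function safe if it is ever called on a mixed-type row again.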

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/10945718/
