R: confidence intervals for proportions of factor data


Reposted by: 行者123. Updated: 2023-12-04 19:08:48

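I am trying to summarise data from a household survey, so most of my data are categorical (factor) data. I would like to summarise it with plots of response frequencies for certain questions (for example, a bar chart of the percentage of households giving each answer to a question, with error bars showing confidence intervals). I found this excellent tutorial, which I thought was the answer to my prayers ( http://www.cookbook-r.com/Manipulating_data/Summarizing_data/ ), but it turns out that it only helps with continuous data.

What I need is something similar that lets me compute proportions from counts, together with the standard errors and confidence intervals of those proportions.

Basically, I want to be able to produce a summary table like the one below for every question asked in my survey data: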

# X5employf X5employff N(count) proportion SE of prop. CI of prop.
#     1         1          20   0.64516129      ?           ?
#     1         2           1   0.03225806      ?           ?
#     1         3           9   0.29032258      ?           ?
#     1         NA          1   0.03225806      ?           ?
#     2         4           1   0.10000000      ?           ?


X5employ_counts <- structure(list(
  X5employf = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
                        .Label = c("1", "2", "3"), class = "factor"),
  X5employff = structure(c(1L, 2L, 3L, NA, 4L, 5L, 6L, 7L, 8L, 4L, 5L, 6L, 7L),
                         .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor"),
  count = c(20L, 1L, 9L, 1L, 1L, 5L, 2L, 1L, 1L, 4L, 5L, 4L, 1L)),
  .Names = c("X5employf", "X5employff", "count"),
  row.names = c(NA, -13L), class = "data.frame")

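I then want to use this summary data to draw bar charts in ggplot (or similar), with error bars showing the confidence intervals.

I had thought of adapting the code from the tutorial above to compute the columns shown, but as a relative newcomer to R I am struggling a bit! I have been trying to use the plyr package (which provides ddply), but I am not very good with the syntax, and this is as far as I got: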

> X5employ_props <- ddply(X5employ_counts, .(X5employf), transform, prop=count/sum(count))

But I ended up with this:

   X5employf X5employff count      prop
1          1          1    20 1.0000000
2          1          2     1 1.0000000
3          1          3     9 1.0000000
4          2          4     1 0.2000000
5          3          4     4 0.8000000
6          2          5     5 0.5000000
7          3          5     5 0.5000000
8          2          6     2 0.3333333
9          3          6     4 0.6666667
10         2          7     1 0.5000000
11         3          7     1 0.5000000
12         2          8     1 1.0000000
13         1       <NA>     1 1.0000000

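Many of my proportions come out as 1, presumably because they are being calculated across rows rather than down columns.

I would be grateful if anyone could help, or knows of a package or some code that can do this job for me!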
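A note on the proportions of 1: the output posted above looks like what you would get if the denominator were the total within each X5employff level rather than within each X5employf group (for example 1/(1+4) = 0.2 and 4/(1+4) = 0.8 for level 4). That is only an inference from the numbers, not something the question states. The minimal base R sketch below compares the two denominators on the posted counts; the column names prop_by_f and prop_by_ff and the "missing" key are made up for illustration.

```r
# Survey counts copied from the question (X5employff has one missing value).
counts <- data.frame(
  X5employf  = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)),
  X5employff = factor(c(1, 2, 3, NA, 4, 5, 6, 7, 8, 4, 5, 6, 7)),
  count      = c(20L, 1L, 9L, 1L, 1L, 5L, 2L, 1L, 1L, 4L, 5L, 4L, 1L)
)

# Denominator = total count within each X5employf group (the desired table).
counts$prop_by_f <- counts$count /
  ave(counts$count, counts$X5employf, FUN = sum)

# Denominator = total count within each X5employff level.
# The NA level gets its own key so that it forms a group of its own.
ff_key <- ifelse(is.na(counts$X5employff), "missing",
                 as.character(counts$X5employff))
counts$prop_by_ff <- counts$count / ave(counts$count, ff_key, FUN = sum)

print(counts)
```

With the first denominator the proportions within each X5employf group sum to 1, which is what the desired summary table needs.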

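Best answer

There are several ways to compute a binomial confidence interval, and I doubt there is much consensus on which one is best. That said, here is one way to compute binomial confidence intervals using several different methods. I am not sure whether this helps.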

library(binom)

x <- c(3, 4, 5, 6, 7)
n <- rep(10, length(x))

binom.confint(x, n, conf.level = 0.95, methods = "all")

          method x  n      mean      lower     upper
1  agresti-coull 3 10 0.3000000 0.10333842 0.6076747
2  agresti-coull 4 10 0.4000000 0.16711063 0.6883959
3  agresti-coull 5 10 0.5000000 0.23659309 0.7634069
4  agresti-coull 6 10 0.6000000 0.31160407 0.8328894
5  agresti-coull 7 10 0.7000000 0.39232530 0.8966616
6     asymptotic 3 10 0.3000000 0.01597423 0.5840258
7     asymptotic 4 10 0.4000000 0.09636369 0.7036363
8     asymptotic 5 10 0.5000000 0.19010248 0.8098975
9     asymptotic 6 10 0.6000000 0.29636369 0.9036363
10    asymptotic 7 10 0.7000000 0.41597423 0.9840258
11         bayes 3 10 0.3181818 0.09269460 0.6058183
12         bayes 4 10 0.4090909 0.15306710 0.6963205
13         bayes 5 10 0.5000000 0.22352867 0.7764713
14         bayes 6 10 0.5909091 0.30367949 0.8469329
15         bayes 7 10 0.6818182 0.39418168 0.9073054
16       cloglog 3 10 0.3000000 0.07113449 0.5778673
17       cloglog 4 10 0.4000000 0.12269317 0.6702046
18       cloglog 5 10 0.5000000 0.18360559 0.7531741
19       cloglog 6 10 0.6000000 0.25266890 0.8272210
20       cloglog 7 10 0.7000000 0.32871659 0.8919490
21         exact 3 10 0.3000000 0.06673951 0.6524529
22         exact 4 10 0.4000000 0.12155226 0.7376219
23         exact 5 10 0.5000000 0.18708603 0.8129140
24         exact 6 10 0.6000000 0.26237808 0.8784477
25         exact 7 10 0.7000000 0.34754715 0.9332605
26         logit 3 10 0.3000000 0.09976832 0.6236819
27         logit 4 10 0.4000000 0.15834201 0.7025951
28         logit 5 10 0.5000000 0.22450735 0.7754927
29         logit 6 10 0.6000000 0.29740491 0.8416580
30         logit 7 10 0.7000000 0.37631807 0.9002317
31        probit 3 10 0.3000000 0.08991347 0.6150429
32        probit 4 10 0.4000000 0.14933907 0.7028372
33        probit 5 10 0.5000000 0.21863901 0.7813610
34        probit 6 10 0.6000000 0.29716285 0.8506609
35        probit 7 10 0.7000000 0.38495714 0.9100865
36       profile 3 10 0.3000000 0.08470272 0.6065091
37       profile 4 10 0.4000000 0.14570633 0.6999845
38       profile 5 10 0.5000000 0.21765974 0.7823403
39       profile 6 10 0.6000000 0.30001552 0.8542937
40       profile 7 10 0.7000000 0.39349089 0.9152973
41           lrt 3 10 0.3000000 0.08458545 0.6065389
42           lrt 4 10 0.4000000 0.14564246 0.7000216
43           lrt 5 10 0.5000000 0.21762124 0.7823788
44           lrt 6 10 0.6000000 0.29997837 0.8543575
45           lrt 7 10 0.7000000 0.39346107 0.9154146
46     prop.test 3 10 0.3000000 0.08094782 0.6463293
47     prop.test 4 10 0.4000000 0.13693056 0.7263303
48     prop.test 5 10 0.5000000 0.20142297 0.7985770
49     prop.test 6 10 0.6000000 0.27366969 0.8630694
50     prop.test 7 10 0.7000000 0.35367072 0.9190522
51        wilson 3 10 0.3000000 0.10779127 0.6032219
52        wilson 4 10 0.4000000 0.16818033 0.6873262
53        wilson 5 10 0.5000000 0.23659309 0.7634069
54        wilson 6 10 0.6000000 0.31267377 0.8318197
55        wilson 7 10 0.7000000 0.39677815 0.8922087
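Some rows of the table above can be cross-checked with base R alone, which may help show what the method names mean. The sketch below is an illustration rather than part of the original answer: it recomputes the "asymptotic" (Wald) row by hand, and uses stats::binom.test and stats::prop.test for the "exact" and "prop.test" rows, for x = 3 and n = 10.

```r
x <- 3
n <- 10
conf.level <- 0.95

p <- x / n
z <- qnorm(1 - (1 - conf.level) / 2)

# "asymptotic" (Wald) interval: p +/- z * sqrt(p * (1 - p) / n)
wald <- p + c(-1, 1) * z * sqrt(p * (1 - p) / n)

# "exact" (Clopper-Pearson) interval, as returned by binom.test
exact <- as.numeric(binom.test(x, n, conf.level = conf.level)$conf.int)

# prop.test interval (score interval, continuity-corrected by default)
score_cc <- as.numeric(prop.test(x, n, conf.level = conf.level)$conf.int)

print(rbind(wald = wald, exact = exact, prop.test = score_cc))
```

The Wald interval is the narrowest here and its lower limit sits very close to 0, which is one reason it is usually not preferred for small n.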

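I am not entirely sure what you want, but here is code that creates a table which I think contains all the parameters you are after. I dug the code out of package binom, using the Agresti-Coull method.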

conf.level <- 0.95

x <-  c( 4, 5, 6)     # successes
n <-  c(10,10,10)     # trials

method <- 'ac'

# source code from package binom:

xn <- data.frame(x = x, n = n)
  all.methods <- any(method == "all")
  p <- x/n
  alpha <- 1 - conf.level
  alpha <- rep(alpha, length = length(p))
  alpha2 <- 0.5 * alpha
  z <- qnorm(1 - alpha2)
  z2 <- z * z
  res <- NULL
  if(any(method %in% c("agresti-coull", "ac")) || all.methods) {
    .x <- x + 0.5 * z2
    .n <- n + z2
    .p <- .x/.n
    lcl <- .p - z * sqrt(.p * (1 - .p)/.n)
    ucl <- .p + z * sqrt(.p * (1 - .p)/.n)
    res.ac <- data.frame(method = rep("agresti-coull", NROW(x)),
                         xn, mean = p, lower = lcl, upper = ucl)
    res <- res.ac    
  }

SE <- sqrt(.p * (1 - .p)/.n)
SE

See also: http://www.stat.sc.edu/~hendrixl/stat205/Lecture%20Notes/Confidence%20Interval%20for%20the%20Population%20Proportion.pdf

Here is the table containing all the data and parameters.

my.table <- data.frame(res, SE)
my.table

         method x  n mean     lower     upper        SE
1 agresti-coull 4 10  0.4 0.1671106 0.6883959 0.1329834
2 agresti-coull 5 10  0.5 0.2365931 0.7634069 0.1343937
3 agresti-coull 6 10  0.6 0.3116041 0.8328894 0.1329834

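I have not checked whether these estimates match any of the examples in Agresti's book. However, the first R function below, from the University of Florida, returns the same CI estimates as package binom. The second R function below, also from the University of Florida, does not.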

http://www.stat.ufl.edu/~aa/cda/R/one-sample/R1/

x <- 4
n <- 10
conflev <- 0.95

addz2ci <- function(x,n,conflev){
   z = abs(qnorm((1-conflev)/2))
   tr = z^2     #the number of trials added
   suc = tr/2   #the number of successes added
   ptilde = (x+suc)/(n+tr)
   stderr = sqrt(ptilde * (1-ptilde)/(n+tr))
   ul = ptilde + z * stderr
   ll = ptilde - z * stderr
   if(ll < 0) ll = 0
   if(ul > 1) ul = 1
   c(ll,ul)
}
# Computes the Agresti-Coull CI for x successes out of n trials
# with confidence coefficient conflev. 

add4ci <- function(x,n,conflev){
   ptilde = (x+2)/(n+4)
   z = abs(qnorm((1-conflev)/2))
   stderr = sqrt(ptilde * (1-ptilde)/(n+4))
   ul = ptilde + z * stderr
   ll = ptilde - z * stderr
   if(ll < 0) ll = 0
   if(ul > 1) ul = 1
   c(ll,ul)
}
# Computes the Agresti-Coull `add 4' CI for x successes out of n trials
# with confidence coefficient conflev. Adds 2 successes and
# 4 trials.
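One way to see why the second function disagrees with binom: the first adds z^2 pseudo-trials (about 3.84 at the 95% level), while the second rounds that to exactly 4. The sketch below is a compact restatement, not the University of Florida code; the helper name ac_interval and its added argument are made up for illustration.

```r
conf.level <- 0.95
z <- qnorm(1 - (1 - conf.level) / 2)
print(z^2)  # pseudo-trials added by the first function (about 3.84 at 95%)

# Hypothetical helper: interval after adding `added` trials, half of them successes.
ac_interval <- function(x, n, added, z) {
  p_tilde <- (x + added / 2) / (n + added)
  se <- sqrt(p_tilde * (1 - p_tilde) / (n + added))
  c(lower = p_tilde - z * se, upper = p_tilde + z * se)
}

x <- 4
n <- 10
ci_z2   <- ac_interval(x, n, added = z^2, z = z)  # matches the binom table row
ci_add4 <- ac_interval(x, n, added = 4,   z = z)  # the "add 4" approximation

print(rbind(ci_z2, ci_add4))
```

The two intervals differ only in the third decimal place here, so the mismatch reflects the rounding of z^2 to 4, not an error in either function.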

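Also note that, according to the first link above, the Agresti-Coull interval is not recommended when n < 40.

As for the other packages you mention, I rarely use them, but I am fairly sure you can include the code above in an R script that calls those packages.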
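As a rough sketch of how this could be applied to the survey counts in the question, the code below builds the requested summary table in base R and, if ggplot2 happens to be installed, draws the bar chart with error bars. Several things here are assumptions and not part of the answer: the Wilson score interval is my choice (the group sizes are all below 40), wilson_ci is a hypothetical helper, each category is treated as binomial against the rest of its X5employf group, and survey weights or design effects are ignored.

```r
counts <- data.frame(
  X5employf  = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)),
  X5employff = factor(c(1, 2, 3, NA, 4, 5, 6, 7, 8, 4, 5, 6, 7)),
  count      = c(20L, 1L, 9L, 1L, 1L, 5L, 2L, 1L, 1L, 4L, 5L, 4L, 1L)
)

# Hypothetical helper: Wilson score interval for x successes out of n trials.
wilson_ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p <- x / n
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  data.frame(lower = centre - half, upper = centre + half)
}

# N = total count within each X5employf group; prop = share of that total.
n_group <- ave(counts$count, counts$X5employf, FUN = sum)
summary_tab <- transform(counts, N = n_group, prop = count / n_group)
summary_tab$se <- sqrt(summary_tab$prop * (1 - summary_tab$prop) / summary_tab$N)
summary_tab <- cbind(summary_tab, wilson_ci(summary_tab$count, summary_tab$N))
print(summary_tab)

# Optional plot: skipped silently when ggplot2 is not available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(summary_tab, aes(x = X5employff, y = prop)) +
    geom_col() +
    geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
    facet_wrap(~ X5employf, scales = "free_x") +
    labs(x = "X5employff", y = "Proportion of households")
  print(p)
}
```

Swapping in another interval only requires replacing wilson_ci with a function that returns lower and upper columns, for example one based on the Agresti-Coull code above.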

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/17802320/
