
r - How to analyze a multiple-response question in a weighted sample with the R survey package?

Reposted. Author: 行者123. Updated: 2023-12-01 01:58:24

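I am relatively new to R. I would like to know how to use the "survey" package (http://r-survey.r-forge.r-project.org/survey/) to analyze a multiple-response question in a weighted sample. The tricky part is that respondents can tick more than one response, so the responses are stored across multiple columns.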

Example:

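I have survey data from 500 respondents randomly sampled across 10 districts. Suppose the main question (stored in the column H1_AreYouHappy) is "Are you happy?", with the answers Yes / No / Don't know.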

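Respondents are then asked a follow-up question: "Why are you (not) happy?"
This is a multiple-choice question in which more than one box can be ticked, so the responses are stored in separate columns, for example: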

H1Yes_Why1 (0/1, i.e. box ticked or not): 'Because of the economy';

H1Yes_Why2 (0/1): 'Because I am healthy';

H1Yes_Why3 (0/1): 'Because of my social life'.

Here is my simulated dataset:

districts <- c('Green', 'Red', 'Orange', 'Blue', 'Purple', 'Grey', 'Black', 'Yellow', 'White', 'Lavender')
myDataFrame <- data.frame(H1_AreYouHappy = sample(c('Yes', 'No', 'Dont Know'), 500, replace = TRUE),
                          H1Yes_Why1 = sample(0:1, 500, replace = TRUE),
                          H1Yes_Why2 = sample(0:1, 500, replace = TRUE),
                          H1Yes_Why3 = sample(0:1, 500, replace = TRUE),
                          District = sample(districts, 500, replace = TRUE),
                          stringsAsFactors = TRUE)

I am using the R 'survey' package to weight the sample according to the actual population size of each district:

library(survey)
# Create an unweighted survey object
mySurvey.unweighted <- svydesign(ids=~1, data=myDataFrame)

# Choose which variable contains the sample distribution to be weighted by
sample.distribution <- list(~District)

# Specify (from Census data) how often each level occurs in the population
population.distribution <- data.frame(District = c('Green', 'Red','Orange','Blue','Purple','Grey','Black','Yellow','White','Lavender'),
freq = c(0.1824885, 0.0891206, 0.1381343, 0.1006533, 0.1541269, 0.0955853, 0.0268172, 0.0398353, 0.0809459, 0.0922927))

# Apply the weights
mySurvey.rake <- rake(design = mySurvey.unweighted, sample.margins=sample.distribution, population.margins=list(population.distribution))

# Calculate the weighted mean for the main question
svymean(~H1_AreYouHappy, mySurvey.rake)

# How can I calculate the WEIGHTED means for the multiple choice - multiple response follow-up question?
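Because District is the only margin passed to rake(), raking here should reduce to post-stratification: starting from equal weights, every respondent in district d ends up with weight p_d / n_d, where p_d is the district's population share and n_d is the number of sampled respondents in that district. The base-R sketch that follows illustrates that arithmetic on toy data. The helper postStratWeights and the toy values are hypothetical, and the sketch produces point weights only, not the design information the survey package uses for standard errors.

```r
# Hypothetical helper: post-stratification weights for a single margin.
# Each respondent in district d gets pop_share[d] / n_d (starting weights all 1).
postStratWeights <- function(strata, pop_share) {
  strata <- as.character(strata)
  n_d <- as.numeric(table(strata)[strata])    # sample count of each respondent's district
  share <- as.numeric(pop_share[strata])      # population share of each respondent's district
  share / n_d
}

# Toy example (hypothetical data): three districts
strata <- c("A", "A", "B", "C", "C", "C")
pop_share <- c(A = 0.5, B = 0.3, C = 0.2)
w <- postStratWeights(strata, pop_share)
print(w)

# The weights sum to 1 because the population margin is given as proportions
print(sum(w))

# Weighted mean of a 0/1 item = sum over districts of (population share * district mean)
y <- c(1, 0, 1, 1, 0, 0)
print(sum(w * y) / sum(w))
```

Under these assumptions, postStratWeights(myDataFrame$District, setNames(population.distribution$freq, as.character(population.distribution$District))) should agree with weights(mySurvey.rake) up to rounding.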

How can I calculate the weighted means for the multiple-choice question (i.e., across the 0/1 response columns)?

If I wanted unweighted results, I could use this function to compute the frequencies for all columns matching my prefix "H1Yes_Why":
multipleResponseFrequencies <- function(data, question.prefix) {
  # Find the columns whose names start with the question prefix
  a <- grep(paste0("^", question.prefix), names(data))
  # Total number of ticked boxes across all options
  b <- sum(data[, a, drop = FALSE] != 0)
  # Number of ticks for each option
  d <- colSums(data[, a, drop = FALSE] != 0)
  # Number of respondents who ticked at least one box
  e <- sum(rowSums(data[, a, drop = FALSE]) != 0)
  # Per-option counts followed by the overall total
  f <- as.numeric(c(d, b))
  result <- data.frame(question = c(names(d), "Total"),
                       freq = f,
                       percent = (f / b) * 100,
                       percentofcases = (f / e) * 100)
  result
}
multipleResponseFrequencies(myDataFrame, 'H1Yes_Why')
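For comparison, a weighted analogue of multipleResponseFrequencies can be sketched in base R by replacing every count with a sum of weights. The function name weightedMultipleResponseFrequencies, the argument w and the toy data are hypothetical; the sketch gives point estimates only, without the standard errors the survey package provides. With all weights equal to 1 it reduces to the unweighted counts.

```r
# Hypothetical weighted version of multipleResponseFrequencies:
# every count becomes a sum of respondent weights (point estimates only, no SEs).
weightedMultipleResponseFrequencies <- function(data, question.prefix, w) {
  cols <- grep(paste0("^", question.prefix), names(data))
  ticked <- data[, cols, drop = FALSE] != 0    # logical matrix: was the box ticked?
  d <- colSums(ticked * w)                     # weighted ticks per option (w recycled by row)
  b <- sum(d)                                  # weighted total of all ticks
  e <- sum(w[rowSums(ticked) > 0])             # weighted respondents ticking at least one box
  f <- c(d, Total = b)
  data.frame(question = names(f),
             freq = unname(f),
             percent = unname(f / b) * 100,
             percentofcases = unname(f / e) * 100)
}

# Toy example (hypothetical data and weights)
toy <- data.frame(Q_Why1 = c(1, 0, 1, 0),
                  Q_Why2 = c(1, 1, 0, 0),
                  Other  = c(5, 5, 5, 5))
w <- c(0.4, 0.3, 0.2, 0.1)
res <- weightedMultipleResponseFrequencies(toy, "Q_Why", w)
print(res)
```

Applied to the question's data, a call such as weightedMultipleResponseFrequencies(myDataFrame, "H1Yes_Why", weights(mySurvey.rake)) should give weighted versions of the same columns, assuming the weights are in the same row order as myDataFrame.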

Any help would be greatly appreciated.

Best answer

I think you want:

svyratio( ~ H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3 , ~ as.numeric( H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3 ) , mySurvey.rake)
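The svyratio() call above divides the weighted total of each reason by the weighted total of all ticked boxes, which makes it the weighted counterpart of the `percent` column. The sketch that follows extends it under two labeled assumptions: that the follow-up applies only to respondents who answered "Yes" (suggested by the H1Yes_ prefix, so the design is restricted with subset()), and that a weighted `percentofcases` is wanted, which uses "ticked at least one box" as the denominator. svymean() on the 0/1 columns gives the share of all (happy) respondents ticking each reason. The set.seed(1) call and the object names yes, by_response, by_case and by_all are additions for illustration, not part of the original answer.

```r
library(survey)

# Rebuild the question's simulated data and raked design (seed added for reproducibility)
set.seed(1)
districts <- c('Green', 'Red', 'Orange', 'Blue', 'Purple', 'Grey', 'Black', 'Yellow', 'White', 'Lavender')
myDataFrame <- data.frame(
  H1_AreYouHappy = sample(c('Yes', 'No', 'Dont Know'), 500, replace = TRUE),
  H1Yes_Why1 = sample(0:1, 500, replace = TRUE),
  H1Yes_Why2 = sample(0:1, 500, replace = TRUE),
  H1Yes_Why3 = sample(0:1, 500, replace = TRUE),
  District = sample(districts, 500, replace = TRUE),
  stringsAsFactors = TRUE)
population.distribution <- data.frame(
  District = districts,
  freq = c(0.1824885, 0.0891206, 0.1381343, 0.1006533, 0.1541269,
           0.0955853, 0.0268172, 0.0398353, 0.0809459, 0.0922927))
mySurvey.rake <- rake(svydesign(ids = ~1, data = myDataFrame),
                      sample.margins = list(~District),
                      population.margins = list(population.distribution))

# Assumption: the follow-up question applies only to happy respondents
yes <- subset(mySurvey.rake, H1_AreYouHappy == "Yes")

# Share of all ticked boxes falling on each reason (weighted "percent")
by_response <- svyratio(~H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3,
                        ~as.numeric(H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3), yes)
# Share of respondents who ticked at least one box (weighted "percentofcases")
by_case <- svyratio(~H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3,
                    ~as.numeric(H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3 > 0), yes)
# Share of all happy respondents ticking each reason
by_all <- svymean(~H1Yes_Why1 + H1Yes_Why2 + H1Yes_Why3, yes)

print(by_response)
print(by_case)
print(by_all)
```

Because every ticked box is counted once in the denominator of by_response, its three ratios should sum to 1; by_case is always at least as large as by_all, since its denominator excludes respondents who ticked nothing.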

A similar question, "r - How to analyze a multiple-response question in a weighted sample with the R survey package?", can be found on Stack Overflow: https://stackoverflow.com/questions/38675151/
