gpt4 book ai didi

r - 将描述性统计行值从 R 导出到 Excel 工作表中

转载 作者:行者123 更新时间:2023-12-04 20:12:22 26 4
gpt4 key购买 nike

我有一个大型数据库,其中包含 100 多家不同公司的 85,000 多个值,并标记了 100 多个变量。我的目标是确定与几个变量相对应的描述性统计数据(平均值、标准偏差、最小最大值和值的数量)。

以下是关于一个给定公司的一组信息,我将其称为 F 公司。

Attendance   Number of representatives   Number of Presenters     Company Audience  
29 2 30 2
20 3 30 4
30 10 20 5
40 20 10 5
10 30 13 5

我要做的是让 R 计算描述性统计数据 [平均值、标准差、最小值和最大值],并为这些特定列中的每一列计算并以下列方式将其导出到 Excel 中:
Company F  Average Number of Attendance Standard Deviation of Number of Attendance Min  Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives   Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience 

因为这是一个很长的行,所以我将通过说我试图找到这些列中的每一列的描述性统计[平均值、标准偏差、最小值、最大值和 n]来总结它。这些都应该对应于F公司。

我是如何尝试解决这个问题的:

我已经使用 R 中的描述性统计功能来获取数据框来为我识别代码。为此,我使用了 psych 包:
 library(psych)
describe(CompanyF$Attendance)
describe(CompanyF$NumberofRepresentatives)
describe(CompanyF$Number_of_Presenters
describe(CompanyF$Company Audience)

通过使用该包,我能够获取数据框,然后进入 Excel 并手动构建行,输入我收到的值并省略 psych 库包提供的与我感兴趣的内容不对应的任何信息。以下是我从 psych 包中获得的信息类型的示例:
vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
1 1 559 2.02 2.21 1 1.75 1.48 0 9 9 0.78 -0.65 0.09

这个过程非常耗时并且容易出错。完成 F 公司的工作后,我在 Excel 中为 F 公司创建一个新行,但这次是为另一家公司(例如 G 公司)创建一个新行,然后我继续查找描述性统计数据 [平均值、标准差、最小值, max 和 n] 用于每个感兴趣的变量(出席人数、代表人数、演示者人数和公司观众)。

我已经寻找了各种解决方案,其中一个来自这个堆栈溢出帖子 Export data from R into Excel但我无法找到有关如何将 R 逐行信息导入 Excel 以及如何指定它识别我上面列出的描述性统计数据的解释。

理想情况下,我会将以下输出放入 Excel:
Company F  Average Number of Attendance Standard Deviation of Number of Attendance Min  Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives   Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience 
Company G Average Number of Attendance Standard Deviation of Number of Attendance Min Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience
Company H Average Number of Attendance Standard Deviation of Number of Attendance Min Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience

等等。

我的数据的原始子集如下:
structure(list(sn = structure(c(2L, 2L, 3L, 5L, 2L, 7L, 1L, 9L, 
1L, 9L, NA, 9L, 1L, 26L, 11L, 9L, 7L, NA, NA, 7L, 17L, 9L, NA,
21L, 7L, 17L, 7L, 7L, 16L, 7L, 7L, 7L, 7L, 26L, 7L, 6L, 26L,
22L, NA, NA, 11L, 23L, 23L, 26L, NA, 7L, 23L, 1L, NA, 1L, 7L,
11L, 12L, 13L, 9L, NA, 15L, NA, 20L, 15L, NA, 17L, 5L, NA, 22L,
15L, NA, NA, 5L, 8L, 32L, 29L, 23L, 33L, 1L, 23L, 14L, 6L, 7L,
15L), .Label = c("Broome Street", "Company A", "Company B", "Company BC",
"Company C", "Company CC", "Company D Clinton", "Company DD",
"Company E", "Company ED BroadCompany", "Company G", "Company H
BroadCompany",
"Company I BroadCompany", "Company I Studio", "Company J", "Company K",
"Company L", "Company M", "Company M BroadCompany", "Company M HS
BroadCompany",
"Company MCC BroadCompany", "Company N", "Company P", "Company Q",
"Company Q Company N", "Company Q Company ZZ", "Company R - Company ZZ",
"Company SLab", "Company Z", "Company ZE", "Company ZED", "Company ZEQ",
"Company ZZ", "Company ZZQ", "Company ZZQ Company N"), class = "factor"),
earn_tot = c(21.85, 20.8, NA, 8.16, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 7.16, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 43.32, NA, 30.48, NA, NA, 34.9, NA, NA, NA, NA, NA, 25.82,
40.75, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA,
30, NA, NA, NA, NA, NA, NA, 39.1, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 52.29, 44.32, NA, 7, 38.32, 0, NA, NA, 8.25,
NA, NA), earn_and_current_tot = c(29.43, 20.8, NA, 8.16,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 7.16, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 49.9, NA, 37.56, NA, NA, 41.98,
NA, NA, NA, NA, NA, 37.32, 49, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 0, NA, NA, NA, 37, NA, NA, NA, NA, NA, NA, 47.68,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 57.29, 48.48, NA,
7, 45.9, 0, NA, NA, 15.75, NA, NA), pass_99 = c(0L, 0L, NA,
NA, NA, NA, 1L, NA, NA, NA, NA, 5L, NA, 0L, NA, 5L, NA, NA,
NA, 0L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 0L, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 4L, 0L,
NA, NA, NA, 4L, 4L, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA,
NA, 1L, NA, NA, NA, NA, 1L, NA, NA, 0L, 4L, 0L, NA, NA, 0L,
NA, NA), pass_65 = c(0L, 0L, 5L, 0L, 6L, NA, 0L, 5L, NA,
5L, NA, 6L, NA, 0L, 5L, 2L, NA, NA, NA, 0L, 5L, 5L, NA, NA,
NA, 0L, NA, 1L, 4L, 7L, 5L, 5L, 7L, 0L, 5L, NA, 0L, 1L, NA,
NA, NA, 2L, 0L, 6L, NA, 8L, 2L, 0L, NA, 4L, 0L, 1L, 3L, NA,
NA, NA, NA, NA, 4L, 0L, NA, 5L, 7L, NA, 0L, NA, NA, NA, 5L,
0L, 5L, 4L, 0L, 2L, 0L, 0L, 7L, 0L, NA, 5L)), .Names = c("sn",
"earn_tot", "earn_and_current_tot", "pass_99", "pass_65"), row.names = c(NA,
80L), class = "data.frame")

有四个最重要的子集列。这些列是“earn_tot”、“earn_and_current_tot”、“pass_99”和“pass_65”。这里列出的许多公司都是匿名的。我正在与大约 100 家公司合作。在标题为“sn”的列下有许多公司名称。整个子集数据集的名称称为 Subset.MergedEx.So。

我很抱歉没有提出一个很好的可重复的例子。感谢您的耐心等待。我一直在阅读如何构建一个并使用了以下代码:
dput((head(Subset.MergedEx.SO, 80)))

最佳答案

这可能不是最佳解决方案,但它只使用了 basepsych包裹。

这是数据

df <- data.frame(company = rep(c("A","B", "C","D"), each = 5),
attendance = sample(5:10,20,TRUE),
representatives = sample(2:30,20,TRUE),
presenters = sample(20:30,20,TRUE),
audience = sample(50:70,20,TRUE))

我写了一个函数来获取你需要的值。
我假设您只有 5 类信息:公司名称、出席情况、代表、演示者、观众。
    get.values<-function(x){
require(psych)
info<-describeBy(x[,2:5], group = x[,1])
n.companies<-length(levels(df[,1]))
n<-list()
mean<-list()
sd<-list()
min<-list()
max<-list()
for(i in 1:n.companies){
n[[i]]<-info[[i]][,2]
mean[[i]]<-info[[i]][,3]
sd[[i]]<-info[[i]][,4]
min[[i]]<-info[[i]][,8]
max[[i]]<-info[[i]][,9]
}
l<-Map(c, mean, sd, min, max, n)
valuedf<-do.call(rbind, l)
return(valuedf)
}

我还编写了一个函数来生成您想要的列名,您可以将它们命名为您想要的任何名称:
get.names<-function(x){
require(psych)
names<-rownames(describe(x[,2:5]))
avg<-character()
sd<-character()
min<-character()
max<-character()
total<-character()
for(i in 1:length(names)){
avg[i]<-paste("average number of", names[i])
sd[i]<-paste("standard deviation of", names[i])
min[i]<-paste("min number of", names[i])
max[i]<-paste("max number of", names[i])
total[i]<-paste("total number of", names[i])
}
cnames<-c(avg,sd,min,max,total)
return(cnames)
}

将值和名称组合成一个新的数据框:
output<-get.values(df)
col.names<-get.names(df)
colnames(output)<-col.names
rownames(output)<-levels(df[,1])

导出到excel:
library(xlsx)
write.xlsx(output, "descriptives.xlsx")

关于r - 将描述性统计行值从 R 导出到 Excel 工作表中,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/35435770/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com