
r - Combining a data file and a label file into a single labelled data frame in R

Reposted. Author: 行者123. Updated: 2023-12-03 18:30:25

I have two data frames: one holds survey data (data.csv) and the other holds label data (label.csv). Here is some sample data (my original data has about 150 variables):

#sample data

df <- tibble::tribble(
~id, ~House_member, ~dob, ~age_quota, ~work, ~sex, ~pss,
1L, 4L, 1983L, 2L, 2L, 1, 1,
2L, 1L, 1940L, 7L, 2L, 1, 2,
3L, 2L, 1951L, 5L, 6L, 1, 1,
4L, 4L, 1965L, 2L, 2L, 1, 4,
5L, 3L, 1965L, 2L, 3L, 1, 1,
6L, 1L, 1951L, 3L, 1L, 1, 3,
7L, 1L, 1955L, 1L, 1L, 1, 3,
8L, 4L, 1982L, 2L, 2L, 2, 5,
9L, 2L, 1990L, 2L, 4L, 2, 3,
10L, 2L, 1953L, 3L, 2L, 2, 4
)


#sample label data
label <- tibble::tribble(
~variable, ~value, ~label,
"House_member", NA, "How many people live with you?",
"House_member", 1L, "1 person",
"House_member", 2L, "2 persons",
"House_member", 3L, "3 persons",
"House_member", 4L, "4 persons",
"House_member", 5L, "5 persons",
"House_member", 6L, "6 persons",
"House_member", 7L, "7 persons",
"House_member", 8L, "8 persons",
"House_member", 9L, "9 persons",
"House_member", 10L, "10 or more",
"dob", NA, "date of brith",
"age_quota", NA, "age_quota",
"age_quota", 1L, "10-14",
"age_quota", 2L, "15-19",
"age_quota", 3L, "20-29",
"age_quota", 4L, "30-39",
"age_quota", 5L, "40-49",
"age_quota", 6L, "50-70",
"age_quota", 7L, "70 +",
"work", NA, "what is your occupation?",
"work", 1L, "full time",
"work", 2L, "part time",
"work", 3L, "retired",
"work", 4L, "student",
"work", 5L, "housewife",
"work", 6L, "unemployed",
"work", 7L, "other",
"work", 8L, "kid under 15",
"sex", NA, "gender?",
"sex", 1L, "Man",
"sex", 2L, "Woman",
"pss", NA, "How often do you use PS?",
"pss", 1L, "Daily",
"pss", 2L, "several times per week",
"pss", 3L, "once per week",
"pss", 4L, "several time per month",
"pss", 5L, "Rarly"
)
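I'm wondering whether there is a way to combine these two files into a single labelled data frame in SPSS style (the dbl+lbl format). I know the labelled package can add value labels to an unlabelled vector, as in this example: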
v <- labelled::labelled(c(1,2,2,2,3,9,1,3,2,NA), c(yes = 1, maybe = 2, no = 3))
I'm hoping there is a better/faster way than adding the labels to each variable one by one.
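For reference, here is what the one-by-one approach looks like for a single column. This sketch uses haven::labelled() (the same constructor the answer below relies on) rather than labelled::labelled(); the codes are stored unchanged, the value labels live in a `labels` attribute, and haven::as_factor() turns the codes back into text when needed. The three-row vector is made up for illustration.

```r
library(haven)

# Made-up three-row example: one column labelled by hand
sex <- labelled(c(1, 2, 1), labels = c(Man = 1, Woman = 2), label = "gender?")

print(attr(sex, "labels"))  # value labels are stored in an attribute
print(zap_labels(sex))      # drops the value labels, keeping the codes
print(as_factor(sex))       # replaces each code with its label
```

Doing this by hand for roughly 150 variables is exactly the repetition the question wants to avoid.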

Best answer

Another imap_dfc solution:

library(tidyverse)

df %>% imap_dfc(~{
  label[label$variable == .y, c('label', 'value')] %>%
    deframe() %>% # to named vector
    haven::labelled(.x, .)
})

# A tibble: 10 x 7
id House_member dob age_quota work sex pss
<int+lbl> <int+lbl> <int+lbl> <int+lbl> <int+lbl> <dbl+lbl> <dbl+lbl>
1 1 4 [4 persons] 1983 2 [15-19] 2 [part time] 1 [Man] 1 [Daily]
2 2 1 [1 person] 1940 7 [70 +] 2 [part time] 1 [Man] 2 [several times per week]
3 3 2 [2 persons] 1951 5 [40-49] 6 [unemployed] 1 [Man] 1 [Daily]
4 4 4 [4 persons] 1965 2 [15-19] 2 [part time] 1 [Man] 4 [several time per month]
5 5 3 [3 persons] 1965 2 [15-19] 3 [retired] 1 [Man] 1 [Daily]
6 6 1 [1 person] 1951 3 [20-29] 1 [full time] 1 [Man] 3 [once per week]
7 7 1 [1 person] 1955 1 [10-14] 1 [full time] 1 [Man] 3 [once per week]
8 8 4 [4 persons] 1982 2 [15-19] 2 [part time] 2 [Woman] 5 [Rarly]
9 9 2 [2 persons] 1990 2 [15-19] 4 [student] 2 [Woman] 3 [once per week]
10 10 2 [2 persons] 1953 3 [20-29] 2 [part time] 2 [Woman] 4 [several time per month]
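This uses tibble::deframe and haven::labelled; both packages are installed with the tidyverse (haven is not attached by library(tidyverse), hence the haven:: prefix).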
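To see what the lookup inside imap_dfc() produces, here is a sketch for a single column, using only the `sex` rows copied from the question's label table. deframe() takes the first column (`label`) as names and the second (`value`) as values, giving the named vector that haven::labelled() expects for its `labels` argument. Note that the NA row (the question text "gender?") is carried along as a value label with a missing code, which is how it ends up in the answer's output as well.

```r
library(tibble)
library(haven)

# Only the `sex` rows of the question's label table
label <- tibble::tribble(
  ~variable, ~value, ~label,
  "sex",     NA,     "gender?",
  "sex",     1L,     "Man",
  "sex",     2L,     "Woman"
)

# Same lookup as in the answer: this variable's rows, label/value columns
sex_labels <- deframe(label[label$variable == "sex", c("label", "value")])
print(sex_labels)

# Attach the named vector as value labels to some integer codes
sex <- labelled(c(1L, 1L, 2L), sex_labels)
print(sex)
```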
Speed comparison after replacing filter/select with direct subsetting of label:
Waldi <- function() {
  df %>% imap_dfc(~{
    label[label$variable == .y, c('label', 'value')] %>%
      deframe() %>% # to named vector
      haven::labelled(.x, .)
  })
}

Waldi_old <- function() {
  df %>% imap_dfc(~{
    label %>% filter(variable == .y) %>%
      select(label, value) %>%
      deframe() %>% # to named vector
      haven::labelled(.x, .)
  })
}

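# EDIT: included the TIC3() for-loop solution
# (TIC3(), Anil(), TIC1() and Sinh() are presumably taken from other answers to the question and are not reproduced here)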

microbenchmark::microbenchmark(TIC3(),Waldi(),Anil(),TIC1(),Waldi_old(),Sinh())
Unit: microseconds
expr min lq mean median uq max neval cld
TIC3() 688.0 871.80 982.280 920.95 1005.55 1801.6 100 a
Waldi() 1345.5 1543.60 1804.758 1635.45 1893.75 4306.8 100 b
Anil() 4006.8 4476.65 5188.519 4862.95 5439.10 10163.6 100 c
TIC1() 3898.2 4278.80 5009.927 4774.95 5277.05 12916.2 100 c
Waldi_old() 18712.3 20091.75 21756.140 20609.35 22169.75 33359.8 100 d
Sinh() 22730.9 24093.45 25931.412 24946.00 26614.00 38735.3 100 e
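For comparison, here is a plain for-loop sketch in the spirit of the loop-based TIC3() entry; it is not that answer's actual code, and it has not been timed against the benchmark above. It splits the label table once with split() instead of scanning it for every column, and it treats the NA-valued rows (the question text) as SPSS-style variable labels through the `label` argument of haven::labelled() rather than as value labels. The helper name `apply_labels()` is hypothetical, and the data are a few rows adapted from the question's.

```r
library(haven)

# Hypothetical helper (not from the original answers): label the columns of
# `df` from a long-format table with columns variable, value and label.
apply_labels <- function(df, label) {
  by_var <- split(label, label$variable)  # split the label table only once
  for (v in intersect(names(df), names(by_var))) {
    rows <- by_var[[v]]
    is_title <- is.na(rows$value)         # NA rows hold the question text
    # Value labels: codes cast to the column's own type, named by their text
    codes <- as.vector(rows$value[!is_title], mode = typeof(df[[v]]))
    value_labels <- if (length(codes)) setNames(codes, rows$label[!is_title])
    # Variable label: the first NA row for this variable, if any
    var_label <- if (any(is_title)) rows$label[is_title][1]
    df[[v]] <- labelled(df[[v]], labels = value_labels, label = var_label)
  }
  df
}

# A few rows adapted from the question's data
df <- data.frame(dob = c(1983L, 1940L), sex = c(1, 2), pss = c(1, 5))
label <- data.frame(
  variable = c("dob", "sex", "sex", "sex", "pss", "pss", "pss"),
  value = c(NA, NA, 1L, 2L, NA, 1L, 5L),
  label = c("date of birth", "gender?", "Man", "Woman",
            "How often do you use PS?", "Daily", "Rarely"),
  stringsAsFactors = FALSE
)

out <- apply_labels(df, label)
str(out)
```

Columns with no rows in the label table (such as `id` in the question) are left untouched, and a column with only a question-text row (such as `dob`) gets a variable label but no value labels.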

A similar question about combining a data file and a label file into a single labelled data frame in R can be found on Stack Overflow: https://stackoverflow.com/questions/67504200/
