
r - How to compare vector values in a data frame against constants in R?

Reposted. Author: 行者123. Updated: 2023-12-04 05:13:13

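Background: I am using some Census Public Use Microdata Samples (specifically the American Community Survey) to examine the behavior of people who have completed different degrees (e.g., high school diploma, bachelor's degree, master's degree). The public use file has a variable called "Schooling". The problem is that the codes in "Schooling" change from year to year. For example, in the files up to 2007 a value of "13" means a completed bachelor's degree, but from 2008 onward a completed bachelor's degree is coded "21".
Goal: Create a new "Degree Completed" variable that translates the "Schooling" codes into the level of degree completed, taking the year of the file into account.
Logistics: The files for all years have already been concatenated, and for review purposes I have to work with the file as it is, rather than correcting it before it gets to this point.
Existing code: Here is what I tried.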

if      (original.file$year %in% c(2000,2001)) {
if (original.file$Schooling <= 08) {original.file$degree.completed <- 0}
else if (original.file$Schooling <= 10) {original.file$degree.completed <- 1}
else if (original.file$Schooling <= 12) {original.file$degree.completed <- 2}
else if (original.file$Schooling == 13) {original.file$degree.completed <- 3}
else if (original.file$Schooling == 14) {original.file$degree.completed <- 4}
else if (original.file$Schooling == 15) {original.file$degree.completed <- 5}
else if (original.file$Schooling == 16) {original.file$degree.completed <- 6}
}
else if (original.file$year %in% c(2002,2003,2004,2005,2006,2007)) {
if (original.file$Schooling <= 08) {original.file$degree.completed <- 0}
else if (original.file$Schooling <= 11) {original.file$degree.completed <- 1}
else if (original.file$Schooling == 12) {original.file$degree.completed <- 2}
else if (original.file$Schooling == 13) {original.file$degree.completed <- 3}
else if (original.file$Schooling == 14) {original.file$degree.completed <- 4}
else if (original.file$Schooling == 15) {original.file$degree.completed <- 5}
else if (original.file$Schooling == 16) {original.file$degree.completed <- 6}
}
else if (original.file$year %in% c(2008,2009,2010,2011)) {
if (original.file$Schooling <= 15) {original.file$degree.completed <- 0}
else if (original.file$Schooling <= 19) {original.file$degree.completed <- 1}
else if (original.file$Schooling == 20) {original.file$degree.completed <- 2}
else if (original.file$Schooling == 21) {original.file$degree.completed <- 3}
else if (original.file$Schooling == 22) {original.file$degree.completed <- 4}
else if (original.file$Schooling == 23) {original.file$degree.completed <- 5}
else if (original.file$Schooling == 24) {original.file$degree.completed <- 6}
}
Problem: I get warning messages like the following.

Warning messages:

1: In if (original.file$year %in% c(2000, 2001)) { : the condition has length > 1 and only the first element will be used

2: In if (original.file$Schooling <= 8) { : the condition has length > 1 and only the first element will be used

3: In if (original.file$Schooling <= 10) { : the condition has length > 1 and only the first element will be used


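Question: I know there is a vector-versus-scalar problem with "if" here, as I have seen in other questions on StackOverflow, but those answers don't seem to apply to this case. What is the solution here?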
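To see the vector-versus-scalar issue concretely, here is a minimal sketch on a few made-up rows; the vectors year and schooling are hypothetical stand-ins for original.file$year and original.file$Schooling. A comparison on a column returns one TRUE/FALSE per row, while if() needs a single TRUE/FALSE, which is what the warnings are about (since R 4.2.0 a condition of length greater than one is an error rather than a warning). Combining the row-wise tests with & and | and choosing values with ifelse() keeps everything vectorized.

```r
# Made-up rows standing in for original.file$year and original.file$Schooling.
year      <- c(2000, 2001, 2005, 2009)
schooling <- c(5,    13,   13,   21)

# A comparison on a column is vectorized: one TRUE/FALSE per row.
year %in% c(2000, 2001)
# [1]  TRUE  TRUE FALSE FALSE

# if () needs a single TRUE/FALSE, so it cannot branch row by row:
# if (year %in% c(2000, 2001)) ...   # warning before R 4.2.0, error since

# Vectorized alternative: combine the row-wise tests with & and |,
# then pick a value per row with ifelse().
# A bachelor's degree is coded 13 up to 2007 and 21 from 2008 on.
bachelor <- ifelse((year <= 2007 & schooling == 13) |
                   (year >= 2008 & schooling == 21), 1, 0)
bachelor
# [1] 0 1 1 1
```

Nesting ifelse() calls for every degree level and every coding period works too, but it quickly becomes as long as the original if/else chain, which is why a lookup table is tidier.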

Best answer

First, use cut and a lookup table instead of all those ifs and elses:

CutOffs1 <- c(0,8,10,12,13,14,15,16)
CutOffs2 <- c(0,8,11,12,13,14,15,16)
CutOffs3 <- c(0,15,19,20,21,22,23,24)
CutOffs <- cbind(CutOffs1, CutOffs2, CutOffs3)
MyTable <- apply(CutOffs, 2, function(X) cut(1:24, X, labels = FALSE) - 1)  # rows: Schooling code 1-24; columns: coding period

> MyTable
CutOffs1 CutOffs2 CutOffs3
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
[4,] 0 0 0
[5,] 0 0 0
[6,] 0 0 0
[7,] 0 0 0
[8,] 0 0 0
[9,] 1 1 0
[10,] 1 1 0
[11,] 2 1 0
[12,] 2 2 0
[13,] 3 3 0
[14,] 4 4 0
[15,] 5 5 0
[16,] 6 6 1
[17,] NA NA 1
[18,] NA NA 1
[19,] NA NA 1
[20,] NA NA 2
[21,] NA NA 3
[22,] NA NA 4
[23,] NA NA 5
[24,] NA NA 6
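The table relies on cut()'s default right-closed intervals, (0,8], (8,10], (10,12] and so on, which reproduce the question's <= tests. Below is a quick check on a few codes for the 2000-2001 column; the vector codes is made up for illustration. One caveat: a code of 0 falls outside (0,8] and becomes NA, so if 0 can occur in Schooling, include.lowest = TRUE closes the first interval as [0,8].

```r
# Cut-offs for the 2000-2001 coding (CutOffs1 in the answer).
cutoffs <- c(0, 8, 10, 12, 13, 14, 15, 16)

# labels = FALSE returns the interval number; subtracting 1 gives the degree level.
codes  <- c(1, 8, 9, 12, 13, 16, 17)
degree <- cut(codes, cutoffs, labels = FALSE) - 1
degree
# [1]  0  0  1  2  3  6 NA

# 0 is excluded from (0,8] unless the lowest interval is closed.
cut(0, cutoffs, labels = FALSE) - 1
# [1] NA
cut(0, cutoffs, labels = FALSE, include.lowest = TRUE) - 1
# [1] 0
```

Codes above the last cut-off (17 here) also come back as NA, which matches the NA entries in the first two columns of MyTable.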

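You also need to cut the years into periods. With labels = FALSE (the third positional argument), cut() returns the integer period code (1, 2 or 3) rather than a factor, so it can index the table's columns directly: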
original.file$Period <- cut(original.file$year, c(2000,2001, 2007, 2011), FALSE,   
include.lowest=TRUE)
## To demonstrate:
> cbind(2000:2011, cut(2000:2011, c(2000,2001, 2007, 2011), FALSE,
+ include.lowest=TRUE))
[,1] [,2]
[1,] 2000 1
[2,] 2001 1
[3,] 2002 2
[4,] 2003 2
[5,] 2004 2
[6,] 2005 2
[7,] 2006 2
[8,] 2007 2
[9,] 2008 3
[10,] 2009 3
[11,] 2010 3
[12,] 2011 3
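findInterval() is another base-R way to assign the coding period; this is a swapped-in technique, not the one used in the answer. Using the first year of each coding as the breaks, it counts how many breaks are less than or equal to each year. Unlike the cut() call above, years after 2011 land in period 3 instead of NA and years before 2000 get 0; whether files after 2011 really keep the 2008 coding is an assumption to verify.

```r
years <- 2000:2011

# Breaks are the first year of each Schooling coding: 2000, 2002, 2008.
# findInterval() returns how many breaks are <= each year, i.e. the period number.
period_fi <- findInterval(years, c(2000, 2002, 2008))

# The answer's cut() version, for comparison.
period_cut <- cut(years, c(2000, 2001, 2007, 2011), labels = FALSE,
                  include.lowest = TRUE)

period_fi
# [1] 1 1 2 2 2 2 2 2 3 3 3 3
all(period_fi == period_cut)
# [1] TRUE

# Outside 2000-2011 the two differ: findInterval() gives 0 and 3, cut() gives NA.
findInterval(c(1999, 2012), c(2000, 2002, 2008))
# [1] 0 3
```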

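Then you should be able to do the following. Restricting apply() to the two numeric columns matters: apply() converts the data frame to a matrix first, and if any column is non-numeric that matrix is character, so the row/column lookup in MyTable fails.
Degrees <- apply(original.file[c("Schooling", "Period")], 1, function(X) MyTable[X['Schooling'], X['Period']])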
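As a vectorized alternative to looping over rows with apply(), the lookup can be done in one step with matrix indexing: a two-column matrix of (row, column) indices picks one cell of the table per row of the data frame. This is a minimal sketch on made-up rows; it rebuilds the answer's table under the hypothetical name lookup and assumes every Schooling code is an integer from 1 to 24. A code of 0 would drop that row from the result (so the lengths no longer match), and a code above 24 raises a "subscript out of bounds" error.

```r
# Lookup table as in the answer: rows = Schooling code 1..24,
# columns = coding period 1 (2000-2001), 2 (2002-2007), 3 (2008-2011).
cutoffs <- cbind(c(0,  8, 10, 12, 13, 14, 15, 16),
                 c(0,  8, 11, 12, 13, 14, 15, 16),
                 c(0, 15, 19, 20, 21, 22, 23, 24))
lookup <- apply(cutoffs, 2, function(b) cut(1:24, b, labels = FALSE) - 1)

# Made-up rows; the column names follow the question.
original.file <- data.frame(
  year      = c(2000, 2005, 2009, 2011),
  Schooling = c(13,   11,   21,   16)
)

original.file$Period <- cut(original.file$year, c(2000, 2001, 2007, 2011),
                            labels = FALSE, include.lowest = TRUE)

# Each row of the index matrix is (Schooling, Period), so one cell is read per row.
original.file$degree.completed <-
  lookup[cbind(original.file$Schooling, original.file$Period)]
original.file
```

For these rows the result is 3, 1, 3, 1, which agrees with the question's if/else rules for each year (for example, 13 in 2000 and 21 in 2009 are both bachelor's degrees, level 3).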

Regarding "r - How to compare vector values in a data frame against constants in R?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/14633863/
