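I'm new to data analysis in R. I recently received a pre-formatted dataset of environmental observations and model output; a sample subset looks like this: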
date site obs mod site obs mod
2000-09-01 00:00:00 campus NA 61.63 city centre 66 56.69
2000-09-01 01:00:00 campus 52 62.55 city centre NA 54.75
2000-09-01 02:00:00 campus 52 63.52 city centre 56 54.65
What I want is this:
date site obs mod
2000-09-01 00:00:00 campus NA 61.63
2000-09-01 01:00:00 campus 52 62.55
2000-09-01 02:00:00 campus 52 63.52
2000-09-01 00:00:00 city centre 66 56.69
2000-09-01 01:00:00 city centre NA 54.75
2000-09-01 02:00:00 city centre 56 54.65
Using melt() from reshape2:
test.melt <- melt(test.data, id.vars = "date", measure.vars = c("site", "obs", "mod"))
only gives me the first set of columns:
date variable value
2001-01-01 00:00:00 site campus
2001-01-01 01:00:00 site campus
2001-01-01 02:00:00 site campus
2001-01-01 00:00:00 obs NA
2001-01-01 01:00:00 obs 52
2001-01-01 02:00:00 obs 52
2001-01-01 00:00:00 mod 61.63
2001-01-01 01:00:00 mod 62.55
2001-01-01 02:00:00 mod 63.52
And recast():
test.recast <- recast(test.data, date ~ site + obs + mod)
fails with:
Error in eval(expr, envir, enclos) : object 'site' not found
Best answer
After some cleanup of the variable names, you're probably best off using base R's reshape().
Here's your data:
test <- read.table(header = TRUE, stringsAsFactors=FALSE,
text = "date site obs mod site obs mod
'2000-09-01 00:00:00' campus NA 61.63 'city centre' 66 56.69
'2000-09-01 01:00:00' campus 52 62.55 'city centre' NA 54.75
'2000-09-01 02:00:00' campus 52 63.52 'city centre' 56 54.65")
test
# date site obs mod site.1 obs.1 mod.1
# 1 2000-09-01 00:00:00 campus NA 61.63 city centre 66 56.69
# 2 2000-09-01 01:00:00 campus 52 62.55 city centre NA 54.75
# 3 2000-09-01 02:00:00 campus 52 63.52 city centre 56 54.65
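Note: Both of these options generate a "time" variable which you can go ahead and drop. You might want to keep it just in case you want to reshape back into a wide format.
Option 1: add a ".0" suffix to the first three measure names, so that every varying column follows the same name.suffix pattern, then call reshape() directly:
names(test)[2:4] <- paste(names(test)[2:4], "0", sep=".")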
test <- reshape(test, direction = "long",
idvar = "date", varying = 2:ncol(test))
rownames(test) <- NULL # reshape makes UGLY rownames
test
# date time site obs mod
# 1 2000-09-01 00:00:00 0 campus NA 61.63
# 2 2000-09-01 01:00:00 0 campus 52 62.55
# 3 2000-09-01 02:00:00 0 campus 52 63.52
# 4 2000-09-01 00:00:00 1 city centre 66 56.69
# 5 2000-09-01 01:00:00 1 city centre NA 54.75
# 6 2000-09-01 02:00:00 1 city centre 56 54.65
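To see why the ".0" suffix matters, here is a minimal sketch on hypothetical toy data (the data frame `wide` and its `x.*`/`y.*` columns are made up for illustration). When `varying` is given without `v.names`, reshape() splits each varying column name at the "." into a stem, which becomes a value column, and a suffix, which becomes the "time" value.

```r
# Hypothetical toy data: two measures (x, y) recorded at two "times" (0, 1)
wide <- data.frame(id = c("a", "b"),
                   x.0 = c(1, 2), y.0 = c(3, 4),
                   x.1 = c(5, 6), y.1 = c(7, 8))

# reshape() guesses v.names = c("x", "y") and times = 0, 1 from the names
long <- reshape(wide, direction = "long", idvar = "id",
                varying = 2:ncol(wide))
rownames(long) <- NULL  # drop the "a.0"-style row names
print(long)
```

All time-0 rows come first, then all time-1 rows, which is the same stacking order as in the output above.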
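Option 2: if you have many more sets of columns, number each block of three columns systematically first (easy to do with rep()), and then use reshape() as above. Because read.table() has already renamed the duplicates to site.1, obs.1, and so on, strip that suffix before adding the new one:
names(test)[-1] <- paste(sub("\\.[0-9]+$", "", names(test)[-1]),
                         rep(1:((ncol(test)-1)/3), each = 3), sep = ".")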
test <- reshape(test, direction = "long",
idvar = "date", varying = 2:ncol(test))
rownames(test) <- NULL
### Or, more convenient:
# names(test) <- make.unique(names(test))
# names(test)[2:4] <- paste(names(test)[2:4], "0", sep=".")
# test <- reshape(test, direction = "long",
# idvar = "date", varying = 2:ncol(test))
# rownames(test) <- NULL
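Both renaming approaches aim to produce names that reshape() can split into stem and suffix. The sketch below compares them on the raw header from the question; `suffix_groups()` is a hypothetical helper written for illustration, not part of the answer or of any package.

```r
raw <- c("date", "site", "obs", "mod", "site", "obs", "mod")

# Approach from the commented-out alternative: make.unique() appends
# ".1" to the second copy of each name, then the first copy gets ".0"
a <- make.unique(raw)
a[2:4] <- paste(a[2:4], "0", sep = ".")

# Hypothetical helper: give each block of `size` columns the same number,
# stripping any ".1"-style suffix that make.unique()/read.table() added
suffix_groups <- function(nms, size = 3) {
  stopifnot(length(nms) %% size == 0)
  stems <- sub("\\.[0-9]+$", "", nms)
  paste(stems, rep(seq_len(length(nms) / size), each = size), sep = ".")
}
b <- c("date", suffix_groups(make.unique(raw)[-1]))

print(a)  # site.0 obs.0 mod.0 site.1 obs.1 mod.1 after "date"
print(b)  # site.1 obs.1 mod.1 site.2 obs.2 mod.2 after "date"
```

Either set of names works with reshape(); only the values in the resulting "time" column differ (0/1 versus 1/2).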
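Optional step: if you want the data even longer, melt() from reshape2 takes it the rest of the way:
require(reshape2)
test.melt <- melt(test, id.vars = c("date", "site", "time"))
test.melt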
# date site time variable value
# 1 2000-09-01 00:00:00 campus 0 obs NA
# 2 2000-09-01 01:00:00 campus 0 obs 52.00
# 3 2000-09-01 02:00:00 campus 0 obs 52.00
# 4 2000-09-01 00:00:00 city centre 1 obs 66.00
# 5 2000-09-01 01:00:00 city centre 1 obs NA
# 6 2000-09-01 02:00:00 city centre 1 obs 56.00
# 7 2000-09-01 00:00:00 campus 0 mod 61.63
# 8 2000-09-01 01:00:00 campus 0 mod 62.55
# 9 2000-09-01 02:00:00 campus 0 mod 63.52
# 10 2000-09-01 00:00:00 city centre 1 mod 56.69
# 11 2000-09-01 01:00:00 city centre 1 mod 54.75
# 12 2000-09-01 02:00:00 city centre 1 mod 54.65
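If reshape2 isn't available, the same "melt" can be sketched in base R by stacking each measure column under `variable`/`value` while repeating the ID columns. This is a swapped-in technique, not the answer's method: `long` is a hand-typed copy of the Option 1 result, and `id_vars`/`measure_vars` are hypothetical names.

```r
# Hand-typed copy of the long data produced by Option 1
long <- data.frame(
  date = rep(c("2000-09-01 00:00:00", "2000-09-01 01:00:00",
               "2000-09-01 02:00:00"), times = 2),
  site = rep(c("campus", "city centre"), each = 3),
  time = rep(0:1, each = 3),
  obs  = c(NA, 52, 52, 66, NA, 56),
  mod  = c(61.63, 62.55, 63.52, 56.69, 54.75, 54.65),
  stringsAsFactors = FALSE
)

id_vars <- c("date", "site", "time")
measure_vars <- c("obs", "mod")

# One block of rows per measure: ID columns + measure name + its values
molten <- do.call(rbind, lapply(measure_vars, function(v) {
  data.frame(long[id_vars], variable = v, value = long[[v]],
             stringsAsFactors = FALSE)
}))
rownames(molten) <- NULL
print(molten)
```

The rows come out in the same order as the melt() output above: all obs rows, then all mod rows.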
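The reshape() documentation is pretty confusing; it's best to work through a few examples to see how it works. In particular, "time" doesn't have to refer to actual time (the "date" in your question). Think of it more as in panel data, where records are collected at different times for the same ID. In your case, the only "id" in the original data is the "date" column. The other potential "id" is the site, but that's not how your data are organized. Consider, for example, the same values organized with the site in the column names:
test1 <- structure(list(date = structure(1:3,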
.Label = c("2000-09-01 00:00:00",
"2000-09-01 01:00:00", "2000-09-01 02:00:00"), class = "factor"),
obs.campus = c(NA, 52L, 52L), mod.campus = c(61.63, 62.55,
63.52), obs.cityCentre = c(66L, NA, 56L), mod.cityCentre = c(56.69,
54.75, 54.65)), .Names = c("date", "obs.campus", "mod.campus",
"obs.cityCentre", "mod.cityCentre"), class = "data.frame", row.names = c(NA,
-3L))
test1
# date obs.campus mod.campus obs.cityCentre mod.cityCentre
# 1 2000-09-01 00:00:00 NA 61.63 66 56.69
# 2 2000-09-01 01:00:00 52 62.55 NA 54.75
# 3 2000-09-01 02:00:00 52 63.52 56 54.65
reshape(test1, direction = "long", idvar = "date", varying = 2:ncol(test1))
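You'll see that reshape() treats the site names as the "time" (you can override the name of that column by adding timevar = "site" to the reshape command). With direction = "long", you have to specify which columns vary with "time". In your case, that's every column except the first, which is why I used 2:ncol(test) for varying.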
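Here is a sketch of that timevar override. It uses a plain data.frame() equivalent of test1, with date stored as a character column instead of a factor (an assumption that does not affect the reshaping).

```r
# Same values as test1, with the site encoded in the column names
test1 <- data.frame(
  date = c("2000-09-01 00:00:00", "2000-09-01 01:00:00", "2000-09-01 02:00:00"),
  obs.campus = c(NA, 52L, 52L), mod.campus = c(61.63, 62.55, 63.52),
  obs.cityCentre = c(66L, NA, 56L), mod.cityCentre = c(56.69, 54.75, 54.65)
)

# The suffixes "campus"/"cityCentre" become the "time" values;
# timevar = "site" just names that column more sensibly
long1 <- reshape(test1, direction = "long", idvar = "date",
                 varying = 2:ncol(test1), timevar = "site")
rownames(long1) <- NULL
print(long1)
```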
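As for melt(): it should work. Basically, it tries to get you to the "most molten" form of your data. In this case, the most molten form would be the "optional step" described above, because date + site is the minimum needed to make up a unique ID. (I'd say "time" can safely be dropped.)
Once you have test.melt, you can always easily pivot the table in different ways. As a demonstration of what I mean, try the following and see what each one does:
dcast(test.melt, date + site ~ variable)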
dcast(test.melt, date ~ variable + site)
dcast(test.melt, variable + site ~ date)
dcast(test.melt, variable + date ~ site)
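Base R can do a similar pivot with reshape(direction = "wide"). The sketch below spreads obs/mod across sites, giving one row per date. That is roughly what dcast(test.melt, date ~ variable + site) produces, although the column names and order differ. The hand-typed `long` data frame is an assumption standing in for the reshaped data with "time" dropped.

```r
# Hand-typed long data: one row per date/site combination
long <- data.frame(
  date = rep(c("2000-09-01 00:00:00", "2000-09-01 01:00:00",
               "2000-09-01 02:00:00"), times = 2),
  site = rep(c("campus", "city centre"), each = 3),
  obs  = c(NA, 52, 52, 66, NA, 56),
  mod  = c(61.63, 62.55, 63.52, 56.69, 54.75, 54.65)
)

# obs and mod are the varying columns; each site value becomes a suffix
wide <- reshape(long, direction = "wide", idvar = "date", timevar = "site")
rownames(wide) <- NULL
print(wide)
```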
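Update: melt() from "data.table" can now "melt" multiple columns at once, much as reshape() does. It should work whether or not the column names are duplicated. Here, test is the original wide data as read in above:
library(data.table)
measure <- c("site", "obs", "mod")
melt(as.data.table(test), measure.vars = patterns(measure), value.name = measure)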
# date variable site obs mod
# 1: 2000-09-01 00:00:00 1 campus NA 61.63
# 2: 2000-09-01 01:00:00 1 campus 52 62.55
# 3: 2000-09-01 02:00:00 1 campus 52 63.52
# 4: 2000-09-01 00:00:00 2 city centre 66 56.69
# 5: 2000-09-01 01:00:00 2 city centre NA 54.75
# 6: 2000-09-01 02:00:00 2 city centre 56 54.65
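For comparison, here is a base-R sketch of the same idea, not data.table's implementation. It uses grep() to collect the columns that match each measure name, and passes those groups to reshape() as a list `varying` with matching `v.names`. The hand-typed `wide` data frame and the `groups` name are assumptions for illustration.

```r
# Original wide layout, as read.table() names it (duplicates get ".1")
wide <- data.frame(
  date = c("2000-09-01 00:00:00", "2000-09-01 01:00:00", "2000-09-01 02:00:00"),
  site = "campus", obs = c(NA, 52, 52), mod = c(61.63, 62.55, 63.52),
  site.1 = "city centre", obs.1 = c(66, NA, 56), mod.1 = c(56.69, 54.75, 54.65)
)

measure <- c("site", "obs", "mod")
# For each measure, the names of all columns matching it, e.g. c("site", "site.1")
groups <- lapply(measure, function(p) grep(p, names(wide), value = TRUE))

# varying[[j]] holds the columns for v.names[j]; times default to 1, 2
long <- reshape(wide, direction = "long", idvar = "date",
                varying = groups, v.names = measure, timevar = "variable")
rownames(long) <- NULL
print(long)
```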
Source: Stack Overflow, r - How to reshape a data frame with "reoccurring" columns?, https://stackoverflow.com/questions/12620964/