
r - Fill down the rows of a column containing NAs (using base R or data.table)

Reposted. Author: 行者123. Last updated: 2023-12-04 22:32:09

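I want to use the Census county-adjacency data, but I am stuck getting it into a usable form. The data has four columns: first county, first code, second county, second code. The first-county column is not repeated on every row; instead, with the way I currently read it in, it takes the value "" (and the first code is NA) on every row after the first of each block: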

                     c1   cd1                    c2   cd2
1   Alamance County, NC 37001   Alamance County, NC 37001
2                          NA    Caswell County, NC 37033
3                          NA    Chatham County, NC 37037
4                          NA   Guilford County, NC 37081
5                          NA     Orange County, NC 37135
6                          NA   Randolph County, NC 37151
7                          NA Rockingham County, NC 37157
8  Alexander County, NC 37003  Alexander County, NC 37003
9                          NA   Caldwell County, NC 37027
10                         NA    Catawba County, NC 37035
11                         NA    Iredell County, NC 37097
12                         NA     Wilkes County, NC 37193
13 Alleghany County, NC 37005  Alleghany County, NC 37005
14                         NA       Ashe County, NC 37009
15                         NA      Surry County, NC 37171
16                         NA     Wilkes County, NC 37193
17                         NA    Grayson County, VA 51077
18     Anson County, NC 37007      Anson County, NC 37007
19                         NA Montgomery County, NC 37123
20                         NA   Richmond County, NC 37153

I happen to be interested only in the North Carolina portion of the data found at that link, part of which is what you see above:
nc_cc <- structure(list(c1 = c("Alamance County, NC", "", "", "", "", "", "", "Alexander County, NC", "", "", "", "", "Alleghany County, NC", "", "", "", "", "Anson County, NC", "", ""), cd1 = c(37001L, NA, NA, NA, NA, NA, NA, 37003L, NA, NA, NA, NA, 37005L, NA, NA, NA, NA, 37007L, NA, NA), c2 = c("Alamance County, NC", "Caswell County, NC", "Chatham County, NC", "Guilford County, NC", "Orange County, NC", "Randolph County, NC", "Rockingham County, NC", "Alexander County, NC", "Caldwell County, NC", "Catawba County, NC", "Iredell County, NC", "Wilkes County, NC", "Alleghany County, NC", "Ashe County, NC", "Surry County, NC", "Wilkes County, NC", "Grayson County, VA", "Anson County, NC", "Montgomery County, NC", "Richmond County, NC" ), cd2 = c(37001L, 37033L, 37037L, 37081L, 37135L, 37151L, 37157L, 37003L, 37027L, 37035L, 37097L, 37193L, 37005L, 37009L, 37171L, 37193L, 51077L, 37007L, 37123L, 37153L)), .Names = c("c1", "cd1", "c2", "cd2"), row.names = c(NA, 20L), class = "data.frame")

I want a clean adjacency mapping between codes (the county names are redundant), so the desired output could take several forms: a data.frame, a list, ...

The rough solution I came up with (after a lot of thought) is this:
require(data.table)
DT <- data.table(nc_cc)
DT[,list(cd1=cd1[1],cd2),by=cumsum(!is.na(cd1))][,list(cd1,cd2)]

which gives
      cd1   cd2
 1: 37001 37001
 2: 37001 37033
 3: 37001 37037
 4: 37001 37081
 5: 37001 37135
 6: 37001 37151
 7: 37001 37157
 8: 37003 37003
 9: 37003 37027
10: 37003 37035
11: 37003 37097
12: 37003 37193
13: 37005 37005
14: 37005 37009
15: 37005 37171
16: 37005 37193
17: 37005 51077
18: 37007 37007
19: 37007 37123
20: 37007 37153
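The grouping trick in the solution above can be seen on a toy vector. This is a minimal base R sketch; the shortened `cd1` vector is made up for illustration and is not the full data set.

```r
# Toy version of the cd1 column: a value marks the start of a block,
# NA means "same code as the last value above".
cd1 <- c(37001L, NA, NA, 37003L, NA, 37005L)

# !is.na(cd1) is TRUE exactly on the rows that start a new block, so the
# running sum goes up by one at each block start and stays flat otherwise.
grp <- cumsum(!is.na(cd1))
print(grp)  # 1 1 1 2 2 3
```

Each distinct value of `grp` is one block, which is why taking `cd1[1]` within each group recovers the code for every row of that block.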

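I tagged this with data.table because I used it in the solution above, and I suspect something nice could be done with roll. Honestly, I have never understood the documentation for roll, so I hope to learn something here. So: can this be done better?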
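For readers curious about roll, one way a rolling join could do the fill is sketched below. This is an illustration only, not taken from the answer: the helper column `row`, the lookup table `starts`, and the shortened data are all assumptions made for the example.

```r
library(data.table)

# Shortened, made-up slice of the adjacency data.
DT <- data.table(
  cd1 = c(37001L, NA, NA, 37003L, NA),
  cd2 = c(37001L, 37033L, 37037L, 37003L, 37027L)
)
DT[, row := .I]  # explicit row position to join on (hypothetical helper column)

# Lookup table holding only the rows where cd1 is present.
starts <- DT[!is.na(cd1), .(row, cd1)]

# Rolling join: for every row of DT, take cd1 from the last `starts` row
# whose position is at or before it (roll = TRUE carries values forward).
res <- starts[DT[, .(row, cd2)], on = "row", roll = TRUE][, .(cd1, cd2)]
print(res)
```

This is more machinery than the grouping approach needs, so it is mainly useful as a way to see what `roll = TRUE` does.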

Edit: This question asks the same thing, so I have revised my question: "Is there a better way to do this using data.table or base R (since I would rather not install more packages)?"
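For the base R half of the question, a fill-down can be written with the same `cumsum` idea and no packages. This is a minimal sketch; `fill_down` is a hypothetical helper name and the small data frame is made up for illustration.

```r
# Hypothetical helper: carry the last non-NA value forward.
fill_down <- function(x) {
  present <- !is.na(x)
  # Position of the most recent non-NA value for each element (0 before the first).
  idx <- cumsum(present)
  # Prepend NA so that leading NAs (idx == 0) stay NA.
  c(NA, x[present])[idx + 1L]
}

nc <- data.frame(
  cd1 = c(37001L, NA, NA, 37003L, NA),
  cd2 = c(37001L, 37033L, 37037L, 37003L, 37027L)
)
nc$cd1 <- fill_down(nc$cd1)
print(nc)
```

The county-name column holds "" rather than NA, so it would first need its empty strings converted to NA before the same helper applies.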

Best answer

A pretty standard way to do this is:

library(data.table)
dt = data.table(nc_cc)

dt[, cd1 := cd1[1], by = cumsum(!is.na(cd1))]
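As an alternative to the grouped assignment above, newer data.table releases ship `nafill()`, which can carry the last observation forward directly. The sketch below assumes data.table 1.12.4 or later and uses a shortened, made-up table; it is not part of the original answer.

```r
library(data.table)

# Shortened, made-up slice of the adjacency data.
dt <- data.table(
  cd1 = c(37001L, NA, NA, 37003L, NA),
  cd2 = c(37001L, 37033L, 37037L, 37003L, 37027L)
)

# type = "locf" fills each NA with the last observed value above it.
# := updates the column in place, so no reassignment of dt is needed.
dt[, cd1 := nafill(cd1, type = "locf")]
print(dt)
```

Both this and the `cumsum` version modify `dt` by reference; use `copy(dt)` first if the original table must be preserved.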

For r - Fill down the rows of a column containing NAs (using base R or data.table), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18840628/
