
r - Create a data frame from nested entries

Reposted. Author: 行者123. Last updated: 2023-12-04 08:43:24

I have a data frame, test, that looks like this:

dput(test)
structure(list(X = 1L, entityId = structure(1L, .Label = "HOST-123", class = "factor"),
displayName = structure(1L, .Label = "server1", class = "factor"),
discoveredName = structure(1L, .Label = "server1", class = "factor"),
firstSeenTimestamp = 1593860000000, lastSeenTimestamp = 1603210000000,
tags = structure(1L, .Label = "c(\"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\"), c(\"app1\", \"client\", \"org\", \"app1\", \"DATA_CENTER\", \"PURPOSE\", \"REGION\", \"Test\"), c(NA, \"NONE\", \"Host:Environment:test123\", \"111\", \"222\", \"GENERAL\", \"444\", \"555\")", class = "factor")), .Names = c("X",
"entityId", "displayName", "discoveredName", "firstSeenTimestamp",
"lastSeenTimestamp", "tags"), class = "data.frame", row.names = c(NA,
-1L))
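There is a column called tags that should itself become a data frame. I need to get rid of the first row in tags (it always says CONTEXTLESS), expand the second row in tags so that its entries become columns, and finally insert the values from the third row under each of the expanded columns.
For example, the result needs to look like this: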
structure(list(entityId = structure(1L, .Label = "HOST-123", class = "factor"), 
displayName = structure(1L, .Label = "server1", class = "factor"),
discoveredName = structure(1L, .Label = "server1", class = "factor"),
firstSeenTimestamp = 1593860000000, lastSeenTimestamp = 1603210000000,
app1 = NA, client = structure(1L, .Label = "None", class = "factor"),
org = structure(1L, .Label = "Host:Environment:test123", class = "factor"),
app1.1 = 111L, data_center = 222L, purppose = structure(1L, .Label = "general", class = "factor"),
region = 444L, test = 555L), .Names = c("entityId", "displayName",
"discoveredName", "firstSeenTimestamp", "lastSeenTimestamp",
"app1", "client", "org", "app1.1", "data_center", "purppose",
"region", "test"), class = "data.frame", row.names = c(NA, -1L
))
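In other words, I need to drop the first vector (the one that always says "CONTEXTLESS") and turn the second vector into columns: each value in the second vector should become a column name, and the last vector should supply the values for those newly added columns.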

Best answer

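If you are willing to discard the first "line" of junk and then clean up some side effects of the parsing, this might be a reasonable starting point: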

read.table(text=gsub("\\),", ")\n", test$tags[1]), sep=",", skip=1, #drops line
header=TRUE)

c.app1 client org app1 DATA_CENTER PURPOSE REGION Test.
1 c(NA NONE Host:Environment:test123 111 222 GENERAL 444 555)
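The read.table function uses scan, which does not know that "c(" and ")" are meaningful. Another option might be to try eval(parse(text= .)) (which would know that they enclose vectors) on the second and third lines, but I can't see a clean way to do that. I initially tried splitting the lines with strsplit, but that made me lose the parentheses.
Here is some cleanup done by adding more gsub operations: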
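One way the eval(parse(text = .)) idea could be made to work is to wrap the whole string in list( ) so that it parses as a single expression. This is only a sketch and not part of the original answer: the tags string below is copied from the question's dput output as a stand-in for as.character(test$tags[1]), and the names vecs and wide are made up for illustration. Because eval runs whatever code the string contains, this is only appropriate for input you trust.

```r
# Stand-in for as.character(test$tags[1]) from the question.
tags <- paste(
  'c("CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS")',
  'c("app1", "client", "org", "app1", "DATA_CENTER", "PURPOSE", "REGION", "Test")',
  'c(NA, "NONE", "Host:Environment:test123", "111", "222", "GENERAL", "444", "555")',
  sep = ", "
)

# Wrapping in list( ) turns the three c(...) calls into one parseable expression.
vecs <- eval(parse(text = paste0("list(", tags, ")")))

# vecs[[1]] is the CONTEXTLESS vector and is ignored;
# vecs[[2]] holds the column names, vecs[[3]] the values.
# make.unique() renames the second "app1" to "app1.1".
wide <- as.data.frame(
  as.list(setNames(vecs[[3]], make.unique(vecs[[2]]))),
  stringsAsFactors = FALSE
)

# Everything is character at this point; convert column by column
# so that "111", "222", ... become integers.
wide[] <- lapply(wide, type.convert, as.is = TRUE)

print(wide)
```

The column names come out the same as with the read.table approach, but no regular expressions are needed to strip "c(" and ")".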
read.table(text=gsub("c\\(|\\)","", # gets rid of enclosing "c(" and ")"
gsub("\\),", "\n", # inserts line breaks
test$tags[1])),
sep=",", #lets commas be parsed
skip=1, #drops line
header=TRUE) # converts to colnames

app1 client org app1.1 DATA_CENTER PURPOSE REGION Test
1 NA NONE Host:Environment:test123 111 222 GENERAL 444 555
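The ".1" is appended to the second instance of app1 because read.table, by default, makes column names syntactically valid and unique; you can override that with check.names=FALSE.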
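To illustrate the check.names behaviour, here is a small sketch that parses the same string both ways and then attaches the default (deduplicated) result to the identifying columns. It is not part of the original answer: the tags string is copied from the question as a stand-in for test$tags[1], meta is a hypothetical two-column stand-in for the non-tag columns of test, and strip.white and stringsAsFactors are extra arguments added here to keep the values tidy.

```r
# Stand-in for as.character(test$tags[1]) from the question.
tags <- paste(
  'c("CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS")',
  'c("app1", "client", "org", "app1", "DATA_CENTER", "PURPOSE", "REGION", "Test")',
  'c(NA, "NONE", "Host:Environment:test123", "111", "222", "GENERAL", "444", "555")',
  sep = ", "
)

# Same cleanup as in the answer: break into lines, then drop "c(" and ")".
cleaned <- gsub("c\\(|\\)", "", gsub("\\),", "\n", tags))

# Default: duplicate names are made unique (second app1 becomes app1.1).
wide <- read.table(text = cleaned, sep = ",", skip = 1, header = TRUE,
                   strip.white = TRUE, stringsAsFactors = FALSE)

# check.names = FALSE keeps the names exactly as they appear in the header.
wide_raw <- read.table(text = cleaned, sep = ",", skip = 1, header = TRUE,
                       check.names = FALSE,
                       strip.white = TRUE, stringsAsFactors = FALSE)

print(names(wide))
print(names(wide_raw))

# Hypothetical stand-in for the identifying columns of test.
meta <- data.frame(entityId = "HOST-123", displayName = "server1",
                   stringsAsFactors = FALSE)

# Attach the expanded tag columns to the identifying columns.
result <- cbind(meta, wide)
print(result)
```

Keeping duplicate column names makes later selection by name ambiguous, so the default renaming is usually the safer choice unless the exact tag names must be preserved.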

Regarding "r - Create a data frame from nested entries", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64450638/
