- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
要跟进两个 rapply
问题,here和 here从几年前开始,rapply
似乎只适用于简单的类(即向量、矩阵),而不适用于多方面的 data.frame
类。
在大多数情况下以及下面的演示中,rapply
等价物是嵌套的 lapply
及其变体包装器,v/sapply
其中嵌套数与级别数相关。下面是我在向量、矩阵和数据帧类型之间嵌套 lapply
和 rapply
之间的测试场景。除了 datafames 之外的所有都无法均衡。
问题
在 base R 中是否有 rapply()
的用例来递归地在数据帧列表上运行操作并返回数据帧列表,就像它对向量或矩阵列表所做的那样?如果不是,这是一个错误还是应该在 ?rapply
base R 文档中发出警告?大多数教程不显示 rapply
数据框示例。
一维 (字符向量)
下面显示了 rapply
如何等同于运行字符计数的简单字符向量上的嵌套 lapply
,甚至显示了 rapply
如何明显更快处理中:
library(microbenchmark)
ScriptLists <- list(R = list.files(path="/path/to/Scripts", pattern="\\.R"),
Python = list.files(path="/path/to/Scripts", pattern="\\.py"),
SQL = list.files(path="/path/to/Scripts", pattern="\\.sql"),
PHP = list.files(path="/path/to/Scripts", pattern="\\.xsl"),
XSLT = list.files(path="/path/to/Scripts", pattern="\\.php"))
microbenchmark(
ScriptsLists1 <- lapply(ScriptLists, function(i){
unname(vapply(i, function(x){
nchar(x)
}, numeric(1)))
})
)
# Unit: microseconds
# min lq mean median uq max neval
# 384 408.782 524.1363 434.7675 678.016 886.377 100
microbenchmark(
ScriptsLists2 <- rapply(ScriptLists, function(x){
nchar(x)
}, how="list")
)
# Unit: microseconds
# min lq mean median uq max neval
# 110.196 112.8425 131.6141 114.5265 123.91 352.722 100
all.equal(ScriptsLists1, ScriptsLists2)
# [1] TRUE
二维类型 (矩阵与数据帧)
输入数据框(从最高年份排名 StackOverflow top users 中提取)以按语言标签(C#、Python、R 等)构建顶级用户数据框列表。
df <- structure(list(user = structure(c(12L, 14L, 19L, 35L, 22L, 32L,
1L, 36L, 7L, 9L, 2L, 18L, 27L, 6L, 30L, 20L, 10L, 24L, 29L, 23L,
5L, 3L, 4L, 15L, 25L, 17L, 11L, 8L, 33L, 13L, 34L, 16L, 21L,
26L, 28L, 31L), .Label = c("akrun", "alecxe", "Alexey Mezenin",
"BalusC", "Barmar", "CommonsWare", "Darin Dimitrov", "dasblinkenlight",
"Eric Duminil", "Felix Kling", "Frank van Puffelen", "Gordon Linoff",
"Greg Hewgill", "Günter Zöchbauer", "GurV", "Hans Passant", "JB Nizet",
"Jean-François Fabre", "jezrael", "Jon Skeet", "Jonathan Leffler",
"Martijn Pieters", "Martin R", "matt", "Nina Scholz", "paxdiablo",
"piRSquared", "Pranav C Balan", "Psidom", "Quentin", "Suragch",
"T.J. Crowder", "Tim Biegeleisen", "unutbu", "VonC", "Wiktor Stribi?ew"
), class = "factor"), link = structure(c(2L, 17L, 21L, 31L, 1L,
10L, 27L, 28L, 22L, 33L, 35L, 34L, 20L, 3L, 15L, 19L, 18L, 25L,
29L, 4L, 8L, 5L, 11L, 32L, 6L, 30L, 16L, 24L, 13L, 36L, 14L,
12L, 9L, 7L, 23L, 26L), .Label = c("http://www.stackoverflow.com//users/100297/martijn-pieters",
"http://www.stackoverflow.com//users/1144035/gordon-linoff",
"http://www.stackoverflow.com//users/115145/commonsware", "http://www.stackoverflow.com//users/1187415/martin-r",
"http://www.stackoverflow.com//users/1227923/alexey-mezenin",
"http://www.stackoverflow.com//users/1447675/nina-scholz", "http://www.stackoverflow.com//users/14860/paxdiablo",
"http://www.stackoverflow.com//users/1491895/barmar", "http://www.stackoverflow.com//users/15168/jonathan-leffler",
"http://www.stackoverflow.com//users/157247/t-j-crowder", "http://www.stackoverflow.com//users/157882/balusc",
"http://www.stackoverflow.com//users/17034/hans-passant", "http://www.stackoverflow.com//users/1863229/tim-biegeleisen",
"http://www.stackoverflow.com//users/190597/unutbu", "http://www.stackoverflow.com//users/19068/quentin",
"http://www.stackoverflow.com//users/209103/frank-van-puffelen",
"http://www.stackoverflow.com//users/217408/g%c3%bcnter-z%c3%b6chbauer",
"http://www.stackoverflow.com//users/218196/felix-kling", "http://www.stackoverflow.com//users/22656/jon-skeet",
"http://www.stackoverflow.com//users/2336654/pirsquared", "http://www.stackoverflow.com//users/2901002/jezrael",
"http://www.stackoverflow.com//users/29407/darin-dimitrov", "http://www.stackoverflow.com//users/3037257/pranav-c-balan",
"http://www.stackoverflow.com//users/335858/dasblinkenlight",
"http://www.stackoverflow.com//users/341994/matt", "http://www.stackoverflow.com//users/3681880/suragch",
"http://www.stackoverflow.com//users/3732271/akrun", "http://www.stackoverflow.com//users/3832970/wiktor-stribi%c5%bcew",
"http://www.stackoverflow.com//users/4983450/psidom", "http://www.stackoverflow.com//users/571407/jb-nizet",
"http://www.stackoverflow.com//users/6309/vonc", "http://www.stackoverflow.com//users/6348498/gurv",
"http://www.stackoverflow.com//users/6419007/eric-duminil", "http://www.stackoverflow.com//users/6451573/jean-fran%c3%a7ois-fabre",
"http://www.stackoverflow.com//users/771848/alecxe", "http://www.stackoverflow.com//users/893/greg-hewgill"
), class = "factor"), location = structure(c(17L, 15L, 8L, 12L,
10L, 26L, 1L, 28L, 23L, 1L, 17L, 25L, 6L, 29L, 26L, 19L, 24L,
1L, 5L, 13L, 4L, 2L, 3L, 1L, 7L, 20L, 21L, 27L, 22L, 11L, 1L,
16L, 9L, 1L, 18L, 14L), .Label = c("", "??????", "Amsterdam, Netherlands",
"Arlington, MA", "Atlanta, GA, United States", "Bellevue, WA, United States",
"Berlin, Deutschland", "Bratislava, Slovakia", "California, USA",
"Cambridge, United Kingdom", "Christchurch, New Zealand", "France",
"Germany", "Hohhot, China", "Linz, Austria", "Madison, WI", "New York, United States",
"Ramanthali, Kannur, Kerala, India", "Reading, United Kingdom",
"Saint-Etienne, France", "San Francisco, CA", "Singapore", "Sofia, Bulgaria",
"Sunnyvale, CA", "Toulouse, France", "United Kingdom", "United States",
"Warsaw, Poland", "Who Wants to Know?"), class = "factor"), year_rep = structure(c(36L,
35L, 34L, 33L, 32L, 31L, 30L, 29L, 28L, 27L, 26L, 25L, 24L, 23L,
22L, 21L, 20L, 19L, 18L, 17L, 16L, 15L, 14L, 13L, 12L, 11L, 10L,
9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L), .Label = c("3,580", "3,604",
"3,636", "3,649", "3,688", "3,735", "3,796", "3,814", "3,886",
"3,920", "3,923", "3,950", "4,016", "4,046", "4,142", "4,179",
"4,195", "4,236", "4,313", "4,324", "4,348", "4,464", "4,475",
"4,482", "4,526", "4,723", "4,854", "4,936", "4,948", "5,188",
"5,258", "5,337", "5,577", "5,740", "5,835", "5,985"), class = "factor"),
total_rep = structure(c(18L, 2L, 34L, 27L, 22L, 20L, 5L,
3L, 31L, 1L, 6L, 9L, 13L, 25L, 21L, 36L, 14L, 4L, 11L, 7L,
8L, 10L, 30L, 29L, 24L, 15L, 35L, 17L, 33L, 23L, 12L, 28L,
16L, 19L, 26L, 32L), .Label = c("12,557", "154,439", "158,134",
"220,515", "229,553", "233,368", "269,380", "289,989", "30,027",
"31,602", "36,950", "401,595", "41,183", "411,535", "418,780",
"455,157", "475,813", "499,408", "507,043", "508,310", "509,365",
"525,176", "529,137", "61,135", "616,135", "64,476", "651,397",
"672,118", "7,932", "703,046", "709,683", "71,032", "77,211",
"83,237", "86,520", "921,690"), class = "factor"), tag1 = structure(c(15L,
2L, 10L, 6L, 11L, 8L, 12L, 13L, 4L, 14L, 11L, 11L, 10L, 1L,
8L, 4L, 8L, 16L, 11L, 16L, 8L, 9L, 7L, 15L, 8L, 7L, 5L, 4L,
15L, 6L, 11L, 4L, 3L, 3L, 8L, 16L), .Label = c("android",
"angular2", "c", "c#", "firebase", "git", "java", "javascript",
"laravel", "pandas", "python", "r", "regex", "ruby", "sql",
"swift"), class = "factor"), tag2 = structure(c(23L, 24L,
19L, 8L, 20L, 14L, 6L, 13L, 3L, 21L, 22L, 20L, 19L, 12L,
10L, 12L, 14L, 11L, 17L, 11L, 18L, 18L, 15L, 16L, 2L, 9L,
7L, 12L, 16L, 19L, 17L, 1L, 4L, 5L, 14L, 11L), .Label = c(".net",
"arrays", "asp.net-mvc", "bash", "c++", "dplyr", "firebase-database",
"github", "hibernate", "html", "ios", "java", "javascript",
"jquery", "jsf", "mysql", "pandas", "php", "python", "python-3.x",
"ruby-on-rails", "selenium", "sql-server", "typescript"), class = "factor"),
tag3 = structure(c(20L, 17L, 11L, 12L, 24L, 15L, 11L, 8L,
5L, 4L, 23L, 24L, 11L, 3L, 10L, 1L, 6L, 31L, 25L, 28L, 18L,
19L, 26L, 27L, 22L, 16L, 2L, 9L, 15L, 13L, 21L, 30L, 29L,
7L, 14L, 2L), .Label = c(".net", "android", "android-intent",
"arrays", "asp.net-mvc-3", "asynchronous", "bash", "c#",
"c++", "css", "dataframe", "docker", "git-pull", "html",
"java", "java-8", "javascript", "jquery", "laravel-5.3",
"mysql", "numpy", "object", "protractor", "python-2.7", "r",
"servlets", "sql-server", "swift3", "unix", "winforms", "xcode"
), class = "factor")), .Names = c("user", "link", "location",
"year_rep", "total_rep", "tag1", "tag2", "tag3"), class = "data.frame", row.names = c(NA,
-36L))
R代码
以下方法对类型、矩阵或数据帧中的 year_rep 和 total_rep(第 5/6)列进行平均。请务必更改设置 block 中的返回语句,换出注释部分类型。请注意矩阵返回的 rapply()
与嵌套 lapply
相同,但数据帧返回则不然。
# NESTED LIST SETUP ------------------------------------
LangLists <- list(`c#`=list(), python=list(), sql=list(), php=list(), r=list(),
java=list(), javascript=list(), ruby=list(), `c++`=list())
LangLists <- setNames(mapply(function(i, j){
df <- subset(df, tag1 == j | tag2 == j | tag3 == j)
df$year_rep <- as.numeric(as.character(gsub(",", "", df$year_rep)))
df$total_rep <- as.numeric(as.character(gsub(",", "", df$total_rep)))
return(list(as.matrix(df))) # MATRIX TYPE
# return(list(df)) # DF TYPE
}, LangLists, names(LangLists), SIMPLIFY=FALSE), names(LangLists))
# -----------------------------------------------------
# MATRIX RETURN
LangLists1 <- lapply(LangLists, function(i){
lapply(i, function(df){
cbind(mean(as.numeric(df[,5])),
mean(as.numeric(df[,6])))
})
})
LangLists2 <- rapply(LangLists, function(i){
cbind(mean(as.numeric(i[,5])),
mean(as.numeric(i[,6])))
}, classes="matrix", how="list")
all.equal(LangLists1, LangLists2)
# [1] TRUE
# DATA FRAME RETURN
LangLists1 <- lapply(LangLists, function(i){
lapply(i, function(df){
data.frame(year_rep=mean(df$year_rep),
total_rep=mean(df$total_rep))
})
})
LangLists2 <- rapply(LangLists, function(i){
data.frame(year_rep=mean(i$year_rep),
total_rep=mean(i$total_rep))
}, classes="data.frame", how="list")
all.equal(LangLists1, LangLists2)
# [1] "Component “c#”: Component 1: Names: 2 string mismatches"
# [2] "Component “c#”: Component 1: Attributes: < names for target but not for current >"
# [3] "Component “c#”: Component 1: Attributes: < Length mismatch: comparison on first 0 components >"
# [4] "Component “c#”: Component 1: Length mismatch: comparison on first 2 components"
# [5] "Component “c#”: Component 1: Component 1: Modes: numeric, NULL"
...
事实上,虽然嵌套的 lapply
仍然是 rep 的两列完整数据帧列表,但 rapply
用于数据帧转换基础数据帧到 NULL 列表。那么,与向量/矩阵相比,为什么 rapply 无法返回原始数据帧列表?
# $`c#`
# $`c#`[[1]]
# $`c#`[[1]]$X
# NULL
# $`c#`[[1]]$user
# NULL
# $`c#`[[1]]$link
# NULL
# $`c#`[[1]]$location
# NULL
# $`c#`[[1]]$year_rep
# NULL
# $`c#`[[1]]$total_rep
# NULL
# $`c#`[[1]]$tag1
# NULL
# $`c#`[[1]]$tag2
# NULL
# $`c#`[[1]]$tag3
# NULL
# $python
# $python[[1]]
# $python[[1]]$X
# NULL
# $python[[1]]$user
# NULL
# $python[[1]]$link
# NULL
# $python[[1]]$location
# NULL
# $python[[1]]$year_rep
# NULL
# $python[[1]]$total_rep
# NULL
# $python[[1]]$tag1
# NULL
# $python[[1]]$tag2
# NULL
# $python[[1]]$tag3
# NULL
最佳答案
rapply
似乎不是为处理数据帧列表而设计的。
从 ?rapply
的详细信息部分说,如果
how = "list" or how = "unlist", the list is copied, all non-list elements which have a class included in classes are replaced by the result of applying f to the element and all others are replaced by deflt.
由于 data.frames 是列表,因此它们不属于第一类。因此,它们属于 all others catch-all 并被 dflt 替换,其默认值为 NULL。这解释了问题中最后一行代码的结果。
关于如何“替换”的最后一个替代参数似乎在这种“模式”下 data.frames 被简单地忽略了
If how = "replace", each element of the list which is not itself a list and has a class included in classes is replaced by the result of applying f to the element.
没有提及本身就是列表的元素,使用 how="replace"运行上面的代码似乎返回一个嵌套列表,其中 data.frames 现在是简单列表。所以看来 rapply
已经通过并剥离了类属性。
关于在数据帧列表上运行 rapply,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41813353/
要跟进两个 rapply 问题,here和 here从几年前开始,rapply 似乎只适用于简单的类(即向量、矩阵),而不适用于多方面的 data.frame 类。 在大多数情况下以及下面的演示中,r
我可以使用rapply深入了解列表(其结构未知;可能嵌套也可能不嵌套)以提取一些信息吗?例如,我有一个模型对象列表,其中一个元素也是模型对象列表。我可以同时运行一个函数吗? df getcoef
我有一个嵌套列表,其基本元素是数据帧,我想递归地遍历这个列表来对每个数据帧进行一些计算,最后得到一个与输入结构相同的嵌套结果列表。我知道“rapply”正是针对此类任务,但我遇到了一个问题,rappl
似乎无法将 Not a Number NaN 错误替换为 0。尝试使用 rapply 但对我不起作用。 window.onload = function () { var first = do
我知道列表中的 NULL 值有时会 trip人起来。我很好奇为什么在特定情况下 lapply和 rapply好像对待NULL值(value)观不同。 l lapply(NULL, identity)
我是一名优秀的程序员,十分优秀!