r - Using dplyr in R to check whether all characters in a group are equal


Reposted by: 行者123 · Updated: 2023-12-04 09:41:58

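In the data frame below, how can I group by the first two columns and check whether all values in the fourth column are identical within each group? If they are, I want to replace them with ''.

In this example, the group combinations 'Embryonated + Protein' and 'Hatching + Lipid' are the only two groups whose letters are not all a.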

df

         Stage variable Temperature letters       Mean
30 Embryonated Moisture          30       a  808.70882
31 Embryonated      NFE          20       a   53.28806
32 Embryonated      NFE          25       a   45.38572
33 Embryonated      NFE          30       a   84.56113
34 Embryonated  Protein          20      ab  118.53608
35 Embryonated  Protein          25       b  127.29849
36 Embryonated  Protein          30       a   84.55175
37    Hatching      Ash          20       a   16.95345
38    Hatching      Ash          25       a   14.54980
39    Hatching      Ash          30       a   13.38510
40    Hatching   Energy          20       a 4931.18857
41    Hatching   Energy          25       a 4187.27213
42    Hatching   Energy          30       a 4314.61171
43    Hatching    Lipid          20       b   26.44363
44    Hatching    Lipid          25       a   19.90928
45    Hatching    Lipid          30      ab   22.27561
46    Hatching Moisture          20       a  785.63062
47    Hatching Moisture          25       a  818.69860
48    Hatching Moisture          30       a  815.32070
49    Hatching      NFE          20       a   60.34359
50    Hatching      NFE          25       a   43.02979

I tried using dplyr, to no avail.

library(dplyr)

grp_cols <- names(df)[c(1, 2)]  # group by Stage and variable

# Convert character vector to list of symbols
dots <- lapply(grp_cols, as.symbol)

res <- df %>% group_by(.dots = dots) %>%
  do(k = all(letters == 'a'))  # returns FALSE for every group

Data:

dput(df)

structure(list(Stage = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Developing", 
"Embryonated", "Hatching", "Laid"), class = "factor"), variable = structure(c(1L, 
5L, 5L, 5L, 2L, 2L, 2L, 4L, 4L, 4L, 6L, 6L, 6L, 3L, 3L, 3L, 1L, 
1L, 1L, 5L, 5L), .Label = c("Moisture", "Protein", "Lipid", "Ash", 
"NFE", "Energy"), class = "factor"), Temperature = c("30", "20", 
"25", "30", "20", "25", "30", "20", "25", "30", "20", "25", "30", 
"20", "25", "30", "20", "25", "30", "20", "25"), letters = c("a", 
"a", "a", "a", "ab", "b", "a", "a", "a", "a", "a", "a", "a", 
"b", "a", "ab", "a", "a", "a", "a", "a"), Mean = c(808.708818349727, 
53.2880626188374, 45.3857220182952, 84.5611267892406, 118.536080769588, 
127.298486932385, 84.5517498179938, 16.9534468121571, 14.5497954869813, 
13.3850951354759, 4931.18857123979, 4187.27213494545, 4314.61171127083, 
26.4436265667305, 19.9092762683653, 22.2756088142943, 785.630624024365, 
818.698598619779, 815.320702070777, 60.3435858953567, 43.0297881562102
)), .Names = c("Stage", "variable", "Temperature", "letters", 
"Mean"), row.names = 30:50, class = "data.frame")

Best answer

Group the data, count the distinct values in each group with n_distinct, and replace them with '' when there is only one:

library(dplyr)

df %>%
  group_by(Stage, variable) %>%
  mutate(letters = replace(letters, n_distinct(letters) == 1, ''))
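
If only a per-group TRUE/FALSE check is wanted, as in the question's attempt, the same n_distinct test can go into summarise. The sketch below uses a small hypothetical data frame `toy` (not the question's data) and assumes dplyr 1.0.0 or later for the `.groups` argument. It also shows a likely reason the do() attempt returned FALSE everywhere: columns are not in scope inside do(), so a bare `letters` resolves to base R's built-in vector of a to z, and the column has to be reached through `.`.

```r
library(dplyr)

# Hypothetical toy data mirroring the question's column names
toy <- data.frame(
  Stage    = c("Hatching", "Hatching", "Hatching", "Hatching"),
  variable = c("Ash", "Ash", "Lipid", "Lipid"),
  letters  = c("a", "a", "b", "ab"),
  stringsAsFactors = FALSE
)

# One row per group: TRUE when every value of `letters` in the group is the same
check <- toy %>%
  group_by(Stage, variable) %>%
  summarise(all_same = n_distinct(letters) == 1, .groups = "drop")

# The do() form, with the column accessed through the `.` pronoun
check_do <- toy %>%
  group_by(Stage, variable) %>%
  do(data.frame(all_same = all(.$letters == .$letters[1])))
```

Both `check` and `check_do` carry one `all_same` value per Stage/variable combination; comparing against the first element of the group, rather than against 'a', tests for "all equal" whatever the shared value is.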

Similar logic also works in data.table:

library(data.table)
setDT(df)
df[, letters := if(uniqueN(letters)==1) '' else letters, by=.(Stage,variable)]
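
For completeness, the same idea can be sketched in base R without either package. This is not part of the original answer: it swaps in `ave()`, which applies a function within each group and returns a vector in the original row order. The data frame `toy` and the helper `blank_if_same` are hypothetical names used only for illustration.

```r
# Hypothetical toy data mirroring the question's column names
toy <- data.frame(
  Stage    = c("Hatching", "Hatching", "Hatching", "Hatching"),
  variable = c("Ash", "Ash", "Lipid", "Lipid"),
  letters  = c("a", "a", "b", "ab"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: blank out a group's values when they are all identical
blank_if_same <- function(x) {
  if (length(unique(x)) == 1) rep("", length(x)) else x
}

# ave() groups by every factor passed after the first argument
toy$letters <- ave(toy$letters, toy$Stage, toy$variable, FUN = blank_if_same)
```

After this runs, the two Ash rows hold '' while the Lipid rows keep 'b' and 'ab'.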

This question, "r - Using dplyr in R to check whether all characters in a group are equal", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/50167050/
