Filtering Rows in R Data Frame Based on Multiple Conditions - 6ren


Reposted by bug小助手, updated 2023-10-27 20:42:17

I have a data frame in R with three columns: Gene.ID, source, and value. I need to filter the rows based on multiple conditions, but I'm having trouble achieving the desired result. A sample of my data is given further below.
My goal is to:


Keep rows with the same Gene.ID and source.
For rows with the same Gene.ID but different source, I want to keep them only if the value is different from the previous row.
I've tried various approaches using dplyr and custom loops, but I haven't been able to achieve the desired filtering logic.
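The two rules above can be sketched with dplyr::lag(), assuming a long table with the Gene.ID, source, and value columns described in the question. The toy data (genes g1 and g2) is hypothetical, and because the rules compare each row with the previous one, the result depends on the current row order; sort the data first (for example with arrange()) if that order is not already meaningful.

```r
library(dplyr)

# Hypothetical long-format data: one row per (gene, source, value),
# matching the three columns described in the question
long_df <- data.frame(
  Gene.ID = c("g1", "g1", "g1", "g2", "g2"),
  source  = c("HMMER", "HMMER", "DIAMOND", "HMMER", "DIAMOND"),
  value   = c("GH13", "CBM41", "CBM41", "GT2", "GH13"),
  stringsAsFactors = FALSE
)

filtered <- long_df %>%
  group_by(Gene.ID) %>%             # "previous row" is looked up within each gene
  filter(
    row_number() == 1 |             # first row of a gene: nothing to compare with
      source == lag(source) |       # same gene and same source as previous row: keep
      value != lag(value)           # source changed: keep only if the value changed too
  ) %>%
  ungroup()

filtered
```

Here the third row (g1, DIAMOND, CBM41) is dropped, because its source differs from the previous row while its value is the same.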


Can someone provide a solution or suggest an efficient way to filter this data frame based on these conditions?


Thank you for your assistance!


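# Sample input: one row per gene; each annotation column holds "+"-separated labels ("" = no annotation)
df <- data.frame(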
  Gene.ID = c(
    "NZ_JAHWGH010000001.1_15",
    "NZ_JAHWGH010000001.1_17",
    "NZ_JAHWGH010000001.1_68",
    "NZ_JAHWGH010000001.1_7"
  ),
  HMMER = c(
    "SLH",
    "GT2",
    "GT2",
    "GH13+CBM41+CBM41+GH13"
  ),
  dbCAN_sub = c(
    "",
    "GT2",
    "GT2",
    "CBM41+GH13+CBM41+CBM41+CBM48+GH13"
  ),
  DIAMOND = c(
    "",
    "",
    "GT2",
    "CBM41+CBM48+GH13+GH13+GH11"
  ),
  stringsAsFactors = FALSE
)

My desired output is as follows:


df_output <- data.frame(
  Gene.ID = c(
    "NZ_JAHWGH010000001.1_15",
    "NZ_JAHWGH010000001.1_17",
    "NZ_JAHWGH010000001.1_68",
    "NZ_JAHWGH010000001.1_7",
    "NZ_JAHWGH010000001.1_7",
    "NZ_JAHWGH010000001.1_7",
    "NZ_JAHWGH010000001.1_7",
    "NZ_JAHWGH010000001.1_7",
    "NZ_JAHWGH010000001.1_7",
    "NZ_JAHWGH010000001.1_7"
  ),
  combined = c(
    "SLH",
    "GT2",
    "GT2",
    "CBM41",
    "GH13",
    "CBM41",
    "CBM41",
    "CBM48",
    "GH13",
    "GH11"
  ),
  stringsAsFactors = FALSE
)

I tried the following command, but I didn't get the desired output:


df_output <- df %>%
  separate_rows(., sep = "\\+") %>%
  gather(key = "source", value = " ", -Gene.ID) %>%
  filter(combined != "") %>%
  distinct(Gene.ID, combined)

Comments:

(1) Is separate_rows actually doing anything for you? For me, it's doing nothing, and will likely have problems if you use it correctly since the fields to be separated have different lengths of values to separate. (2) tidyr::gather was superseded years ago, I strongly suggest you learn to use tidyr::pivot_longer, it is far more powerful. (3) Perhaps you should pivot_longer (or gather) and then separate. (4) Why create a column with a name of " "? That seems like making future operations on it rather difficult.


(5) "previous row" is really fragile, since I don't know that reshaping is going to guarantee everything is in the order you expect.


Lastly, it might clear up a lot of questions if you included what the output should be here. (Providing it as a frame is incredibly helpful, even if you have to create it manually.)
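Comments (2) and (4) can be illustrated on the sample data: a minimal sketch comparing the superseded gather() call with its pivot_longer() equivalent, using a usable value-column name ("combined") instead of " ". The name `source` for the key column is my own choice. Both produce the same 12 (gene, tool, annotation) rows; only the row order differs (gather() stacks one column after another, pivot_longer() goes row by row).

```r
library(tidyr)

# Compact copy of the question's sample data
df <- data.frame(
  Gene.ID   = paste0("NZ_JAHWGH010000001.1_", c(15, 17, 68, 7)),
  HMMER     = c("SLH", "GT2", "GT2", "GH13+CBM41+CBM41+GH13"),
  dbCAN_sub = c("", "GT2", "GT2", "CBM41+GH13+CBM41+CBM41+CBM48+GH13"),
  DIAMOND   = c("", "", "GT2", "CBM41+CBM48+GH13+GH13+GH11"),
  stringsAsFactors = FALSE
)

# Superseded interface: still works, but no longer developed
long_gather <- gather(df, key = "source", value = "combined", -Gene.ID)

# Current interface: same content, with explicit names for both new columns
long_pivot <- pivot_longer(df, -Gene.ID, names_to = "source", values_to = "combined")
```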


Recommended answer

A working version of your approach, with some of the tidyr functions replaced by their more recent equivalents:


library(dplyr)
library(tidyr)

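df %>%
  # split each tool column on "+"; splitting three columns in turn
  # produces every combination of one gene's labels
  tidyr::separate_longer_delim(HMMER, delim = "+") %>%
  tidyr::separate_longer_delim(dbCAN_sub, delim = "+") %>%
  tidyr::separate_longer_delim(DIAMOND, delim = "+") %>%
  # stack the three tool columns into a single `combined` column
  tidyr::pivot_longer(-Gene.ID, values_to = "combined") %>%
  dplyr::select(Gene.ID, combined) %>%
  # drop empty annotations and duplicate (gene, label) pairs
  dplyr::filter(combined != "") %>%
  dplyr::distinct()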


# A tibble: 7 x 2
  Gene.ID                 combined
  <chr>                   <chr>   
1 NZ_JAHWGH010000001.1_15 SLH     
2 NZ_JAHWGH010000001.1_17 GT2     
3 NZ_JAHWGH010000001.1_68 GT2     
4 NZ_JAHWGH010000001.1_7  GH13    
5 NZ_JAHWGH010000001.1_7  CBM41   
6 NZ_JAHWGH010000001.1_7  CBM48   
7 NZ_JAHWGH010000001.1_7  GH11

Note that this might create very long intermediate data frames on a larger input, because the three separate_longer_delim() steps multiply each gene's row by the label count of each column in turn, creating many redundant rows that are only dropped again at the end (in this example 4 -> 369 -> 7 rows). There is probably a more efficient way to do this for large datasets.
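Following comment (3) (pivot first, then separate), a minimal sketch that splits each annotation string only once, so the intermediate table grows with the total number of labels rather than with the product of the per-column label counts. It assumes tidyr >= 1.3.0 for separate_longer_delim(), as in the answer above; the key-column name `source` is my own choice. On the sample data it returns the same 7 rows as the answer.

```r
library(dplyr)
library(tidyr)

# Compact copy of the question's sample data
df <- data.frame(
  Gene.ID   = paste0("NZ_JAHWGH010000001.1_", c(15, 17, 68, 7)),
  HMMER     = c("SLH", "GT2", "GT2", "GH13+CBM41+CBM41+GH13"),
  dbCAN_sub = c("", "GT2", "GT2", "CBM41+GH13+CBM41+CBM41+CBM48+GH13"),
  DIAMOND   = c("", "", "GT2", "CBM41+CBM48+GH13+GH13+GH11"),
  stringsAsFactors = FALSE
)

result_pivot_first <- df %>%
  # one row per (gene, tool): 4 genes x 3 tools = 12 rows
  pivot_longer(-Gene.ID, names_to = "source", values_to = "combined") %>%
  # one row per label; each string is split exactly once
  separate_longer_delim(combined, delim = "+") %>%
  # drop empty annotations
  filter(combined != "") %>%
  # keep the first occurrence of each (gene, label) pair
  distinct(Gene.ID, combined)

result_pivot_first
```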


Comments:

@Umar I based my answer on your code example and expected output, but I've realized that the title and first paragraph of the question seem to describe a different data structure and question. If the answer is off topic, I can delete it.


Thank you, but it didn't produce the expected answer. I solved it using a loop. I was busy with my manuscript after this, so I couldn't see your reply.
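The answer returns 7 rows while the desired df_output has 10, because distinct() keeps each label only once per gene. Comparing the sample input with df_output, one rule that reproduces the desired rows (though not necessarily their order) is: for each gene, keep every domain label as many times as it occurs in the single source column that lists it most often. For NZ_JAHWGH010000001.1_7 that gives CBM41 three times (from dbCAN_sub), GH13 twice, and CBM48 and GH11 once each. This is my reading of the expected output, not a rule stated in the question; the sketch assumes tidyr >= 1.3.0 and dplyr >= 1.0.

```r
library(dplyr)
library(tidyr)

# Compact copy of the question's sample data
df <- data.frame(
  Gene.ID   = paste0("NZ_JAHWGH010000001.1_", c(15, 17, 68, 7)),
  HMMER     = c("SLH", "GT2", "GT2", "GH13+CBM41+CBM41+GH13"),
  dbCAN_sub = c("", "GT2", "GT2", "CBM41+GH13+CBM41+CBM41+CBM48+GH13"),
  DIAMOND   = c("", "", "GT2", "CBM41+CBM48+GH13+GH13+GH11"),
  stringsAsFactors = FALSE
)

multiset_union <- df %>%
  # one row per (gene, tool), then one row per label
  pivot_longer(-Gene.ID, names_to = "source", values_to = "combined") %>%
  separate_longer_delim(combined, delim = "+") %>%
  filter(combined != "") %>%
  # how often each label occurs in each tool's annotation of a gene
  count(Gene.ID, source, combined) %>%
  # assumed rule: keep the largest count any single tool reports
  group_by(Gene.ID, combined) %>%
  summarise(n = max(n), .groups = "drop") %>%
  # repeat each label n times and drop the helper column
  uncount(n)

multiset_union
```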

