
python - Pandas dataframe: how to group by a value, sort values in descending order, then filter to a quantile (0.1)

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:37:07

I have a dataframe (p4p5_merge) that currently looks like this:

    SampleID      expr             Gene  Period                     tag  \
1     HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996
2     HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996
3     HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996
4     HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996
5     HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749
6     HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749
7     HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749
8     HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749
9     HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749
10    HSB195  0.214731  ENSG00000188157       4  HSB195|ENSG00000188157
11    HSB195  0.214731  ENSG00000188157       4  HSB195|ENSG00000188157
12    HSB195  0.214731  ENSG00000188157       4  HSB195|ENSG00000188157
14    HSB152  5.062444  ENSG00000188157       4  HSB152|ENSG00000188157
15    HSB627  2.062444  ENSG00000174749       4  HSB627|ENSG00000174749
16    HSB627  2.062444  ENSG00000174749       4  HSB627|ENSG00000174749
17    HSB627  2.062444  ENSG00000174749       4  HSB627|ENSG00000174749
18    HSB627  2.062444  ENSG00000174749       4  HSB627|ENSG00000174749
19    HSB627  2.062444  ENSG00000174749       4  HSB627|ENSG00000174749
20    HSB627  2.062444  ENSG00000174749       4  HSB627|ENSG00000174749
21    HSB627  2.062444  ENSG00000174749       4  HSB627|ENSG00000174749
22    HSB627  2.062444  ENSG00000174749       4  HSB627|ENSG00000174749
23    HSB627  2.062444  ENSG00000174749       4  HSB627|ENSG00000174749

              Consequence
1   upstream_gene_variant
2   upstream_gene_variant
3   upstream_gene_variant
4   upstream_gene_variant
5   upstream_gene_variant
6   upstream_gene_variant
7   upstream_gene_variant
8   upstream_gene_variant
9   upstream_gene_variant
10  upstream_gene_variant
11  upstream_gene_variant
12  upstream_gene_variant
14  upstream_gene_variant
15  upstream_gene_variant
16  upstream_gene_variant
17  upstream_gene_variant
18  upstream_gene_variant
19  upstream_gene_variant
20  upstream_gene_variant
21  upstream_gene_variant
22  upstream_gene_variant
23         intron_variant

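I now want to group by Gene, sort by expr in descending order, and then filter the dataframe down to the rows whose expr value falls in the bottom 10% of its Gene group (below the 10th percentile). So I do the following: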

1) Sort by expr in descending order (SUCCEEDS)

p4p5_sort = p4p5_merge.sort_values(['expr', 'Gene'],
                                   ascending=[False, True]).reset_index(drop=True)

2) Group by Gene and filter to the bottom 10% of expr per gene (FAILS)

p4p5_bottom10  = (p4p5_sort[p4p5_sort.groupby('Gene')['expr'].
apply(lambda x: x < x.quantile(0.1))])

Step 1 works as expected, but when I run step 2, all I get is the following output:

sys:1: DtypeWarning: Columns (15,16,22,36,37,38,39) have mixed types. Specify dtype option on import or set low_memory=False.
Empty DataFrame
Columns: [SampleID, expr, Gene, Period, tag, Consequence]
Index: []
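One possible contributor to the empty result, judging only from the sample rows shown above, is that the values within a gene are heavily duplicated. When the 10th percentile coincides with the group's minimum, a strict `<` comparison matches nothing. The sketch below reproduces this with the four expr values listed for ENSG00000188157; whether the same thing happens in the full dataset is an assumption, not something the output confirms.

```python
import pandas as pd

# expr values copied from the sample rows for gene ENSG00000188157
expr = pd.Series([0.214731, 0.214731, 0.214731, 5.062444])

# With default linear interpolation, the 10th percentile falls between
# the two smallest values, which are identical here.
q10 = expr.quantile(0.1)

strict = expr[expr < q10]      # strict comparison excludes the tied minimum
inclusive = expr[expr <= q10]  # inclusive comparison keeps the tied rows
```

Here `strict` is empty while `inclusive` keeps the three tied rows, so switching to `<=` may be worth trying if ties are common in the real data.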

In case it helps, the R equivalent of what I am trying to accomplish is:

p4p5_bottom10 <- p4p5_merge %>% select(Gene, expr, SampleID, Period) %>%
group_by(Gene) %>%
arrange(Gene, desc(expr)) %>%
filter(expr < quantile(expr, 0.1))

Best answer

You can apply quantile directly to the groupby, like this:
p4p5_bottom10 = pd.DataFrame(p4p5_sort.groupby(['Gene'])['expr'].quantile(0.1))

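The groupby quantile returns a Series indexed by Gene, so we wrap it in pd.DataFrame() to convert it to a DataFrame. Note that this gives one 10th-percentile expr value per Gene, not the filtered rows themselves.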
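To get the actual rows below each gene's 10th percentile, closer to the R `filter(expr < quantile(expr, 0.1))` in the question, one option is to broadcast the per-group quantile back onto every row with `transform` and compare against it. This is a minimal sketch on made-up data, not the answerer's method; the frame `df` and its values are hypothetical stand-ins for p4p5_merge.

```python
import pandas as pd

# Hypothetical stand-in for p4p5_merge: two genes with ten expr values each
df = pd.DataFrame({
    "Gene": ["A"] * 10 + ["B"] * 10,
    "expr": [float(i) for i in range(1, 11)]
            + [float(i) for i in range(10, 101, 10)],
})

# Per-row threshold: each row receives the 10th percentile of its own Gene group
threshold = df.groupby("Gene")["expr"].transform(lambda s: s.quantile(0.1))

# Keep rows strictly below their group's threshold, then order by Gene
# and by expr descending, mirroring arrange(Gene, desc(expr)) in the R code
bottom10 = (
    df[df["expr"] < threshold]
    .sort_values(["Gene", "expr"], ascending=[True, False])
    .reset_index(drop=True)
)
```

Because `transform` returns a Series aligned with the original index, the boolean mask can be used directly for row selection without any index juggling.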

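Regarding "python - Pandas dataframe: how to group by a value, sort values in descending order, then filter to a quantile (0.1)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53181759/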
