python - Removing duplicate values after stack

Repost. Author: 太空宇宙. Updated: 2023-11-03 10:48:31

I have a square matrix like this.

              ACSM3    ACSX12    ADXM28  ...   UGT2B15      VCAN        XK
ACSM3      1.000000  0.929347  0.999914  ...  0.986433  0.999947 -0.999680
ACSX12     0.929347  1.000000  0.924428  ...  0.977350  0.925496 -0.919704
ADXM28     0.999914  0.924428  1.000000  ...  0.984196  0.999996 -0.999925
ADAM28     0.999976  0.926774  0.999981  ...  0.985275  0.999994 -0.999831
ADH1B     -0.999509 -0.917317 -0.999834  ... -0.980802 -0.999778  0.999982
ADTRP     -0.999039 -0.912273 -0.999528  ... -0.978290 -0.999438  0.999828
AEBP1      0.983312  0.846668  0.985611  ...  0.940104  0.985133 -0.987601
AKR1B10   -0.999658 -0.919371 -0.999915  ... -0.981800 -0.999874  1.000000
UBL3       0.997347  0.900002  0.998215  ...  0.971864  0.998043 -0.998870
UGT2B15    0.986433  0.977350  0.984196  ...  1.000000  0.984690 -0.981961
VCAN       0.999947  0.925496  0.999996  ...  0.984690  1.000000 -0.999887
XK        -0.999680 -0.919704 -0.999925  ... -0.981961 -0.999887  1.000000

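After calling stack() I got the data into the shape I want, but as you can see, because every gene is compared with every other gene in both directions, each pair appears twice.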

dfHealty = df_healtyWithGenes.stack().reset_index()
dfHealty.columns = ['gene1', 'gene2', 'score']
dfHealty = dfHealty[dfHealty.gene1 != dfHealty.gene2]

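I could filter on the score, but that is not a good idea: it could corrupt the data, for example if two unrelated pairs happen to share the same score.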
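
To illustrate the concern about filtering on the score, here is a minimal sketch with made-up rows. The gene names `A` to `D` and the shared score are hypothetical and are not taken from the data above. It compares de-duplicating on `score` alone with comparing the two gene columns.

```python
import pandas as pd

# Hypothetical rows: two *different* gene pairs that happen to share a score.
pairs = pd.DataFrame(
    [
        ["A", "B", 0.5],
        ["B", "A", 0.5],
        ["C", "D", 0.5],  # distinct pair, same score as A-B
        ["D", "C", 0.5],
    ],
    columns=["gene1", "gene2", "score"],
)

# De-duplicating on the score alone keeps a single row and loses the C-D pair.
by_score = pairs.drop_duplicates(subset="score")

# Comparing the gene columns keeps one row per unordered pair.
by_genes = pairs[pairs["gene1"] < pairs["gene2"]]

print(len(by_score))  # 1
print(len(by_genes))  # 2
```

In this toy case the score-based filter silently drops a genuine pair, which is the kind of data loss the question is worried about.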

How can I filter on the gene columns instead?

gene1   gene2   score
EPB41L4B PGC 0.496713249
PGC EPB41L4B 0.496713249
CHGA MT1G 0.496751983
MT1G CHGA 0.496751983
AEBP1 FCER1G 0.497061368
FCER1G AEBP1 0.497061368
ADTRP CAPN9 0.497122603
CAPN9 ADTRP 0.497122603
FAM189A2 GLUL 0.49721763
GLUL FAM189A2 0.49721763
CA9 DUOX1 0.497233294
DUOX1 CA9 0.497233294
EDNRA MSLN 0.497267565
MSLN EDNRA 0.497267565
HRASLS2 LIPF 0.497581499
LIPF HRASLS2 0.497581499
EPB41L4B NEDD4L 0.497613643
NEDD4L EPB41L4B 0.497613643

I need to transform the data into something like this.

gene1   gene2   score
EPB41L4B PGC 0.496713249
CHGA MT1G 0.496751983
AEBP1 FCER1G 0.497061368
ADTRP CAPN9 0.497122603
FAM189A2 GLUL 0.49721763
CA9 DUOX1 0.497233294
EDNRA MSLN 0.497267565

Best answer

With the given data, you can remove the duplicate pairs like this:

import pandas as pd

cols = ['gene1','gene2','score']
data = [['EPB41L4B', 'PGC',0.496713249],
['PGC','EPB41L4B',0.496713249],
['CHGA','MT1G',0.496751983],
['MT1G','CHGA',0.496751983],
['AEBP1','FCER1G',0.497061368 ],
['FCER1G','AEBP1',0.497061368],
['ADTRP','CAPN9',0.497122603],
['CAPN9','ADTRP',0.497122603],
['FAM189A2','GLUL',0.49721763],
['GLUL','FAM189A2',0.49721763],
['CA9','DUOX1',0.497233294],
['DUOX1','CA9',0.497233294],
['EDNRA','MSLN',0.497267565],
['MSLN','EDNRA',0.497267565],
['HRASLS2','LIPF',0.497581499],
['LIPF','HRASLS2',0.497581499],
['EPB41L4B','NEDD4L',0.497613643],
['NEDD4L','EPB41L4B',0.497613643]]

df = pd.DataFrame(data,columns=cols)
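# Each pair appears as (a, b) and (b, a); keep only the row where gene1
# sorts before gene2, which is true for exactly one of the two.
df = df[df['gene1'] < df['gene2']]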
print(df)

This produces the following output:

       gene1   gene2     score
0 EPB41L4B PGC 0.496713
2 CHGA MT1G 0.496752
4 AEBP1 FCER1G 0.497061
6 ADTRP CAPN9 0.497123
8 FAM189A2 GLUL 0.497218
10 CA9 DUOX1 0.497233
12 EDNRA MSLN 0.497268
14 HRASLS2 LIPF 0.497581
16 EPB41L4B NEDD4L 0.497614
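
As a possible alternative, the mirrored rows can be avoided before stacking by keeping only the part of the matrix strictly above the diagonal. The sketch below is not part of the original answer. It assumes the matrix is symmetric, and it uses a hypothetical 3x3 frame named `corr` with made-up values in place of `df_healtyWithGenes`.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 symmetric matrix standing in for the full correlation matrix.
genes = ["ACSM3", "VCAN", "XK"]
corr = pd.DataFrame(
    [[1.0, 0.9, -0.8],
     [0.9, 1.0, -0.7],
     [-0.8, -0.7, 1.0]],
    index=genes,
    columns=genes,
)

# Boolean mask: True strictly above the diagonal (k=1), False elsewhere.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# where() replaces masked-out cells with NaN. The explicit dropna() removes
# them whether or not stack() drops NaN by default in the installed pandas.
pairs = corr.where(upper).stack().dropna().reset_index()
pairs.columns = ["gene1", "gene2", "score"]
print(pairs)
```

With this approach the diagonal is excluded as well, so the `gene1 != gene2` filter is no longer needed. The order of genes within each pair follows the row and column order of the matrix rather than alphabetical order.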

About python - Removing duplicate values after stack: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56090520/
