gpt4 book ai didi

python - 比较两个数据帧并在匹配时通过填充二进制将每个值转置为列?

转载 作者:行者123 更新时间:2023-12-04 15:23:27 25 4
gpt4 key购买 nike

我有两个数据框如下:

df1

SYMBOL  seqnames    start      end    SampleID
SPATA21 1 16736303 16736303 eAPD114
E2F2 1 23836607 23836607 eAPD114
FCN3 1 27701288 27701288 eAPD120
MARCKSL 1 32800671 32800671 KAPD144
MARCKSL 1 32800671 32800671 eAPD184
LRRC40 1 70644607 70644607 eAPD184
KREMEN1 22 29536275 29536275 eAPD005
KIF14 1 200569584 200569584 eAPD081
RGS7BP 5 63802465 63802465 YAPD025
PCDHB6 5 140531231 140531231 YAPD025
SERPINB4 18 61305310 61305310 eAPD081

df2

SYMBOL  seqnames    start      end
SPATA21 1 16736303 16736303
E2F2 1 23836607 23836607
FCN3 1 27701288 27701288
MARCKSL 1 32800671 32800671
LRRC40 1 70644607 70644607
KREMEN1 22 29536275 29536275
SERPINB4 18 61305310 61305310
SERPINB4 21 61305310 61305310

我想将 df1 映射到 df2,并将每个 SampleID 表示为单独的列,如果 df1 和 df2 之间存在匹配,则填充 0 和 1:

预期输出:

SYMBOL  seqnames    start      end      eAPD114 eAPD120 KAPD144 eAPD184 eAPD005 eAPD081 YAPD025
SPATA21 1 16736303 16736303 1 0 0 0 0 0 0
E2F2 1 23836607 23836607 1 0 0 0 0 0 0
FCN3 1 27701288 27701288 0 1 0 0 0 0 0
MARCKSL 1 32800671 32800671 0 0 1 1 0 0 0
LRRC40 1 70644607 70644607 0 0 0 1 0 0 0
KREMEN1 22 29536275 29536275 0 0 0 0 1 0 0
SERPINB4 18 61305310 61305310 0 0 0 0 0 1 0
SERPINB4 21 61305310 61305310 0 0 0 0 0 0 0

我尝试使用提到的枢轴方法 here .但是没有高效工作

最佳答案

使用:

cols = ['SYMBOL','seqnames','start','end']
#left join between both DataFrames
df = df2.merge(df1, on=cols, how='left')
#convert column SampleID to indicators,get max and last add missing df1['SampleID']
df = (df.join(pd.get_dummies(df.pop('SampleID')))
.groupby(cols).max()
.reindex(df1['SampleID'].unique(), axis=1, fill_value=0)
.reset_index())
print (df)
SYMBOL seqnames start end eAPD114 eAPD120 KAPD144 eAPD184 \
0 E2F2 1 23836607 23836607 1 0 0 0
1 FCN3 1 27701288 27701288 0 1 0 0
2 KREMEN1 22 29536275 29536275 0 0 0 0
3 LRRC40 1 70644607 70644607 0 0 0 1
4 MARCKSL 1 32800671 32800671 0 0 1 1
5 SERPINB4 18 61305310 61305310 0 0 0 0
6 SERPINB4 21 61305310 61305310 0 0 0 0
7 SPATA21 1 16736303 16736303 1 0 0 0

eAPD005 eAPD081 YAPD025
0 0 0 0
1 0 0 0
2 1 0 0
3 0 0 0
4 0 0 0
5 0 1 0
6 0 0 0
7 0 0 0

关于python - 比较两个数据帧并在匹配时通过填充二进制将每个值转置为列?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/62827872/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com