
python - Pandas count over columns

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:21:09

I am converting SPSS code to Pandas, and I am trying to find a Pythonic way to express the following:

COUNT WBbf = M1 M26 M38 M50 M62 M74 M85 M97 M109 
M121 M133 M144 (1).

COUNT SPbf = M2 M15 M39 M51 M75 M87 M110 (1)
M63 M98 M122 M134 M145 (0).

COUNT ACbf = M3 M16 M27 M52 M76 M88 M111 M123 M135 M146 (1)
M64 M99 (0).

COUNT SCbf = M5 M17 M40 M77 M112 (1)
M28 M65 M89 M100 M124 M136 M148 (0).

My DataFrame has the following form:

In [90]: data[b]
Out[90]:
M1 M2 M3 M4 M5 M6 M7 M8 M9 \
case_id
ERAB_S1_LR_Q1_261016 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_AS_011116 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0
ERAB_S2_LR_Q1_021116AFTERNOO 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0
ERAB_S2_AS031116MORNING 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0
ERAB_S3_AS031116AFTERNOON 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_S1_AS041116 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_LOH__S3_021116 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_LR_081116 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_S1_AS_111116 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0
ERAB_S1_141116AFTERNOON 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_S1_LOH_151116 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0
ERAB_S1_161116 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0

And so on. I want to count the matching values and create a new column that holds the result for each case ID.

Best answer

I believe you can first select the columns by label, compare them with eq, and then sum the True values in each row:

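# replace these lists with the column names from your data
SPbf1 = 'M2 M5 M8'.split()
SPbf0 = 'M6 M9'.split()
print(SPbf1)
['M2', 'M5', 'M8']

print(SPbf0)
['M6', 'M9']

# count the 1s in the SPbf1 columns plus the 0s in the SPbf0 columns, row by row
df['SPbf'] = df[SPbf1].eq(1).sum(axis=1) + df[SPbf0].eq(0).sum(axis=1)
print(df)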
M1 M2 M3 M4 M5 M6 M7 M8 M9 \
case_id
ERAB_S1_LR_Q1_261016 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_AS_011116 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0
ERAB_S2_LR_Q1_021116AFTERNOO 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0
ERAB_S2_AS031116MORNING 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0
ERAB_S3_AS031116AFTERNOON 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_S1_AS041116 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_LOH__S3_021116 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_LR_081116 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_S1_AS_111116 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0
ERAB_S1_141116AFTERNOON 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_S1_LOH_151116 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0
ERAB_S1_161116 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0

SPbf
case_id
ERAB_S1_LR_Q1_261016 2
ERAB_AS_011116 4
ERAB_S2_LR_Q1_021116AFTERNOO 1
ERAB_S2_AS031116MORNING 1
ERAB_S3_AS031116AFTERNOON 1
ERAB_S1_AS041116 1
ERAB_LOH__S3_021116 2
ERAB_LR_081116 2
ERAB_S1_AS_111116 2
ERAB_S1_141116AFTERNOON 2
ERAB_S1_LOH_151116 2
ERAB_S1_161116 2
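
Since the question has several COUNT commands, the same pattern could be wrapped in a small helper and applied to each scale in turn. The following is only a sketch: the function name `spss_count`, the `scales` dictionary, and the toy data are all made up for illustration and are not part of the original answer.

```python
import pandas as pd

def spss_count(df, spec):
    """Per row, count the cells that equal the target value of their group.

    spec maps a target value to the columns checked against it,
    e.g. {1: ['M2', 'M5'], 0: ['M6']} (hypothetical layout).
    """
    total = pd.Series(0, index=df.index)
    for value, cols in spec.items():
        # eq() gives a boolean frame; summing along axis=1 counts the True cells
        total = total + df[cols].eq(value).sum(axis=1)
    return total

# Toy data, invented for this example (not the asker's data)
df = pd.DataFrame(
    {"M2": [1.0, 0.0, 1.0], "M5": [1.0, 1.0, 0.0], "M6": [1.0, 0.0, 1.0]},
    index=["a", "b", "c"],
)

# One entry per COUNT command; the column lists here are shortened placeholders
scales = {
    "SPbf": {1: ["M2", "M5"], 0: ["M6"]},
    "WBbf": {1: ["M2"]},
}
for name, spec in scales.items():
    df[name] = spss_count(df, spec)

print(df[["SPbf", "WBbf"]])
```

Keeping the column lists in one dictionary means each SPSS command becomes a single entry, which may be easier to check against the original syntax than a series of hand-written expressions.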

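If some of the column names may be missing from the DataFrame, selecting them by label raises a KeyError, so reindex the columns instead (the answer used reindex_axis, which newer pandas versions replace with reindex(columns=...)):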

SPbf1 = 'M2 M15 M39 M51 M75 M87 M110'.split()
SPbf0 = 'M63 M98 M122 M134 M145'.split()
print(SPbf1)
['M2', 'M15', 'M39', 'M51', 'M75', 'M87', 'M110']

print(SPbf0)
['M63', 'M98', 'M122', 'M134', 'M145']

# reindex(columns=...) replaces reindex_axis, which was removed in pandas 1.0;
# columns absent from df come back as NaN and so match neither 1 nor 0
df['SPbf'] = (df.reindex(columns=SPbf1).eq(1).sum(axis=1)
              + df.reindex(columns=SPbf0).eq(0).sum(axis=1))

print(df)
M1 M2 M3 M4 M5 M6 M7 M8 M9 \
case_id
ERAB_S1_LR_Q1_261016 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_AS_011116 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0
ERAB_S2_LR_Q1_021116AFTERNOO 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0
ERAB_S2_AS031116MORNING 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0
ERAB_S3_AS031116AFTERNOON 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_S1_AS041116 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_LOH__S3_021116 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_LR_081116 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_S1_AS_111116 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0
ERAB_S1_141116AFTERNOON 1.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0
ERAB_S1_LOH_151116 1.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0
ERAB_S1_161116 1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0

SPbf
case_id
ERAB_S1_LR_Q1_261016 1
ERAB_AS_011116 1
ERAB_S2_LR_Q1_021116AFTERNOO 1
ERAB_S2_AS031116MORNING 1
ERAB_S3_AS031116AFTERNOON 0
ERAB_S1_AS041116 0
ERAB_LOH__S3_021116 1
ERAB_LR_081116 1
ERAB_S1_AS_111116 1
ERAB_S1_141116AFTERNOON 1
ERAB_S1_LOH_151116 0
ERAB_S1_161116 1
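
To see why the counts above only reflect M2, it may help to look at what reindexing does with absent columns. This is a minimal sketch on invented data; the variable names `ones`, `zeros`, `missing`, and `picked` are assumptions made for the example.

```python
import pandas as pd

# Toy frame with a single item column (invented for illustration)
df = pd.DataFrame({"M2": [1.0, 0.0, 1.0]}, index=["a", "b", "c"])

ones = ["M2", "M15"]   # M15 is not in df
zeros = ["M63"]        # M63 is not in df

# List the requested columns that df does not have, e.g. to log a warning
missing = [c for c in ones + zeros if c not in df.columns]
print(missing)

# reindex adds the absent columns filled with NaN instead of raising KeyError
picked = df.reindex(columns=ones)
print(picked)

# NaN equals neither 1 nor 0, so absent columns add nothing to the count
spbf = picked.eq(1).sum(axis=1) + df.reindex(columns=zeros).eq(0).sum(axis=1)
print(spbf)
```

Whether silently treating a missing item as "no match" is acceptable depends on the analysis; checking `missing` first makes that choice explicit.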

Regarding python - Pandas count over columns, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43541381/
