python - Set a count variable given binary flags in Python (Pandas dataframe)

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:35:12

I have a dataframe laid out as follows, including "flag_common":

cat      flag_1  flag_2  flag_3  pop   state       year  flag_common
value1   1       0       0       1.5   Ohio        2000  1
value3   1       1       0       1.7   Ohio        2001  1
value2   1       1       0       3.6   Ohio        2002  1
value11  0       1       0       2.4   Nevada      2001  2
value5   0       0       0       2.9   Nevada      2002  0
value9   0       0       1       11.1  New York    2003  3
value13  0       0       0       23.4  New York    2004  0
value10  1       1       0       0.1   California  2009  1
value7   0       0       0       0.3   California  2010  0
value14  0       1       1       1.1   California  2009  2

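The "flag_common" column should be set by looking at the binary flags and inserting a value from 1 to 3, depending on which flags are 1 and which are 0 (rows with no flag set get 0). When two flags in the same row are set to 1, the lower flag number goes into "flag_common". This has to be dynamic, so that it handles flag_1 through "flag_n".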
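To pin the rule down, here is a minimal pure-Python sketch of it for a single row, assuming the flag values arrive as a list ordered flag_1, flag_2, ..., flag_n. The helper name `first_set_flag` is hypothetical and not part of the question.

```python
def first_set_flag(flags):
    """Return the 1-based number of the first flag equal to 1, or 0 if none is set.

    Assumes `flags` is ordered flag_1, flag_2, ..., flag_n.
    """
    for number, value in enumerate(flags, start=1):
        if value == 1:
            return number  # lowest set flag wins when several are 1
    return 0  # no flag set in this row


# The flag_1..flag_3 values of the example rows
rows = [
    [1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0],
    [0, 0, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0], [0, 1, 1],
]
print([first_set_flag(r) for r in rows])
# [1, 1, 1, 2, 0, 3, 0, 1, 0, 2]
```

Calling a function like this once per row is exactly the pattern that gets slow on large frames; it is useful mainly as a reference to check a vectorized version against.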

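I have solved it with row iteration and a for loop, but my data is very large and this has become very slow, so I'm hoping there is a "pythonic", vectorized way to write it.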

The code for the dataframe is:

import pandas as pd

# flag_common is the column to be computed, so it is not part of the input
df = pd.DataFrame({'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'New York', 'New York', 'California', 'California', 'California'],
                   'year': [2000, 2001, 2002, 2001, 2002, 2003, 2004, 2009, 2010, 2009],
                   'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 11.1, 23.4, 0.1, 0.3, 1.1],
                   'cat': ['value1', 'value3', 'value2', 'value11', 'value5', 'value9', 'value13', 'value10', 'value7', 'value14'],
                   'flag_1': [1, 1, 1, 0, 0, 0, 0, 1, 0, 0],
                   'flag_2': [0, 1, 1, 1, 0, 0, 0, 1, 0, 1],
                   'flag_3': [0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
                   })

Thanks in advance for any ideas and suggestions!

Best answer

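You can use idxmax on the subset of columns flag_1, flag_2 and flag_3, and then use get_loc in a list comprehension to find the positions.

However, for rows where all values are 0, idxmax returns flag_1 instead of something that maps to 0, so correct those rows with numpy.where.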
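The same idea (first maximum per row, then numpy.where for all-zero rows) can also be written directly on the underlying NumPy array, which avoids the Python-level list comprehension. This is a sketch that assumes the flag columns are listed in numeric order and contain only 0/1; only the three flag columns of the example are rebuilt here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'flag_1': [1, 1, 1, 0, 0, 0, 0, 1, 0, 0],
    'flag_2': [0, 1, 1, 1, 0, 0, 0, 1, 0, 1],
    'flag_3': [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
})
flags = ['flag_1', 'flag_2', 'flag_3']  # assumed to be in numeric order

# 2-D array of flag values, one row per DataFrame row
arr = df[flags].to_numpy()

# argmax returns the index of the first maximum, i.e. the first 1 in each row;
# +1 turns the 0-based column index into the flag number
first = arr.argmax(axis=1) + 1

# all-zero rows would otherwise get 1, because argmax of all zeros is 0
df['flag_common'] = np.where(arr.any(axis=1), first, 0)
print(df['flag_common'].tolist())
# [1, 1, 1, 2, 0, 3, 0, 1, 0, 2]
```

Because argmax reports the first maximum, the "lowest flag number wins" rule comes for free as long as the columns are ordered by number.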

import numpy as np

flags = ['flag_1', 'flag_2', 'flag_3']

# name of the first column holding the row maximum, i.e. the first flag set to 1
print(df[flags].idxmax(axis=1))
0    flag_1
1    flag_1
2    flag_1
3    flag_2
4    flag_1
5    flag_3
6    flag_1
7    flag_1
8    flag_1
9    flag_2
dtype: object

# position of a flag within the flag columns; +1 turns the 0-based position into the flag number
# (df.columns.get_loc('flag_1') gives the position in the whole frame, which only equals the
# flag number when the columns happen to be ordered cat, flag_1, flag_2, ...)
print(df[flags].columns.get_loc('flag_1') + 1)
1

# flag numbers for all rows
flag = [df[flags].columns.get_loc(k) + 1 for k in df[flags].idxmax(axis=1)]
print(flag)
[1, 1, 1, 2, 1, 3, 1, 1, 1, 2]

# alternative: parse the number after the underscore (also works for flag_10 and above)
print([int(x.split('_')[1]) for x in df[flags].idxmax(axis=1)])
[1, 1, 1, 2, 1, 3, 1, 1, 1, 2]

# if all of 'flag_1', 'flag_2', 'flag_3' are 0, use 0, otherwise the flag number
df['new'] = np.where(df[flags].sum(axis=1) == 0, 0, flag)
print(df)
       cat  flag_1  flag_2  flag_3   pop       state  year  flag_common  new
0   value1       1       0       0   1.5        Ohio  2000            1    1
1   value3       1       1       0   1.7        Ohio  2001            1    1
2   value2       1       1       0   3.6        Ohio  2002            1    1
3  value11       0       1       0   2.4      Nevada  2001            2    2
4   value5       0       0       0   2.9      Nevada  2002            0    0
5   value9       0       0       1  11.1    New York  2003            3    3
6  value13       0       0       0  23.4    New York  2004            0    0
7  value10       1       1       0   0.1  California  2009            1    1
8   value7       0       0       0   0.3  California  2010            0    0
9  value14       0       1       1   1.1  California  2009            2    2
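The list comprehension over the idxmax result can also be replaced with pandas string methods, so that every step stays vectorized. This sketch assumes the flag columns are named flag_<number> and are already in numeric order; it takes everything after the underscore, so multi-digit numbers such as flag_10 also work.

```python
import pandas as pd

df = pd.DataFrame({
    'flag_1': [1, 1, 1, 0, 0, 0, 0, 1, 0, 0],
    'flag_2': [0, 1, 1, 1, 0, 0, 0, 1, 0, 1],
    'flag_3': [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
})
flags = ['flag_1', 'flag_2', 'flag_3']

# name of the first column holding a 1 ('flag_1' for all-zero rows)
first_name = df[flags].idxmax(axis=1)

# the number after the underscore, as an integer Series
first_number = first_name.str.split('_').str[1].astype(int)

# keep the number where at least one flag is set, otherwise use 0
df['new'] = first_number.where(df[flags].any(axis=1), 0)
print(df['new'].tolist())
# [1, 1, 1, 2, 0, 3, 0, 1, 0, 2]
```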

Edit:

You can also select the flag columns dynamically by checking for the text "flag" in the column names:
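Two details matter once the check is dynamic: a test on the "flag" prefix alone also matches a column like flag_common, and idxmax only returns the lowest number if the columns are in numeric order (a string sort puts flag_10 before flag_2). A sketch using DataFrame.filter with a regular expression, on a small hypothetical frame made up for illustration:

```python
import pandas as pd

# Hypothetical frame: a flag_10 column, columns out of order, and a flag_common column
df = pd.DataFrame({
    'cat': ['a', 'b', 'c'],
    'flag_10': [0, 0, 1],
    'flag_2': [0, 1, 1],
    'flag_1': [0, 0, 0],
    'flag_common': [0, 2, 2],
})

# only columns named flag_<digits>; 'flag_common' and 'cat' are skipped
cols = df.filter(regex=r'^flag_\d+$').columns.tolist()

# sort by the number, not the string, so flag_2 comes before flag_10
cols = sorted(cols, key=lambda c: int(c.split('_')[1]))
print(cols)
# ['flag_1', 'flag_2', 'flag_10']
```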

import re

# flag columns: names of the form flag_<number>, sorted by that number
# (a plain 'flag' prefix test would also match flag_common, and a string sort puts flag_10 before flag_2)
cols = sorted((x for x in df.columns if re.fullmatch(r'flag_\d+', x)), key=lambda x: int(x.split('_')[1]))
print(cols)
['flag_1', 'flag_2', 'flag_3']

# name of the first column holding the row maximum, i.e. the first flag set to 1
print(df[cols].idxmax(axis=1))
0    flag_1
1    flag_1
2    flag_1
3    flag_2
4    flag_1
5    flag_3
6    flag_1
7    flag_1
8    flag_1
9    flag_2
dtype: object

# position of a flag within the flag columns; +1 turns it into the flag number
print(df[cols].columns.get_loc('flag_1') + 1)
1

# flag numbers for all rows
flag = [df[cols].columns.get_loc(k) + 1 for k in df[cols].idxmax(axis=1)]
print(flag)
[1, 1, 1, 2, 1, 3, 1, 1, 1, 2]

# alternative: parse the number after the underscore (also works for flag_10 and above)
print([int(x.split('_')[1]) for x in df[cols].idxmax(axis=1)])
[1, 1, 1, 2, 1, 3, 1, 1, 1, 2]

# if all flag columns are 0, use 0, otherwise the flag number
df['new'] = np.where(df[cols].sum(axis=1) == 0, 0, flag)
print(df)
       cat  flag_1  flag_2  flag_3   pop       state  year  new
0   value1       1       0       0   1.5        Ohio  2000    1
1   value3       1       1       0   1.7        Ohio  2001    1
2   value2       1       1       0   3.6        Ohio  2002    1
3  value11       0       1       0   2.4      Nevada  2001    2
4   value5       0       0       0   2.9      Nevada  2002    0
5   value9       0       0       1  11.1    New York  2003    3
6  value13       0       0       0  23.4    New York  2004    0
7  value10       1       1       0   0.1  California  2009    1
8   value7       0       0       0   0.3  California  2010    0
9  value14       0       1       1   1.1  California  2009    2
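Putting the pieces together, the whole computation can be wrapped in one function that works for flag_1 through flag_n. This is a sketch, not the answer's code: the function name `first_flag_number` and its `prefix` parameter are hypothetical, and it assumes at least one column named `<prefix><integer>` holding 0/1 values.

```python
import re

import numpy as np
import pandas as pd


def first_flag_number(df, prefix='flag_'):
    """Lowest set flag number per row, or 0 where no flag is set.

    Assumes at least one column named <prefix><integer> containing 0/1 values.
    """
    # select the flag columns by name and order them by their number
    cols = df.filter(regex='^' + re.escape(prefix) + r'\d+$').columns
    numbers = np.array([int(c[len(prefix):]) for c in cols])
    order = np.argsort(numbers)
    arr = df[cols].to_numpy()[:, order]
    numbers = numbers[order]

    # argmax finds the first 1 in each row; all-zero rows are set to 0 afterwards
    first = numbers[arr.argmax(axis=1)]
    return pd.Series(np.where(arr.any(axis=1), first, 0), index=df.index)


# Usage on a small made-up frame with out-of-order flag columns, including flag_10
df = pd.DataFrame({
    'state': ['Ohio', 'Nevada', 'Nevada', 'California'],
    'flag_1': [1, 0, 0, 0],
    'flag_10': [0, 1, 0, 1],
    'flag_2': [1, 1, 0, 0],
    'flag_common': [0, 0, 0, 0],
})
df['flag_common'] = first_flag_number(df)
print(df['flag_common'].tolist())
# [1, 2, 0, 10]
```

Sorting the column positions by their parsed number, rather than relying on the frame's column order, is what keeps the "lowest number wins" rule correct once n grows past 9.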

A similar question about python - Set a count variable given binary flags in Python (Pandas dataframe) can be found on Stack Overflow: https://stackoverflow.com/questions/35861835/
