python - Conditionally generating new columns - Pandas

Reposted. Author: 行者123. Updated: 2023-12-01 22:35:44

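I'm trying to create a new column based on conditional logic applied to pre-existing columns. I understand there may be more efficient ways to achieve this, but I have a few conditions that need to be included. This is just the first step.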

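The overall goal is to create two new columns mapped from columns 1 and 2. These are looked up against the Object column, because each time point can have multiple rows.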

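Object2 and Value determine how the new columns are mapped. If Value == X, I want to match the two Object columns and return the corresponding 1 and 2 at that time point into the new columns. If Value == Y, the same process should occur. If Value == Z, I want to insert 0, 0. Everything else should be NaN.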

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time' : ['2019-08-02 09:50:10.1','2019-08-02 09:50:10.1','2019-08-02 09:50:10.2','2019-08-02 09:50:10.3','2019-08-02 09:50:10.3','2019-08-02 09:50:10.4','2019-08-02 09:50:10.5','2019-08-02 09:50:10.6','2019-08-02 09:50:10.6'],
    'Object' : ['A','B','A','C','A','C','C','B','B'],
    '1' : [1,3,5,7,9,11,13,15,17],
    '2' : [0,1,4,6,8,10,12,14,16],
    'Object2' : ['A','A',np.nan,'C','C','C','C','B','A'],
    'Value' : ['X','X',np.nan,'Y','Y','Y','Y','Z',np.nan],
    })

def map_12(df):

    for i in df['Value']:
        if i == 'X':
            df['A1'] = df['1']
            df['A2'] = df['2']
        elif i == 'Y':
            df['A1'] = df['1']
            df['A2'] = df['2']
        elif i == 'Z':
            df['A1'] = 0
            df['A2'] = 0
        else:
            df['A1'] = np.nan
            df['A2'] = np.nan

    return df

Expected output:

                    Time Object   1   2 Object2 Value    A1    A2
0  2019-08-02 09:50:10.1      A   1   0       A     X   1.0   0.0  # Match A-A at this time point, so output is 1,0
1  2019-08-02 09:50:10.1      B   3   1       A     X   1.0   0.0  # Still at same time point so use 1,0
2  2019-08-02 09:50:10.2      A   5   4     NaN   NaN   NaN   NaN  # No Value so NaN
3  2019-08-02 09:50:10.3      C   7   6       C     Y   7.0   6.0  # Match C-C at this time point, so output is 7,6
4  2019-08-02 09:50:10.3      A   9   8       C     Y   7.0   6.0  # Still at same time point so use 7,6
5  2019-08-02 09:50:10.4      C  11  10       C     Y  11.0  10.0  # Match C-C at this time point, so output is 11,10
6  2019-08-02 09:50:10.5      C  13  12       C     Y  13.0  12.0  # Match C-C at this time point, so output is 13,12
7  2019-08-02 09:50:10.6      B  15  14       B     Z   0.0   0.0  # Z so 0,0
8  2019-08-02 09:50:10.6      B  17  16       A   NaN   NaN   NaN  # No Value so NaN

New sample df:

df = pd.DataFrame({
    'Time' : ['2019-08-02 09:50:10.1','2019-08-02 09:50:10.1','2019-08-02 09:50:10.2','2019-08-02 09:50:10.3','2019-08-02 09:50:10.3','2019-08-02 09:50:10.4','2019-08-02 09:50:10.5','2019-08-02 09:50:10.6','2019-08-02 09:50:10.6'],
    'Object' : ['B','A','A','A','C','C','C','B','B'],
    '1' : [1,3,5,7,9,11,13,15,17],
    '2' : [0,1,4,6,8,10,12,14,16],
    'Object2' : ['A','A',np.nan,'C','C','C','C','B','A'],
    'Value' : ['X','X',np.nan,'Y','Y','Y','Y','Z',np.nan],
    })

Expected output:

                    Time Object   1   2 Object2 Value    A1    A2
0  2019-08-02 09:50:10.1      B   1   0       A     X   3.0   1.0  # Match A-A at this time point, so output is 3,1
1  2019-08-02 09:50:10.1      A   3   1       A     X   3.0   1.0  # Still at same time point so use 3,1
2  2019-08-02 09:50:10.2      A   5   4     NaN   NaN   NaN   NaN  # No Value so NaN
3  2019-08-02 09:50:10.3      A   7   6       C     Y   9.0   8.0  # Match C-C at this time point, so output is 9,8
4  2019-08-02 09:50:10.3      C   9   8       C     Y   9.0   8.0  # Still at same time point so use 9,8
5  2019-08-02 09:50:10.4      C  11  10       C     Y  11.0  10.0  # Match C-C at this time point, so output is 11,10
6  2019-08-02 09:50:10.5      C  13  12       C     Y  13.0  12.0  # Match C-C at this time point, so output is 13,12
7  2019-08-02 09:50:10.6      B  15  14       B     Z   0.0   0.0  # Z so 0,0
8  2019-08-02 09:50:10.6      B  17  16       A   NaN   NaN   NaN  # No Value so NaN

Best answer

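Use DataFrame.where + Series.eq to create a DataFrame like df[['1','2']] that keeps values only on rows where Object matches Object2 and holds NaN on all other rows. Then use DataFrame.groupby to group by time point and fill the missing data in each group with the existing values from the rows where Object and Object2 coincide (matches == True). Use DataFrame.where to discard the values on rows where df['Value'] is NaN. Finally, use DataFrame.mask to set 0 where the Value column is Z.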
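# Rows where Object and Object2 refer to the same object
matches = df['Object'].eq(df['Object2'])
# Conditions on Value
condition_z = df['Value'] == 'Z'
not_null = df['Value'].notnull()
# Keep '1' and '2' on matching rows only, then fill within each time point.
# group_keys=False keeps the original row index on pandas >= 2.0, where
# apply would otherwise prepend the group key to the index.
df12 = (df[['1', '2']].where(matches)
        .groupby(df['Time'], sort=False, group_keys=False)
        .apply(lambda x: x.ffill().bfill()))
# Discard rows where Value is NaN, then set 0 where Value is Z
df[['A1', 'A2']] = df12.where(not_null).mask(condition_z, 0)
print(df)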

Output

                    Time Object   1   2 Object2 Value    A1    A2
0  2019-08-02 09:50:10.1      B   1   0       A     X   3.0   1.0
1  2019-08-02 09:50:10.1      A   3   1       A     X   3.0   1.0
2  2019-08-02 09:50:10.2      A   5   4     NaN   NaN   NaN   NaN
3  2019-08-02 09:50:10.3      A   7   6       C     Y   9.0   8.0
4  2019-08-02 09:50:10.3      C   9   8       C     Y   9.0   8.0
5  2019-08-02 09:50:10.4      C  11  10       C     Y  11.0  10.0
6  2019-08-02 09:50:10.5      C  13  12       C     Y  13.0  12.0
7  2019-08-02 09:50:10.6      B  15  14       B     Z   0.0   0.0
8  2019-08-02 09:50:10.6      B  17  16       A   NaN   NaN   NaN
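
To make the chain easier to follow, here is a minimal sketch that prints the intermediate frames on the same sample data. The names `masked` and `filled` are illustrative and do not appear in the answer; `group_keys=False` is an assumption made so the sketch behaves the same on newer pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': ['2019-08-02 09:50:10.1', '2019-08-02 09:50:10.1',
             '2019-08-02 09:50:10.2', '2019-08-02 09:50:10.3',
             '2019-08-02 09:50:10.3', '2019-08-02 09:50:10.4',
             '2019-08-02 09:50:10.5', '2019-08-02 09:50:10.6',
             '2019-08-02 09:50:10.6'],
    'Object': ['B', 'A', 'A', 'A', 'C', 'C', 'C', 'B', 'B'],
    '1': [1, 3, 5, 7, 9, 11, 13, 15, 17],
    '2': [0, 1, 4, 6, 8, 10, 12, 14, 16],
    'Object2': ['A', 'A', np.nan, 'C', 'C', 'C', 'C', 'B', 'A'],
    'Value': ['X', 'X', np.nan, 'Y', 'Y', 'Y', 'Y', 'Z', np.nan],
})

# Step 1: True only on rows where Object and Object2 name the same object.
matches = df['Object'].eq(df['Object2'])

# Step 2: keep '1' and '2' on matching rows, NaN everywhere else.
masked = df[['1', '2']].where(matches)

# Step 3: within each time point, spread the matching values to the other rows.
filled = (masked.groupby(df['Time'], sort=False, group_keys=False)
                .apply(lambda g: g.ffill().bfill()))

print(matches)
print(masked)
print(filled)
```

On the sample data, `masked` holds values only on rows 1, 4, 5, 6 and 7. After the group-wise fill, row 0 picks up 3 and 1 from row 1, and row 3 picks up 9 and 8 from row 4. Row 8 also picks up values from row 7 at this stage, which is why the later `where(not_null)` step is needed to blank it out again.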

We can also use GroupBy.transform:

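# Rows where Object and Object2 refer to the same object
matches = df['Object'].eq(df['Object2'])
# Conditions on Value
condition_z = df['Value'] == 'Z'
not_null = df['Value'].notnull()
# Keep '1' and '2' on matching rows only, then broadcast the first
# non-null value of each time point to every row of that time point
df12 = (df[['1', '2']].where(matches)
        .groupby(df['Time'], sort=False)
        .transform('first'))
# Discard rows where Value is NaN, then set 0 where Value is Z
df[['A1', 'A2']] = df12.where(not_null).mask(condition_z, 0)
print(df)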
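
As a sanity check, the sketch below wraps both variants in one function and confirms that they produce the same frame on the sample data. The helper name `add_a1_a2` and its `how` parameter are hypothetical, introduced only for this illustration; `group_keys=False` is again an assumption for newer pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': ['2019-08-02 09:50:10.1', '2019-08-02 09:50:10.1',
             '2019-08-02 09:50:10.2', '2019-08-02 09:50:10.3',
             '2019-08-02 09:50:10.3', '2019-08-02 09:50:10.4',
             '2019-08-02 09:50:10.5', '2019-08-02 09:50:10.6',
             '2019-08-02 09:50:10.6'],
    'Object': ['B', 'A', 'A', 'A', 'C', 'C', 'C', 'B', 'B'],
    '1': [1, 3, 5, 7, 9, 11, 13, 15, 17],
    '2': [0, 1, 4, 6, 8, 10, 12, 14, 16],
    'Object2': ['A', 'A', np.nan, 'C', 'C', 'C', 'C', 'B', 'A'],
    'Value': ['X', 'X', np.nan, 'Y', 'Y', 'Y', 'Y', 'Z', np.nan],
})


def add_a1_a2(df, how='transform'):
    """Hypothetical helper: return a copy of df with A1 and A2 added."""
    matches = df['Object'].eq(df['Object2'])
    grouped = (df[['1', '2']].where(matches)
               .groupby(df['Time'], sort=False, group_keys=False))
    if how == 'transform':
        filled = grouped.transform('first')  # first non-null value per group
    else:
        filled = grouped.apply(lambda g: g.ffill().bfill())
    # Blank out rows without a Value, then force 0 where Value is Z.
    result = filled.where(df['Value'].notna()).mask(df['Value'].eq('Z'), 0)
    out = df.copy()
    out['A1'] = result['1']
    out['A2'] = result['2']
    return out


via_transform = add_a1_a2(df, how='transform')
via_apply = add_a1_a2(df, how='apply')

# Raises AssertionError if the two variants disagree.
pd.testing.assert_frame_equal(via_transform, via_apply)
print(via_transform[['Time', 'Object', 'Object2', 'Value', 'A1', 'A2']])
```

The two variants agree here because each time point has at most one matching row. With several matching rows in one group they would differ: `transform('first')` broadcasts the first match's values to every row of the group, whereas `ffill().bfill()` leaves each matching row's own values in place and only fills the gaps.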

Regarding python - Conditionally generating new columns - Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58687448/
