python - Cannot create a column based on a condition (Python)

Reposted. Author: 太空狗. Updated: 2023-10-30 02:53:44

So I have a 40,000-row dataset "dd" that looks like this:

dd.head(21)
Out[64]:
MT MTBR Prd QPA RT Type WH
0 3 539 24Months 1 'NA' NR 188
1 3 51 24Months 4 'NA' NR 188
2 3 112 24Months 10 6 RP 188
3 3 385 24Months 2 7 RP 188
4 3 206 24Months 1 8 RP 188
5 3 349 24Months 19 'NA' NR 188
6 3 569 24Months 18 'NA' NR 188
7 3 66 24Months 20 8 RP 188
8 3 181 24Months 9 'NA' NR 188
9 3 149 24Months 2 'NA' NR 188
10 3 131 24Months 8 7 RP 188
11 3 289 24Months 11 3 RP 188
12 3 392 24Months 13 2 RP 188
13 3 303 24Months 9 'NA' NR 188
14 3 318 24Months 5 5 RP 188
15 3 103 24Months 9 6 RP 188
16 3 447 24Months 8 6 RP 188
17 3 600 24Months 19 'NA' NR 188
18 3 258 24Months 12 'NA' NR 188
19 3 164 24Months 13 'NA' NR 188
20 3 589 24Months 11 'NA' NR 188

I want to create another column in this dataset, mean_v, computed under the following conditions:

for q,m,w,rt,mt in zip(dd.QPA,dd.MT,dd.WH,dd.RT,dd.MTBR):
    if dd.Type=='NR':
        dd.mean_v = q*m*w*24 / (mt*1000)

    elif dd.Type=='RP':
        dd.mean_v = q*m*w*rt / (mt*1000)

But I get the following error:

ValueError: The truth value of a Series is ambiguous. 
Use a.empty, a.bool(), a.item(), a.any() or a.all().

I would appreciate it if someone could help me fix the error in my code. Thank you very much.

Best answer

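The error occurs because dd.Type=='NR' compares the whole column and returns a boolean Series, which an if statement cannot reduce to a single True or False. Loops are best avoided in pandas because they are slow, so it is better to use numpy.select: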

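import numpy as np
import pandas as pd

# first replace all non-numeric values with NaN, then NaN with 0
dd.RT = pd.to_numeric(dd.RT, errors='coerce').fillna(0)
# boolean masks, one per Type value
m1 = dd.Type == 'NR'
m2 = dd.Type == 'RP'

# shared numerator and denominator
s = dd.QPA * dd.MT * dd.WH
s1 = dd.MTBR * 1000

# candidate values for NR rows (s2) and RP rows (s3)
s2 = s * 24 / s1
s3 = s * dd.RT / s1

dd['mean_v'] = np.select([m1, m2], [s2, s3], default=np.nan)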
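To make both the error and the fix reproducible, here is a minimal sketch on a three-row frame. The first two rows reuse values from the question; the third row with Type 'XX' is hypothetical and only there to show the default branch. It also assumes RT is stored as strings such as "'NA'" and "6", which the question's display suggests but does not confirm.

```python
import numpy as np
import pandas as pd

# Rows 0 and 1 mirror the question's data; the 'XX' row is made up.
dd = pd.DataFrame({
    "MT": [3, 3, 3],
    "MTBR": [539, 112, 100],
    "QPA": [1, 10, 5],
    "RT": ["'NA'", "6", "4"],
    "Type": ["NR", "RP", "XX"],
    "WH": [188, 188, 188],
})

# Comparing a column yields one boolean per row, not a single bool,
# so using it directly in `if` raises the ValueError from the question.
mask = dd["Type"] == "NR"
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)

# Vectorised fix: compute both candidate results, then pick per row.
dd["RT"] = pd.to_numeric(dd["RT"], errors="coerce").fillna(0)
base = dd["QPA"] * dd["MT"] * dd["WH"]
denom = dd["MTBR"] * 1000
dd["mean_v"] = np.select(
    [dd["Type"] == "NR", dd["Type"] == "RP"],
    [base * 24 / denom, base * dd["RT"] / denom],
    default=np.nan,  # rows that are neither NR nor RP
)
```

Both candidate columns are computed for every row; np.select only decides which one to keep, which is why no Python-level loop is needed.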

But if the Type column contains only the values NR and RP, use numpy.where:

dd['mean_v'] = np.where(m1, s2, s3) 
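One caveat with the numpy.where form: every row that is not NR falls into the second branch, so an unexpected Type value silently receives the RP result instead of NaN. A small sketch of the difference, using a hypothetical 'XX' row and placeholder columns val_nr and val_rp that stand in for s2 and s3:

```python
import numpy as np
import pandas as pd

# 'XX' is a made-up Type; val_nr / val_rp are placeholder result columns.
dd = pd.DataFrame({
    "Type": ["NR", "RP", "XX"],
    "val_nr": [1.0, 2.0, 3.0],
    "val_rp": [10.0, 20.0, 30.0],
})

m1 = dd["Type"] == "NR"
m2 = dd["Type"] == "RP"

# Two-way choice: anything that is not NR gets the RP value.
with_where = np.where(m1, dd["val_nr"], dd["val_rp"])

# Explicit conditions: unmatched rows fall back to NaN.
with_select = np.select([m1, m2], [dd["val_nr"], dd["val_rp"]], default=np.nan)

# A cheap guard to run before relying on np.where.
only_known_types = bool(dd["Type"].isin(["NR", "RP"]).all())
```

If the guard is False, numpy.select with an explicit default is probably the safer choice.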

Loop version (slow):

dd.RT = pd.to_numeric(dd.RT, errors='coerce').fillna(0)
for i, x in dd.iterrows():
    if x['Type'] =='NR':
        dd.loc[i, 'mean_v'] = x.QPA*x.MT*x.WH*24 / (x.MTBR*1000)
    elif x.Type=='RP':
        dd.loc[i, 'mean_v'] = x.QPA*x.MT*x.WH*x.RT / (x.MTBR*1000)
    else:
        dd.loc[i, 'mean_v'] = np.nan

If RT is always 24 for Type == NR:

s = pd.to_numeric(dd.RT, errors='coerce').fillna(24)
dd['mean_v'] = (dd.QPA * dd.MT * dd.WH * s) / (dd.MTBR * 1000)

print (dd)

MT MTBR Prd QPA RT Type WH mean_v
0 3 539 24Months 1 0.0 NR 188 0.025113
1 3 51 24Months 4 0.0 NR 188 1.061647
2 3 112 24Months 10 6.0 RP 188 0.302143
3 3 385 24Months 2 7.0 RP 188 0.020509
4 3 206 24Months 1 8.0 RP 188 0.021903
5 3 349 24Months 19 0.0 NR 188 0.736917
6 3 569 24Months 18 0.0 NR 188 0.428204
7 3 66 24Months 20 8.0 RP 188 1.367273
8 3 181 24Months 9 0.0 NR 188 0.673061
9 3 149 24Months 2 0.0 NR 188 0.181691
10 3 131 24Months 8 7.0 RP 188 0.241099
11 3 289 24Months 11 3.0 RP 188 0.064401
12 3 392 24Months 13 2.0 RP 188 0.037408
13 3 303 24Months 9 0.0 NR 188 0.402059
14 3 318 24Months 5 5.0 RP 188 0.044340
15 3 103 24Months 9 6.0 RP 188 0.295689
16 3 447 24Months 8 6.0 RP 188 0.060564
17 3 600 24Months 19 0.0 NR 188 0.428640
18 3 258 24Months 12 0.0 NR 188 0.629581
19 3 164 24Months 13 0.0 NR 188 1.072976
20 3 589 24Months 11 0.0 NR 188 0.252795
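The fillna(24) shortcut relies on pd.to_numeric(..., errors='coerce') turning the non-numeric 'NA' entries into NaN, so that NaN ends up marking exactly the NR rows. A minimal sketch of that conversion step, assuming RT holds strings as the question's display suggests (the four sample values are illustrative):

```python
import pandas as pd

# RT as it appears in the question: numeric strings mixed with the literal "'NA'".
rt_raw = pd.Series(["'NA'", "6", "7", "'NA'"])

# errors='coerce' turns anything unparseable into NaN instead of raising.
rt_num = pd.to_numeric(rt_raw, errors="coerce")

# NaN now marks the former 'NA' rows; filling with 24 supplies their constant factor.
rt_filled = rt_num.fillna(24)
```

This only matches the two-branch formula if every non-numeric RT really belongs to an NR row and no RP row has a missing RT.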

Timings:

In [1]: %timeit jez1(dd)
14.1 ms ± 82 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [2]: %timeit jez2(dd)
8.97 ms ± 32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [3]: %timeit jez3(dd)
25.1 s ± 769 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit (jez4(dd))
2.63 ms ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit (rsno(dd))
24.6 ms ± 267 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit (rsno1(dd))
1.62 s ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

dd = pd.concat([dd] * 2000, ignore_index=True)

#print (dd)

def jez1(dd):
    dd.RT = pd.to_numeric(dd.RT, errors='coerce').fillna(0)
    m1 = dd.Type=='NR'
    m2 = dd.Type=='RP'
    s = dd.QPA *dd.MT * dd.WH
    s1 = dd.MTBR * 1000

    s2 = s * 24 / s1
    s3 = s * dd.RT / s1

    dd['mean_v'] = np.select([m1, m2], [s2, s3], default=np.nan)
    return dd

def jez2(dd):
    dd.RT = pd.to_numeric(dd.RT, errors='coerce').fillna(0)
    m1 = dd.Type=='NR'
    s = dd.QPA *dd.MT * dd.WH
    s1 = dd.MTBR * 1000

    s2 = s * 24 / s1
    s3 = s * dd.RT / s1

    dd['mean_v'] = np.where(m1, s2, s3)
    return dd

def jez3(dd):
    dd.RT = pd.to_numeric(dd.RT, errors='coerce').fillna(0)
    for i, x in dd.iterrows():
        if x['Type'] =='NR':
            dd.loc[i, 'mean_v'] = x.QPA*x.MT*x.WH*24 / (x.MTBR*1000)
        elif x.Type=='RP':
            dd.loc[i, 'mean_v'] = x.QPA*x.MT*x.WH*x.RT / (x.MTBR*1000)
        else:
            dd.loc[i, 'mean_v'] = np.nan
    return dd


def jez4(dd):
    dd.RT = pd.to_numeric(dd.RT, errors='coerce').fillna(24)
    dd['mean_v'] = (dd.QPA * dd.MT * dd.WH * dd.RT) / (dd.MTBR * 1000)
    return dd

def rsno(dd):
    dd['RTT'] = list(map(lambda x: int(x) if x != "'NA'" else 24, dd.RT.tolist()))
    dd['mean_v'] = (dd.QPA * dd.MT * dd.WH * dd.RTT) / (dd.MTBR * 1000)
    return dd

def rsno1(dd):
    dd['RTT'] = dd.apply(lambda row: int(row.RT) if row.RT != "'NA'" else 24 , axis=1)
    dd['mean_v'] = (dd.QPA * dd.MT * dd.WH * dd.RTT) / (dd.MTBR * 1000)
    return dd

Regarding python - Cannot create a column based on a condition (Python), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48473706/
