
python - Pandas for loop error - embedded and/if statement

Reposted. Author: 太空宇宙. Updated: 2023-11-04 07:59:39

I have a time-series pandas.DataFrame, 'ES_Summary_Index1', as follows:

    Ticker_x                 Date  Close_x  15M_Long  1H_Long  Net_Long
 0     ES H7  2016-10-18 13:44:59  2128.00         N      NaN
 1     ES H7  2016-10-18 13:59:59  2128.75         N      NaN
 2     ES H7  2016-10-18 14:14:59  2125.75         N      NaN
 3     ES H7  2016-10-18 14:29:59  2126.50         N        N
 4     ES H7  2016-10-18 14:44:59  2126.50         N      NaN
 5     ES H7  2016-10-18 16:14:59  2126.00         N      NaN
 6     ES H7  2016-10-18 16:44:59  2126.25         N      NaN
 7     ES H7  2016-10-18 17:59:59  2126.50         N      NaN
 8     ES H7  2016-10-18 18:14:59  2127.00         N      NaN
 9     ES H7  2016-10-18 19:14:59  2127.75         N      NaN
10     ES H7  2016-10-18 19:44:59  2127.75         N      NaN
11     ES H7  2016-10-18 19:59:59  2127.75         N      NaN
12     ES H7  2016-10-18 20:44:59  2129.00         N      NaN
13     ES H7  2016-10-18 21:29:59  2128.75         N        N
14     ES H7  2016-10-18 21:44:59  2129.00         N      NaN

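Looking at the 15M_Long and 1H_Long columns: if both say 'Y', I want the Net_Long column to say 'Long' as well. If only one of them, or neither, says 'Y', I want the Net_Long column to stay blank or say 'N' (either is fine).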

First, I set the Net_Long column to blank:

ES_Summary_Index1['Net_Long'] = ''

Next, I wrote a for loop to populate the Net_Long column:

for index, row in ES_Summary_Index1.iterrows():
    if ES_Summary_Index1.loc[index, '15M_Long'] is 'Y' & ES_Summary_Index1.loc[index, '1H_Long'] is 'Y':
        ES_Summary_Index1.loc['Net_Long'] = 'Long'
    else:
        ES_Summary_Index1.loc['Net_Long'] = 'N'

Unfortunately, I get the following error:

TypeError: unsupported operand type(s) for &: 'str' and 'float'

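...which refers to the if statement above (if ES_Summary_Index1 ...). I have tried changing & to and, but that did not populate the Net_Long column the way I hoped. I also tried == instead of is, but that did not work either. Can anyone help?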

Best answer

You need the very fast, vectorized numpy.where with a boolean mask:

mask = (df['15M_Long'] == 'Y') & (df['1H_Long'] == 'Y')
df['Net_Long'] = np.where(mask, 'Long', 'N')

print (df)
   Ticker_x                 Date  Close_x  15M_Long  1H_Long  Net_Long
0     ES_H7  2016-10-18T13:44:59  2128.00         N      NaN         N
1     ES_H7  2016-10-18T13:59:59  2128.75         N      NaN         N
2     ES_H7  2016-10-18T19:59:59  2127.75         Y      NaN         N
3     ES_H7  2016-10-18T20:44:59  2129.00         N        Y         N
4     ES_H7  2016-10-18T21:29:59  2128.75         Y        Y      Long
5     ES_H7  2016-10-18T21:44:59  2129.00         N      NaN         N
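
As a side note on why the loop in the question raised the TypeError, here is a minimal sketch in plain Python. The scalar values `a` and `b` are illustrative stand-ins for one cell of each column, assuming the missing 1H_Long value is a float NaN as the error message suggests.

```python
# Illustrative values: one cell from 15M_Long and a missing cell from 1H_Long.
a = 'N'
b = float('nan')   # a missing value in an object column is typically a float NaN

# Bitwise `&` binds tighter than comparisons such as `is` and `==`, so the
# condition in the question is evaluated as `a is ('Y' & b) is 'Y'`.
try:
    'Y' & b
except TypeError as exc:
    message = str(exc)
print(message)

# Parenthesized comparisons joined with `and` work on scalars;
# NaN == 'Y' is simply False, so no error is raised.
both_yes = (a == 'Y') and (b == 'Y')
print(both_yes)
```

Note also that the assignment in the question, `ES_Summary_Index1.loc['Net_Long'] = 'Long'`, omits the row index, so it appears to target a row labelled 'Net_Long' rather than the Net_Long cell of the current row. That would explain why switching to `and` alone did not fill the column as expected.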

Timings:

#length of df is 600 rows
In [183]: %timeit (iterate(df))
10 loops, best of 3: 67.1 ms per loop

In [184]: %timeit (vectorize(df1))
1000 loops, best of 3: 1.49 ms per loop

#length of df is 6000 rows
In [177]: %timeit (iterate(df))
1 loop, best of 3: 681 ms per loop

In [178]: %timeit (vectorize(df1))
100 loops, best of 3: 3.23 ms per loop

#length of df is 60000 rows
In [180]: %timeit (iterate(df))
1 loop, best of 3: 6.87 s per loop

In [181]: %timeit (vectorize(df1))
10 loops, best of 3: 20.8 ms per loop

Timing code:

import numpy as np
import pandas as pd

# Build a small sample frame from whitespace-separated text
# (every value, including "NaN", is read as a string here)
data = [x.strip().split() for x in """
Ticker_x Date Close_x 15M_Long 1H_Long
ES_H7 2016-10-18T13:44:59 2128.00 N NaN
ES_H7 2016-10-18T13:59:59 2128.75 N NaN
ES_H7 2016-10-18T19:59:59 2127.75 Y NaN
ES_H7 2016-10-18T20:44:59 2129.00 N Y
ES_H7 2016-10-18T21:29:59 2128.75 Y Y
ES_H7 2016-10-18T21:44:59 2129.00 N NaN
""".split('\n')[1:-1]]
df = pd.DataFrame(data=data[1:], columns=data[0])
# 6 base rows: * 100 -> 600 rows, * 1000 -> 6000 rows, * 10000 -> 60000 rows
df = pd.concat([df]*1000).reset_index(drop=True)
print (df)
df1 = df.copy()

def vectorize(df):
    # Boolean mask: True only where both columns are 'Y'
    mask = (df['15M_Long'] == 'Y') & (df['1H_Long'] == 'Y')
    df['Net_Long'] = np.where(mask, 'Long', 'N')
    return (df)

def iterate(df):
    df['Net_Long'] = ''

    for index, row in df.iterrows():
        # Compare with ==, not `is`, and write back through df,
        # because `row` is a copy and assigning to it does not change df
        if row['15M_Long'] == 'Y' and row['1H_Long'] == 'Y':
            df.at[index, 'Net_Long'] = 'Long'
        else:
            df.at[index, 'Net_Long'] = 'N'
    return df

print (iterate(df))
print (vectorize(df1))
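
As a sanity check, the following minimal sketch confirms that a row-by-row loop and the numpy.where mask give the same Net_Long column on a tiny frame with genuine missing values. The frame `sample` and the helper names `loop_net_long` and `mask_net_long` are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with real missing values (np.nan), not the string "NaN".
sample = pd.DataFrame({
    '15M_Long': ['N', 'Y', 'N', 'Y'],
    '1H_Long': [np.nan, np.nan, 'Y', 'Y'],
})

def loop_net_long(df):
    """Row-by-row version: compare with == and write back with .at."""
    out = df.copy()
    out['Net_Long'] = ''
    for index, row in out.iterrows():
        both_yes = row['15M_Long'] == 'Y' and row['1H_Long'] == 'Y'
        out.at[index, 'Net_Long'] = 'Long' if both_yes else 'N'
    return out

def mask_net_long(df):
    """Vectorized version using a boolean mask and numpy.where."""
    out = df.copy()
    mask = (out['15M_Long'] == 'Y') & (out['1H_Long'] == 'Y')
    out['Net_Long'] = np.where(mask, 'Long', 'N')
    return out

looped = loop_net_long(sample)
masked = mask_net_long(sample)
print(masked)
print(looped['Net_Long'].tolist() == masked['Net_Long'].tolist())
```

Both helpers work on a copy, so the input frame is left untouched. Comparing a missing value with 'Y' yields False in either version, which is why rows with NaN end up as 'N'.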

Regarding python - Pandas for loop error - embedded and/if statement, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42472579/
