python - Fast way to create a nested list with different types: numpy, pandas, or list concatenation?

Reposted. Author: 行者123. Updated: 2023-12-01 06:37:02

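I am trying to speed up the code below, which generates a list of lists in which each column has a different type. I originally built a pandas DataFrame and then converted it to a list, but that seems rather slow. How can I create this list faster, say by an order of magnitude? All columns are constant except one.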

import pandas as pd
import numpy as np
import time
import datetime

def overflow_check(x):
    # in SQL code the column is decimal(13, 2)
    p = 13
    s = 3
    max_limit = float("9"*(p-s) + "." + "9"*s)
    #min_limit = 0.01 #float("0" + "." + "0"*(s-2) + '1')
    #min_limit = 0.1
    if not isinstance(x, np.ndarray) or len(x) < 1:
        raise Exception("Non-numeric or empty array.")
    else:
        #print(x)
        return x * (np.abs(x) < max_limit) + np.sign(x) * max_limit * (np.abs(x) >= max_limit)

def list_creation(y_forc):
    backcast_length = len(y_forc)

    backcast = pd.DataFrame(data=np.full(backcast_length, 2),
                            columns=['TypeId'])

    backcast['id2'] = None
    backcast['Daily'] = 1
    backcast['ForecastDate'] = y_forc.index.strftime('%Y-%m-%d')
    backcast['ReportDate'] = pd.to_datetime('today').strftime('%Y-%m-%d')
    backcast['ForecastMethodId'] = 1
    backcast['ForecastVolume'] = overflow_check(y_forc.values)
    backcast['CreatedBy'] = 'test'
    backcast['CreatedDt'] = pd.to_datetime('today')

    return backcast.values.tolist()

i=pd.date_range('05-01-2010', '21-05-2018', freq='D')
x=pd.DataFrame(index=i, data = np.random.randint(0, 100, len(i)))

t=time.perf_counter()
y =list_creation(x)
print(time.perf_counter()-t)

Best answer

This should be a bit faster; it simply builds the list directly:

def list_creation1(y_forc):
    zipped = zip(y_forc.index.strftime('%Y-%m-%d'), overflow_check(y_forc.values)[:, 0])
    t = pd.to_datetime('today').strftime('%Y-%m-%d')
    t1 = pd.to_datetime('today')
    return [
        [2, None, 1, i, t,
         1, v, 'test', t1]
        for i, v in zipped
    ]


%%timeit
list_creation(x)
> 29.3 ms ± 468 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
list_creation1(x)
> 17.1 ms ± 517 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
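
The `%%timeit` cells above only run in IPython or Jupyter. In a plain Python script, a rough equivalent can be sketched with the standard `timeit` module. The helper name `per_loop_seconds` and its defaults are assumptions made for illustration, not part of the original answer:

```python
import timeit


def per_loop_seconds(func, *args, repeat=7, number=10):
    """Call func(*args) `number` times per run, for `repeat` runs.

    Returns a list with the average seconds per call for each run.
    (Hypothetical helper, roughly mirroring what %%timeit reports.)
    """
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return [total / number for total in timings]


# Example with a stand-in workload; in the context of this page one would
# pass list_creation or list_creation1 together with the DataFrame x.
runs = per_loop_seconds(sorted, list(range(1000, 0, -1)), repeat=3, number=5)
print(f"best of {len(runs)} runs: {min(runs) * 1e3:.3f} ms per loop")
```

Taking the minimum over the runs is a common way to reduce noise from other processes; the numbers will differ from machine to machine.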

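Edit: one of the main causes of the slowness is the time it takes to convert the datetimes to the specified string format. That cost can be avoided by restating the problem so that the dates are generated as strings up front and passed in together with the values: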

def list_creation1(i, v):
    zipped = zip(i, overflow_check(np.array([[_x] for _x in v]))[:, 0])
    t = pd.to_datetime('today').strftime('%Y-%m-%d')
    t1 = pd.to_datetime('today')
    return [
        [2, None, 1, i, t,
         1, v, 'test', t1]
        for i, v in zipped
    ]

start = datetime.datetime.strptime("05-01-2010", "%d-%m-%Y")
end = datetime.datetime.strptime("21-05-2018", "%d-%m-%Y")
i = [(start + datetime.timedelta(days=x)).strftime("%d-%m-%Y") for x in range(0, (end-start).days)]
x=np.random.randint(0, 100, len(i))

With that change it is now much faster:

%%timeit
list_creation1(i, x)
> 1.87 ms ± 24.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
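
A related variant, sketched here with NumPy only, generates the date strings with `np.datetime_as_string` and replaces the mask arithmetic in `overflow_check` with `np.clip`, which gives the same result for finite inputs. Unlike the list comprehension above, this sketch includes the end date and keeps the `YYYY-MM-DD` format used in the question. The names `clip_to_decimal`, `iso_dates`, and `build_rows` are hypothetical, the defaults `precision=13, scale=3` mirror `p` and `s` from the question's code, and `CreatedDt` is a plain `datetime.datetime` rather than a pandas `Timestamp`. No timing is claimed for it:

```python
import datetime

import numpy as np


def clip_to_decimal(values, precision=13, scale=3):
    """Clamp values to +/- the largest number with `precision` digits,
    `scale` of them after the decimal point (assumed defaults from the question)."""
    max_limit = float("9" * (precision - scale) + "." + "9" * scale)
    return np.clip(np.asarray(values, dtype=float), -max_limit, max_limit)


def iso_dates(start, end):
    """Return 'YYYY-MM-DD' strings for every day from start to end, both included."""
    first = np.datetime64(start, "D")
    stop = np.datetime64(end, "D") + np.timedelta64(1, "D")  # arange excludes stop
    days = np.arange(first, stop, dtype="datetime64[D]")
    return np.datetime_as_string(days, unit="D")


def build_rows(dates, volumes):
    """Build the nested list; only the date and volume columns vary per row."""
    now = datetime.datetime.now()
    report_date = now.strftime("%Y-%m-%d")
    clipped = clip_to_decimal(volumes).tolist()  # plain Python floats
    return [
        [2, None, 1, d, report_date, 1, v, "test", now]
        for d, v in zip(dates.tolist(), clipped)
    ]


dates = iso_dates("2010-01-05", "2018-05-21")
volumes = np.random.randint(0, 100, len(dates))
rows = build_rows(dates, volumes)
```

Converting both arrays with `.tolist()` before the comprehension means each row holds ordinary Python `str` and `float` objects instead of NumPy scalars, which may matter if the rows are later handed to a database driver.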

Regarding "python - Fast way to create a nested list with different types: numpy, pandas, or list concatenation?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59615191/
