# df
date value
0 2018-01-22 01:01:53.192824 1
1 2018-01-22 01:01:55.042070 2
2 2018-01-22 01:01:56.264234 3
3 2018-01-22 01:01:57.697656 2
4 2018-01-22 01:01:57.831543 2
5 2018-01-22 01:02:00.258684 1
6 2018-01-22 01:02:00.259691 3
7 2018-01-22 01:02:00.260698 2
8 2018-01-22 01:02:00.261683 1
9 2018-01-22 01:02:00.333109 2
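My goal is to build a dictionary whose keys are the individual minutes and whose values are counts computed from the last 3 values at each timestamp.
If, at a given timestamp, the last 3 values are not steadily rising or steadily falling, the count for that minute is incremented by 1.
Put simply, if the last 3 values at some timestamp look like ↘↗ or ↗↘, add 1 to the value stored under that minute's key.
For example, at 2018-01-22 01:01:56.264234 the last 3 values are 1, 2, 3. They are increasing, so nothing is added.
But at 2018-01-22 01:01:57.697656 the last 3 values are 2, 3, 2. They look like ↗↘, so 1 is added.
The data frame above would produce a dictionary like this: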
dic_result = { np.datetime64('2018-01-22T01:01'): 1,   # [2, 3, 2]
               np.datetime64('2018-01-22T01:02'): 3 }  # [2, 1, 3], [1, 3, 2], [2, 1, 2]
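The rule described above can be written as a small predicate. This is only a sketch of the condition as stated in the question; the name `is_turn` is made up for illustration and does not appear in the original code.

```python
def is_turn(a, b, c):
    """Return True when (a, b, c) goes down-then-up (↘↗) or up-then-down (↗↘)."""
    # Strict comparisons: a window with a tie, such as (3, 2, 2), is not a turn.
    return (a > b < c) or (a < b > c)


print(is_turn(1, 2, 3))  # steadily increasing, so not a turn
print(is_turn(2, 3, 2))  # up then down, so a turn
```

Because the comparisons are strict, windows containing equal neighbours are never counted, which matches the condition used in the loop in the question.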
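Here is the program I wrote for this. It works correctly, but it takes too long when the data frame is large. I would like to know how to improve this code for better performance, for example by using NumPy arrays or a better algorithm.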
from collections import deque
import numpy as np

# A deque with maxlen=3 keeps only the last 3 values
deq_3_trs = deque(maxlen=3)
dic_result = {}
for i in range(len(df)):
    # .ix was removed in pandas 1.0; .iloc is the positional equivalent
    date = df.iloc[i]['date']
    date_min = np.datetime64(date, 'm')
    value = df.iloc[i]['value']
    deq_3_trs.append(value)
    if (date_min not in dic_result) and (len(deq_3_trs) == 3):
        dic_result[date_min] = 0
        # check whether the values in the deque look like ↘↗ or ↗↘
        if (deq_3_trs[0] > deq_3_trs[1] < deq_3_trs[2]) or (deq_3_trs[0] < deq_3_trs[1] > deq_3_trs[2]):
            dic_result[date_min] += 1
    elif (date_min in dic_result) and (len(deq_3_trs) == 3):
        # check whether the values in the deque look like ↘↗ or ↗↘
        if (deq_3_trs[0] > deq_3_trs[1] < deq_3_trs[2]) or (deq_3_trs[0] < deq_3_trs[1] > deq_3_trs[2]):
            dic_result[date_min] += 1
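One likely contributor to the slowdown is that the loop indexes the data frame row by row, and each lookup builds a temporary object. A minimal sketch of the same loop over plain arrays follows, assuming the `date` column can be parsed by `pd.to_datetime`. The function name `count_turns_loop` is hypothetical, and the two duplicated branches are merged with `setdefault`, which should give the same counts.

```python
from collections import deque

import numpy as np
import pandas as pd


def count_turns_loop(dates, values):
    """Same logic as the loop above, but iterating over arrays instead of rows."""
    # Truncate every timestamp to its minute in one step.
    minutes = pd.to_datetime(dates).to_numpy().astype("datetime64[m]")
    window = deque(maxlen=3)
    result = {}
    for minute, value in zip(minutes, values):
        window.append(value)
        if len(window) < 3:
            continue  # not enough history yet
        a, b, c = window
        result.setdefault(minute, 0)  # create the key on first use
        if (a > b < c) or (a < b > c):
            result[minute] += 1
    return result


dates = ["2018-01-22 01:01:53.192824", "2018-01-22 01:01:55.042070",
         "2018-01-22 01:01:56.264234", "2018-01-22 01:01:57.697656",
         "2018-01-22 01:01:57.831543", "2018-01-22 01:02:00.258684",
         "2018-01-22 01:02:00.259691", "2018-01-22 01:02:00.260698",
         "2018-01-22 01:02:00.261683", "2018-01-22 01:02:00.333109"]
values = [1, 2, 3, 2, 2, 1, 3, 2, 1, 2]
loop_result = count_turns_loop(dates, values)
print(loop_result)
```

The keys are `numpy.datetime64` values with minute resolution, as in the original `dic_result`.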
Walkthrough for i == 0, i == 2, and i == 3
0 2018-01-22 01:01:53.192824 1
1 2018-01-22 01:01:55.042070 2
2 2018-01-22 01:01:56.264234 3
3 2018-01-22 01:01:57.697656 2
4 2018-01-22 01:01:57.831543 2
5 2018-01-22 01:02:00.258684 1
6 2018-01-22 01:02:00.259691 3
7 2018-01-22 01:02:00.260698 2
8 2018-01-22 01:02:00.261683 1
9 2018-01-22 01:02:00.333109 2
If i == 0 in the FOR loop,
date == 2018-01-22 01:01:53.192824
date_min == numpy.datetime64('2018-01-22T01:01')
value == 1
deq_3_trs == deque([1], maxlen=3)
Since len(deq_3_trs) != 3, nothing else happens in this iteration.
If i == 2 in the FOR loop,
date == 2018-01-22 01:01:56.264234
date_min == numpy.datetime64('2018-01-22T01:01')
value == 3
deq_3_trs == deque([1, 2, 3], maxlen=3)
Since len(deq_3_trs) == 3 and the dictionary dic_result has no key numpy.datetime64('2018-01-22T01:01'),
it creates the key and initializes it to 0. dic_result == { numpy.datetime64('2018-01-22T01:01'): 0 }
The values in the deque do not look like ↘↗ or ↗↘, so nothing else happens in this iteration.
If i == 3 in the FOR loop,
date == 2018-01-22 01:01:57.697656
date_min == numpy.datetime64('2018-01-22T01:01')
value == 2
deq_3_trs == deque([2, 3, 2], maxlen=3)
Since len(deq_3_trs) == 3, the dictionary dic_result already has the key numpy.datetime64('2018-01-22T01:01'), and
the values in the deque look like ↗↘, it adds 1 to that key's value. dic_result == { numpy.datetime64('2018-01-22T01:01'): 1 }
Best answer
This may work. Iterating over rows never seems ideal, but you may benefit from the collections module.
import pandas as pd
from collections import defaultdict
df = pd.DataFrame([['2018-01-22 01:01:53.192824', 1], ['2018-01-22 01:01:55.042070', 2],
['2018-01-22 01:01:56.264234', 3], ['2018-01-22 01:01:57.697656', 2],
['2018-01-22 01:01:57.831543', 2], ['2018-01-22 01:02:00.258684', 1],
['2018-01-22 01:02:00.259691', 3], ['2018-01-22 01:02:00.260698', 2],
['2018-01-22 01:02:00.261683', 1], ['2018-01-22 01:02:00.333109', 2]],
columns=['date', 'value'])
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df.index = df.index.map(lambda x: x.replace(second=0).replace(microsecond=0))
result = defaultdict(list)
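def not_noninc_or_nondec(L):
    # True when L is neither non-increasing nor non-decreasing
    return not (all(x>=y for x, y in zip(L, L[1:])) or all(x<=y for x, y in zip(L, L[1:])))
for i, (idx, row) in enumerate(df.iterrows()):
    if i >= 2:
        # .iloc makes the positional slice of the last 3 values explicit
        result[idx].append(not_noninc_or_nondec(df['value'].iloc[i-2:i+1].tolist()))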
result_count = {k: sum(v) for k, v in result.items()}
# {Timestamp('2018-01-22 01:01:00'): 1, Timestamp('2018-01-22 01:02:00'): 3}
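Since the question asks about NumPy, here is a hedged sketch of a loop-free variant. It is not part of the original answer. It assumes a `date` column of datetimes and a numeric `value` column without missing values, and the function name `count_turns_per_minute` is made up for illustration. The idea is that a window of 3 values is a ↘↗ or ↗↘ exactly when its two consecutive differences have opposite signs.

```python
import numpy as np
import pandas as pd


def count_turns_per_minute(df):
    """Count ↘↗ / ↗↘ windows of 3 values, grouped by the minute of the last row."""
    v = df["value"].to_numpy()
    step = np.sign(np.diff(v))  # +1 rise, -1 fall, 0 tie between neighbours
    # For the window ending at row i (i >= 2), compare step[i-2] with step[i-1];
    # a negative product means one rise and one fall.
    turn = step[:-1] * step[1:] < 0
    minutes = df["date"].dt.floor("min").to_numpy()[2:]
    counts = pd.Series(turn, index=minutes).groupby(level=0).sum()
    return counts.to_dict()


df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-22 01:01:53.192824", "2018-01-22 01:01:55.042070",
        "2018-01-22 01:01:56.264234", "2018-01-22 01:01:57.697656",
        "2018-01-22 01:01:57.831543", "2018-01-22 01:02:00.258684",
        "2018-01-22 01:02:00.259691", "2018-01-22 01:02:00.260698",
        "2018-01-22 01:02:00.261683", "2018-01-22 01:02:00.333109",
    ]),
    "value": [1, 2, 3, 2, 2, 1, 3, 2, 1, 2],
})
vector_result = count_turns_per_minute(df)
print(vector_result)  # counts: 1 for 01:01 and 3 for 01:02
```

The first two rows are dropped before grouping because they have no complete window, which mirrors the `len(deq_3_trs) == 3` check in the question. The keys here are pandas `Timestamp` objects, as in the answer above, rather than `numpy.datetime64` values.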
Regarding "python - How can I replace these FOR and IF statements for better performance?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48392079/