
python - matplotlib: plotting a time series while skipping periods with no data

Reposted. Author: 行者123. Updated: 2023-12-01 04:13:36

tl;dr: how do I skip periods with no data when plotting a time series?


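I am running a long computation and want to monitor its progress. Sometimes I interrupt the computation. The log is stored in a huge CSV file that looks like this: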

2016-01-03T01:36:30.958199,0,0,0,startup
2016-01-03T01:36:32.363749,10000,0,0,regular
...
2016-01-03T11:12:21.082301,51020000,13402105,5749367,regular
2016-01-03T11:12:29.065687,51030000,13404142,5749367,regular
2016-01-03T11:12:37.657022,51040000,13408882,5749367,regular
2016-01-03T11:12:54.236950,51050000,13412824,5749375,shutdown
2016-01-03T19:02:38.293681,51050000,13412824,5749375,startup
2016-01-03T19:02:49.296161,51060000,13419181,5749377,regular
2016-01-03T19:03:00.547644,51070000,13423127,5749433,regular
2016-01-03T19:03:05.599515,51080000,13427189,5750183,regular
...

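In reality there are 41 columns. Each column is a particular progress indicator. The second column always increases in steps of 10000. The last column is self-explanatory.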

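I would like to plot every column on the same chart while skipping the periods between "shutdown" and "startup". Ideally, I would also like to draw a vertical line at each skip.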
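The periods to skip can be located by pairing each "shutdown" row with the next "startup" row. Below is a minimal sketch using only the standard library, assuming the event kind is the last CSV field and the timestamp is the first, as in the sample above. The names `SAMPLE` and `find_breaks` are illustrative and not part of the original post.

```python
import csv
import io
from datetime import datetime

# A few rows copied from the log excerpt above.
SAMPLE = """\
2016-01-03T11:12:37.657022,51040000,13408882,5749367,regular
2016-01-03T11:12:54.236950,51050000,13412824,5749375,shutdown
2016-01-03T19:02:38.293681,51050000,13412824,5749375,startup
2016-01-03T19:02:49.296161,51060000,13419181,5749377,regular
"""


def find_breaks(lines):
    """Return a list of (shutdown_time, startup_time) pairs.

    Assumes rows are in chronological order, the first field is an
    ISO 8601 timestamp and the last field is the event kind.
    """
    breaks = []
    last_shutdown = None
    for row in csv.reader(lines):
        if not row:
            continue  # ignore blank lines
        timestamp = datetime.fromisoformat(row[0])
        kind = row[-1]
        if kind == "shutdown":
            last_shutdown = timestamp
        elif kind == "startup" and last_shutdown is not None:
            breaks.append((last_shutdown, timestamp))
            last_shutdown = None
    return breaks


breaks = find_breaks(io.StringIO(SAMPLE))
for left, right in breaks:
    print("no data from", left, "to", right)
```

A "startup" row with no preceding "shutdown" (such as the very first row of the log) is ignored, since there is nothing to skip before it.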


Here is what I have so far:

import matplotlib.pyplot as plt
import pandas as pd

# < ... reading my CSV in a Pandas dataframe `df` ... >

fig, ax = plt.subplots()

for col in ['total'] + ['%02d' % i for i in range(40)]:
    ax.plot_date(df.index.values, df[col].values, '-')

fig.autofmt_xdate()
plt.show()

[Image: the plot so far]

I would like to get rid of that long flat stretch and draw just a vertical line instead.

I know about df.plot(), but in my experience it is broken (among other things, Pandas converts datetime objects to its own format instead of using date2num and num2date).


It looks like one possible solution is to write a custom scale, but that seems rather complicated.

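As far as I understand, writing a custom Locator only changes the positions of the ticks (the small vertical lines and their labels), not the positions of the plotted data itself. Is that correct?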

UPD: A simple solution would be to change the timestamps (for example, recompute them as "time elapsed since start"), but I would prefer to keep them.

UPD: The answer at https://stackoverflow.com/a/5657491/1214547 worked for me after some modifications. I will write up my solution soon.
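For comparison, the "time elapsed since start" workaround mentioned in the first update could look roughly like the sketch below. It assumes "elapsed" means active running time, so the interval leading up to each "startup" row is not counted. The function name `active_elapsed_seconds` and the `(timestamp, kind)` row layout are hypothetical. As noted above, this discards the real timestamps, which is why it was not the preferred route.

```python
from datetime import datetime


def active_elapsed_seconds(rows):
    """Return running time in seconds for each row, excluding downtime.

    `rows` is a chronologically sorted list of (timestamp, kind) pairs.
    The interval that ends at a 'startup' row is treated as downtime
    and does not advance the elapsed time.
    """
    elapsed = []
    total = 0.0
    previous = None
    for timestamp, kind in rows:
        if previous is not None and kind != "startup":
            total += (timestamp - previous).total_seconds()
        elapsed.append(total)
        previous = timestamp
    return elapsed


# Timestamps and kinds taken from the log excerpt in the question.
rows = [
    (datetime.fromisoformat("2016-01-03T11:12:37.657022"), "regular"),
    (datetime.fromisoformat("2016-01-03T11:12:54.236950"), "shutdown"),
    (datetime.fromisoformat("2016-01-03T19:02:38.293681"), "startup"),
    (datetime.fromisoformat("2016-01-03T19:02:49.296161"), "regular"),
]
elapsed = active_elapsed_seconds(rows)
print(elapsed)
```

The "shutdown" and the following "startup" row end up at the same elapsed value, so the eight-hour pause collapses to a single point on the x axis.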

Best answer

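Here is a solution that works for me. It does not handle closely spaced breaks well (the labels may get too crowded), but in my case that does not matter.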

import bisect
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.scale as mscale
import matplotlib.transforms as mtransforms
import matplotlib.dates as mdates
import pandas as pd

# heavily borrows from http://stackoverflow.com/a/5657491/1214547

def CustomScaleFactory(breaks):
    # `breaks` is a sorted list of (left, right) pairs in matplotlib date numbers.
    class CustomScale(mscale.ScaleBase):
        name = 'custom'

        def __init__(self, axis, **kwargs):
            # matplotlib >= 3.1: ScaleBase.__init__ takes the axis argument
            mscale.ScaleBase.__init__(self, axis)

        def get_transform(self):
            return self.CustomTransform()

        def set_default_locators_and_formatters(self, axis):
            class HourSkippingLocator(mdates.HourLocator):
                _breaks = breaks

                def __init__(self, *args, **kwargs):
                    super(HourSkippingLocator, self).__init__(*args, **kwargs)

                def _tick_allowed(self, tick):
                    # drop ticks that fall inside a break
                    for left, right in self._breaks:
                        if left <= tick <= right:
                            return False
                    return True

                def __call__(self):
                    ticks = super(HourSkippingLocator, self).__call__()
                    ticks = [tick for tick in ticks if self._tick_allowed(tick)]
                    # always put a tick at the right edge of every break
                    ticks.extend(right for (left, right) in self._breaks)
                    return ticks

            axis.set_major_locator(HourSkippingLocator(interval=3))
            # %b (abbreviated month name) is the portable spelling of %h
            axis.set_major_formatter(mdates.DateFormatter("%b %d, %H:%M"))

        class CustomTransform(mtransforms.Transform):
            input_dims = 1
            output_dims = 1
            is_separable = True
            has_inverse = True
            _breaks = breaks

            def __init__(self):
                mtransforms.Transform.__init__(self)

            def transform_non_affine(self, a):
                # I have tried to write something smart using np.cumsum(),
                # but failed, since it was too complicated to handle the
                # transformation for points within breaks.
                # On the other hand, these loops are very easily translated
                # in plain C.

                result = np.empty_like(a)

                a_idx = 0
                csum = 0
                for left, right in self._breaks:
                    # points before the break: shift by the total width skipped so far
                    while a_idx < len(a) and a[a_idx] < left:
                        result[a_idx] = a[a_idx] - csum
                        a_idx += 1
                    # points inside the break: collapse onto its left edge
                    while a_idx < len(a) and a[a_idx] <= right:
                        result[a_idx] = left - csum
                        a_idx += 1
                    csum += right - left

                # points after the last break
                while a_idx < len(a):
                    result[a_idx] = a[a_idx] - csum
                    a_idx += 1

                return result

            def inverted(self):
                return CustomScale.InvertedCustomTransform()

        class InvertedCustomTransform(mtransforms.Transform):
            input_dims = 1
            output_dims = 1
            is_separable = True
            has_inverse = True
            _breaks = breaks

            def __init__(self):
                mtransforms.Transform.__init__(self)

            def transform_non_affine(self, a):
                # Actually, this transformation isn't exactly invertible.
                # It may glue together some points, and there is no way
                # to separate them back. This implementation maps both
                # points to the *left* side of the break.

                diff = np.zeros(len(a))

                total_shift = 0

                for left, right in self._breaks:
                    pos = bisect.bisect_right(a, left - total_shift)
                    if pos >= len(diff):
                        break
                    diff[pos] = right - left
                    total_shift += right - left

                return a + diff.cumsum()

            def inverted(self):
                return CustomScale.CustomTransform()

    return CustomScale


# < ... reading my CSV in a Pandas dataframe `df` ... >

startups = np.where(df['kind'] == 'startup')[0]
shutdowns = np.where(df['kind'] == 'shutdown')[0]

breaks_idx = list(zip(shutdowns, startups[1:]))
breaks_dates = [(df.index[l], df.index[r]) for (l, r) in breaks_idx]
breaks = [(mdates.date2num(l), mdates.date2num(r)) for (l, r) in breaks_dates]

fig, ax = plt.subplots()

for col in ['total'] + ['%02d' % i for i in range(40)]:
    ax.plot_date(df.index.values, df[col].values, '-')

# shame on matplotlib: there is no way to unregister a scale
mscale.register_scale(CustomScaleFactory(breaks))
ax.set_xscale('custom')

vlines_x = [r for (l, r) in breaks]
vlines_ymin = np.zeros(len(vlines_x))
vlines_ymax = [df.iloc[r]['total'] for (l, r) in breaks_idx]
plt.vlines(vlines_x, vlines_ymin, vlines_ymax, color='darkgrey')

fig.autofmt_xdate()
plt.ticklabel_format(axis='y', style='plain')

plt.show()

[Image: the resulting plot]
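To see what the two transforms in the scale do without involving matplotlib, here is a small pure-Python sketch of the same piecewise mapping on plain numbers. The helper names `compress` and `expand` and the sample values are made up for illustration. The sketch assumes `breaks` is sorted and non-overlapping, as in the code above.

```python
def compress(xs, breaks):
    """Forward mapping: remove every (left, right) break from the axis.

    Points inside a break (including its right edge) collapse onto the
    left edge; points after a break shift left by the break's width.
    """
    out = []
    for x in xs:
        shift = 0.0
        for left, right in breaks:
            if x > right:
                shift += right - left  # this break lies entirely before x
            elif x >= left:
                x = left               # x is inside the break
                break
            else:
                break                  # x lies before this break
        out.append(x - shift)
    return out


def expand(ys, breaks):
    """Approximate inverse: put the breaks back.

    The forward mapping is not one-to-one, so a point sitting exactly at
    a collapsed break is mapped to the break's left edge.
    """
    out = []
    for y in ys:
        shift = 0.0
        for left, right in breaks:
            if y > left - shift:       # y lies past the collapsed break
                shift += right - left
            else:
                break
        out.append(y + shift)
    return out


breaks = [(10.0, 20.0), (30.0, 35.0)]
xs = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0]
compressed = compress(xs, breaks)
restored = expand([5.0, 10.0, 15.0, 20.0, 25.0], breaks)
print(compressed)  # [5.0, 10.0, 10.0, 10.0, 15.0, 20.0, 25.0]
print(restored)    # [5.0, 10.0, 25.0, 30.0, 40.0]
```

The values 10, 15 and 20 all land on 10 after compression, which is why the inverse cannot tell them apart and has to pick one side of the break.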

For the original discussion of "python - matplotlib: plotting a time series while skipping periods with no data", see Stack Overflow: https://stackoverflow.com/questions/34580542/
