python - Need to expand an inventory log (ledger) pandas DataFrame to include all dates for each product ID

Reposted. Author: 太空狗. Updated: 2023-10-29 17:19:23

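I have an inventory log that contains products and their relative stock quantities (`resulting_qty`), as well as the loss/gain (`delta_qty`) each time stock is added or subtracted.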

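The problem is that the log is not updated every day, only when the stock changes. Because of this, it is hard to extract the total stock quantity of all items on a given date: some items have no record on that date even though they do have stock available, because their last `resulting_qty` entry is greater than 0. Logically, this means an item's quantity has stayed unchanged for as many days as lie between the max date and its last recorded date.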
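To make that logic concrete: the total stock on a day is the sum, over products, of each product's most recent `resulting_qty` on or before that day, whether or not the product has a row on that day. A minimal sketch of this "as-of" lookup for a single day, using a hand-copied subset of the B and C rows from the sample data (the helper `stock_on` is hypothetical and not part of the question):

```python
import pandas as pd

# Subset of the B and C rows from the sample inventory log
log = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-01-16 23:37:17", "2019-01-19 09:40:38",
        "2019-04-05 16:40:32", "2019-04-22 11:06:33",
        "2019-04-23 13:23:17", "2019-05-09 16:25:41",
    ]),
    "pid": ["B", "C", "B", "B", "B", "C"],
    "resulting_qty": [0.0, 0.0, 2.0, 1.0, 0.0, 2.0],
})


def stock_on(log: pd.DataFrame, day: str) -> float:
    """Total stock at the end of `day`: each product's last resulting_qty on or before that day."""
    cutoff = pd.Timestamp(day) + pd.Timedelta(days=1)  # include the whole day
    upto = log[log["timestamp"] < cutoff].sort_values("timestamp")
    return upto.groupby("pid")["resulting_qty"].last().sum()


# No row exists on 2019-04-10, yet B still holds the 2.0 recorded on 2019-04-05
print(stock_on(log, "2019-04-10"))
```

Running this for every day repeats the filtering each time, which is why a log expanded to one row per product per day is the more convenient shape for daily totals.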

My data looks something like this, except that in reality there are thousands of product IDs:

| date       | timestamp           | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2017-03-06 | 2017-03-06 12:24:22 | A | 0 | 0.0 |
| 2017-03-31 | 2017-03-31 02:43:11 | A | 3 | 3.0 |
| 2017-04-08 | 2017-04-08 22:04:35 | A | -1 | 2.0 |
| 2017-04-12 | 2017-04-12 18:26:39 | A | -1 | 1.0 |
| 2017-04-19 | 2017-04-19 09:15:38 | A | -1 | 0.0 |
| 2019-01-16 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-19 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-05 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-22 | 2019-04-22 11:06:33 | B | -1 | 1.0 |
| 2019-04-23 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-09 | 2019-05-09 16:25:41 | C | 2 | 2.0 |

Essentially, I need the data to look more like this, so that I can simply pick a date and get the total stock for that date by grouping on date (e.g. `df.groupby('date').resulting_qty.sum()`):

Note: because of the character limit I removed the rows for pid A, but I hope you get the idea:

| date       | timestamp           | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2019-01-16 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-17 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-18 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-19 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-20 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-21 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-22 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-23 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-24 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-25 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-26 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-27 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-28 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-29 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-30 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-01-31 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-01 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-02 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-03 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-04 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-05 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-06 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-07 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-08 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-09 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-10 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-11 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-12 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-13 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-14 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-15 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-16 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-17 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-18 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-19 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-20 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-21 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-22 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-23 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-24 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-25 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-26 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-27 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-02-28 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-01 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-02 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-03 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-04 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-05 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-06 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-07 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-08 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-09 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-10 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-11 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-12 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-13 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-14 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-15 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-16 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-17 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-18 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-19 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-20 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-21 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-22 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-23 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-24 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-25 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-26 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-27 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-28 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-29 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-30 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-03-31 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-04-01 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-04-02 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-04-03 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-04-04 | 2019-01-16 23:37:17 | B | 0 | 0.0 |
| 2019-04-05 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-06 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-07 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-08 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-09 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-10 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-11 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-12 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-13 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-14 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-15 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-16 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-17 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-18 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-19 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-20 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-21 | 2019-04-05 16:40:32 | B | 2 | 2.0 |
| 2019-04-22 | 2019-04-22 11:06:33 | B | -1 | 1.0 |
| 2019-04-23 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-24 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-25 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-26 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-27 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-28 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-29 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-04-30 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-01 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-02 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-03 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-04 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-05 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-06 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-07 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-08 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-09 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-05-10 | 2019-04-23 13:23:17 | B | -1 | 0.0 |
| 2019-01-19 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-20 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-21 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-22 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-23 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-24 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-25 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-26 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-27 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-28 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-29 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-30 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-01-31 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-01 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-02 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-03 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-04 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-05 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-06 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-07 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-08 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-09 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-10 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-11 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-12 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-13 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-14 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-15 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-16 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-17 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-18 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-19 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-20 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-21 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-22 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-23 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-24 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-25 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-26 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-27 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-02-28 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-01 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-02 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-03 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-04 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-05 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-06 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-07 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-08 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-09 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-10 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-11 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-12 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-13 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-14 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-15 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-16 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-17 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-18 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-19 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-20 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-21 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-22 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-23 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-24 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-25 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-26 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-27 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-28 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-29 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-30 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-03-31 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-01 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-02 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-03 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-04 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-05 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-06 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-07 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-08 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-09 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-10 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-11 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-12 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-13 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-14 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-15 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-16 | 2019-01-19 09:40:38 | C | 0 | 0.0 |
| 2019-04-17 | 2019-01-19 09:40:38 | C | 0 | 0.0 |

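What I have done so far is write a series of loops that generate a date range from the minimum date of each product's lifetime to the maximum date across all products. Then, if there is no information for a new date, I append the last recorded row's values as a new row with the new date. I collect these in lists and then build a new DataFrame from the updated lists. The code is very slow; it takes more than 2 hours to run on the full dataset: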

```python
import pandas as pd

# `test`: the inventory log, indexed by its date column
# (assumed to hold datetime.date values), with a product_id column
date_list = []
pid_list = []
time_stamp_list = []
delta_qty_list = []
resulting_qty_list = []


timer = len(test.product_id.unique().tolist())
counter = 0
for product in test.product_id.unique().tolist():
    counter += 1
    print((counter / timer) * 100)  # progress in percent
    temp_df = test.query(f'product_id=={product}', engine='python')
    # Every day from this product's first entry to the last date of any product
    for idx, date in enumerate(pd.date_range(temp_df.index.min(), test.index.max()).tolist()):
        min_date = temp_df.index.min()
        if date.date() == min_date:
            # First recorded day for this product
            date2 = min_date
            pid = temp_df.loc[date2]['product_id']
            timestamp = temp_df.loc[date2]['timestamp']
            delta_qty = temp_df.loc[date2]['delta_qty']
            resulting_qty = temp_df.loc[date2]['resulting_qty']
            date_list.append(date2)
            pid_list.append(pid)
            delta_qty_list.append(delta_qty)
            time_stamp_list.append(timestamp)
            resulting_qty_list.append(resulting_qty)
        else:
            if date.date() in temp_df.index:
                # A recorded update exists for this day
                date2 = date.date()
                pid = temp_df.loc[date2]['product_id']
                timestamp = temp_df.loc[date2]['timestamp']
                delta_qty = temp_df.loc[date2]['delta_qty']
                resulting_qty = temp_df.loc[date2]['resulting_qty']
                date_list.append(date2)
                pid_list.append(pid)
                delta_qty_list.append(delta_qty)
                time_stamp_list.append(timestamp)
                resulting_qty_list.append(resulting_qty)
            elif date.date() > date2:
                # No update: repeat the last recorded values with the new date
                date_list.append(date.date())
                pid_list.append(pid)
                time_stamp_list.append(timestamp)
                delta_qty_list.append(delta_qty)
                resulting_qty_list.append(resulting_qty)
            else:
                pass
```

Can someone help me understand what the right approach is? I'm 100% sure this isn't the best way.

Thanks

Best Answer

The idea here is to reindex the DataFrame to fill in your gaps.

Set up a DataFrame from your example:

```python
from io import StringIO

import pandas as pd

buffer = StringIO()
buffer.write('''\
date|timestamp|pid|delta_qty|resulting_qty
2017-03-06|2017-03-06 12:24:22|A|0|0.0
2017-03-31|2017-03-31 02:43:11|A|3|3.0
2017-04-08|2017-04-08 22:04:35|A|-1|2.0
2017-04-12|2017-04-12 18:26:39|A|-1|1.0
2017-04-19|2017-04-19 09:15:38|A|-1|0.0
2019-01-16|2019-01-16 23:37:17|B|0|0.0
2019-01-19|2019-01-19 09:40:38|C|0|0.0
2019-04-05|2019-04-05 16:40:32|B|2|2.0
2019-04-22|2019-04-22 11:06:33|B|-1|1.0
2019-04-23|2019-04-23 13:23:17|B|-1|0.0
2019-05-09|2019-05-09 16:25:41|C|2|2.0
''')
buffer.seek(0)

df = pd.read_csv(buffer, sep='|', parse_dates=['date', 'timestamp'])
```

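First, we generate a new, gap-free index running from each product's minimum date to its maximum date. Following your example, this means a product gets no rows after its last existing update. This step is easy to customize to your exact requirements, though: for example, if you want the dates to run from a product's first entry until today, you can set `start` and `end` manually.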

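```python
from itertools import chain, cycle

import pandas as pd

# First and last recorded date per product: columns ('date', 'min') and ('date', 'max')
date_ranges = df.groupby('pid').agg({'date': ['min', 'max']})

# (pid, day) pairs covering every calendar day in each product's range
pairs = (zip(cycle([pid]), pd.date_range(start, end))
         for pid, (start, end) in date_ranges.iterrows())
new_index = pd.MultiIndex.from_tuples(list(chain.from_iterable(pairs)),
                                      names=['pid', 'date'])
```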
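The loop in the question runs each product up to the latest date across *all* products, not only up to that product's own last update. A sketch of that variant, assuming the same column names as above and a small hand-built `df` so it runs on its own. Two further assumptions are marked in the comments: several updates on the same day are collapsed to the last one (`reindex` raises an error on duplicate labels), and forward-filling is done per product with `groupby(level='pid').ffill()`, so values cannot leak from one pid into the next if you also move `start` before a product's first entry.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-16", "2019-01-19", "2019-04-05", "2019-05-09"]),
    "pid": ["B", "C", "B", "C"],
    "delta_qty": [0, 0, 2, 2],
    "resulting_qty": [0.0, 0.0, 2.0, 2.0],
})

# Assumption: rows are in chronological order; keep only the last update per (pid, day)
df = df.drop_duplicates(subset=["pid", "date"], keep="last")

# Assumption: every product runs from its own first entry to the latest date of ANY product
global_end = df["date"].max()
starts = df.groupby("pid")["date"].min()

new_index = pd.MultiIndex.from_tuples(
    [(pid, day) for pid, start in starts.items()
     for day in pd.date_range(start, global_end, freq="D")],
    names=["pid", "date"],
)

results = df.set_index(["pid", "date"]).reindex(new_index)
# Forward-fill within each product only
results = results.groupby(level="pid").ffill().reset_index()
```

With this range, B keeps its last known quantity all the way to 2019-05-09 instead of dropping out after its final update.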

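Then we apply the new index. We have two options here:

  1. Following your example, keep filling every column exactly as of the last update.
  2. Fill `delta_qty` with 0 and forward-fill the remaining columns from the last update (this deviates from your request, but it seems logical and is only a small change).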

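In either case, the two key concepts are the `.reindex` method and the `.fillna` method. We use `reindex` to expand the DataFrame so that it contains every date, with the newly added rows empty (NaN). Then we fill those NaNs with the appropriate data. Since we are forward-filling from the last update, we want `method='ffill'`, per the docs; since pandas 2.1 that keyword is deprecated in favor of the equivalent `.ffill()` method, which the code below uses.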
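The reindex-then-forward-fill idea, reduced to a single product's `resulting_qty` as a `Series` (dates and values hand-picked for illustration, not taken from the question):

```python
import pandas as pd

# Sparse log for one product: only the days on which stock changed
s = pd.Series(
    [0.0, 3.0, 2.0],
    index=pd.to_datetime(["2017-03-06", "2017-03-09", "2017-03-11"]),
    name="resulting_qty",
)

# Step 1: reindex onto every calendar day; days without an entry become NaN
dense = s.reindex(pd.date_range(s.index.min(), s.index.max(), freq="D"))

# Step 2: carry the last known quantity forward into those gaps
filled = dense.ffill()
print(filled.tolist())  # [0.0, 0.0, 0.0, 3.0, 3.0, 2.0]
```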

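Method 1:

```python
# Forward-fill every column from the last update.
# A frame-wide ffill is safe here because each pid's range starts on a recorded row.
results = (df.set_index(['pid', 'date'])
             .reindex(new_index)
             .reset_index())
results = results.ffill()  # equivalent to the older fillna(method='ffill')
```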

Returns (rows shown for `pid == 'A'`):

```
   pid       date           timestamp  delta_qty  resulting_qty
0    A 2017-03-06 2017-03-06 12:24:22        0.0            0.0
1    A 2017-03-07 2017-03-06 12:24:22        0.0            0.0
2    A 2017-03-08 2017-03-06 12:24:22        0.0            0.0
3    A 2017-03-09 2017-03-06 12:24:22        0.0            0.0
..  ..        ...                 ...        ...            ...
24   A 2017-03-30 2017-03-06 12:24:22        0.0            0.0
25   A 2017-03-31 2017-03-31 02:43:11        3.0            3.0
..  ..        ...                 ...        ...            ...
29   A 2017-04-04 2017-03-31 02:43:11        3.0            3.0
```

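Method 2:

```python
results = (df.set_index(['pid', 'date'])
             .reindex(new_index)
             .reset_index())
# Days without an update had no change in stock, so their delta is 0
results['delta_qty'] = results['delta_qty'].fillna(0)
# Forward-fill the remaining columns from the last update
results = results.ffill()
```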

Returns:

```
   pid       date           timestamp  delta_qty  resulting_qty
0    A 2017-03-06 2017-03-06 12:24:22        0.0            0.0
1    A 2017-03-07 2017-03-06 12:24:22        0.0            0.0
2    A 2017-03-08 2017-03-06 12:24:22        0.0            0.0
3    A 2017-03-09 2017-03-06 12:24:22        0.0            0.0
..  ..        ...                 ...        ...            ...
24   A 2017-03-30 2017-03-06 12:24:22        0.0            0.0
25   A 2017-03-31 2017-03-31 02:43:11        3.0            3.0
..  ..        ...                 ...        ...            ...
29   A 2017-04-04 2017-03-31 02:43:11        0.0            3.0
```
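With a dense frame like `results`, the daily total asked for in the question is a plain `groupby`. A sketch on a tiny hand-built stand-in for `results` (values invented for illustration). Keep in mind that a product only contributes to the days on which it has a row, so if a product's last update leaves stock above 0, the date range should extend past that product's own last update.

```python
import pandas as pd

# Small stand-in for the dense `results` frame: one row per pid per day
results = pd.DataFrame({
    "pid": ["B", "B", "B", "C", "C", "C"],
    "date": pd.to_datetime(["2019-05-07", "2019-05-08", "2019-05-09"] * 2),
    "resulting_qty": [0.0, 0.0, 0.0, 0.0, 0.0, 2.0],
})

# Total stock on hand per day across all products
daily_total = results.groupby("date")["resulting_qty"].sum()
print(daily_total.loc[pd.Timestamp("2019-05-09")])  # 2.0
```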

A similar question, "python - Need to expand an inventory log (ledger) pandas DataFrame to include all dates for each product ID", can be found on Stack Overflow: https://stackoverflow.com/questions/56102929/
