Pythonic way to populate rows with a date range


Reposted by 太空狗, updated 2023-10-29 21:27:45

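I am working on a problem that requires me to fill in rows for missing dates (i.e., the dates between two dates held in columns of a pandas DataFrame). Please see the example below. I am using Pandas for my current approach (described below).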

Example input data (about 25,000 rows):

A  | B  | C  | Date1    | Date2
a1 | b1 | c1 | 1Jan1990 | 15Aug1990 <- this row should be repeated for all dates between the two dates
.......................
a3 | b3 | c3 | 11May1986 | 11May1986 <- this row should NOT be repeated. Just 1 entry since both dates are the same.
.......................
a5 | b5 | c5 | 1Dec1984 | 31Dec2017 <- this row should be repeated for all dates between the two dates
..........................
..........................

Expected output:

A  | B  | C  | Month    | Year
a1 | b1 | c1 | 1        | 1990  <- Since date 1 column for this row was Jan 1990
a1 | b1 | c1 | 2        | 1990    
.......................
.......................
a1 | b1 | c1 | 7        | 1990  
a1 | b1 | c1 | 8        | 1990  <- Since date 2 column for this row was Aug 1990
..........................
a3 | b3 | c3 | 5        | 1986  <- only 1 row, since the two dates in the input dataframe were the same for this row.
...........................
a5 | b5 | c5 | 12       | 1984 <- since date 1 column for this row was Dec 1984
a5 | b5 | c5 | 1        | 1985 
..........................
..........................
a5 | b5 | c5 | 11       | 2017 
a5 | b5 | c5 | 12       | 2017 <- Since date 2 column for this row was Dec 2017

I know the more conventional way to do this (my current approach):

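Iterate over each row.
Get the difference in days between the two date columns.
If the dates in both columns are the same, include just one row for that month and year in the output DataFrame.
If the dates differ (diff > 0), generate every (month, year) combination covered by the difference and append those rows to a new DataFrame.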
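The row-by-row steps above could be sketched roughly as follows. This is only a minimal illustration under some assumptions: the column names come from the example table, the dates are strings such as 1Jan1990 (parsed with format %d%b%Y), and expand_rows_iteratively is a hypothetical helper name. Instead of computing a day difference, it walks forward one month at a time, which covers the same-date case without a separate branch.

```python
import pandas as pd


def expand_rows_iteratively(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one output row per (month, year) between Date1 and Date2."""
    records = []
    for row in df.itertuples(index=False):
        # Assumption: dates look like '1Jan1990' (day, abbreviated month, year).
        start = pd.to_datetime(row.Date1, format="%d%b%Y")
        end = pd.to_datetime(row.Date2, format="%d%b%Y")
        year, month = start.year, start.month
        # Walk month by month; if both dates fall in the same month this runs once.
        while (year, month) <= (end.year, end.month):
            records.append((row.A, row.B, row.C, month, year))
            month += 1
            if month > 12:
                month, year = 1, year + 1
    return pd.DataFrame(records, columns=["A", "B", "C", "Month", "Year"])


df = pd.DataFrame(
    [("a1", "b1", "c1", "1Jan1990", "15Aug1990"),
     ("a3", "b3", "c3", "11May1986", "11May1986"),
     ("a5", "b5", "c5", "1Dec1984", "31Dec2017")],
    columns=["A", "B", "C", "Date1", "Date2"],
)
out = expand_rows_iteratively(df)
print(len(out))  # 406
```

On the three example rows this yields 406 rows, the same count reported for the answer's output.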

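Since the input data has about 25,000 rows, I expect the output to be very, very large, so I am looking for a more Pythonic way to do this (if possible, and faster than the iterative approach)!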

Best answer

In my opinion, the best tool to use here is a PeriodIndex (for generating the months and years between the dates).
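As a quick illustration for a single row, pd.period_range (which returns a PeriodIndex) generates one monthly period per calendar month between two dates, both ends included. The sketch assumes the question's date format, parsed explicitly with format %d%b%Y.

```python
import pandas as pd

# Assumption: dates use the question's format, e.g. '1Jan1990' (day, abbreviated month, year).
start = pd.to_datetime("1Jan1990", format="%d%b%Y")
end = pd.to_datetime("15Aug1990", format="%d%b%Y")

# One Period per calendar month touched by the range, both ends inclusive.
months = pd.period_range(start=start, end=end, freq="M")
print(months.month.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8]
print(months.year.tolist())   # [1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990]

# When both dates fall in the same month, the range has exactly one element.
same = pd.period_range(start="1986-05-11", end="1986-05-11", freq="M")
print(len(same))  # 1
```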

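However, a PeriodIndex can only be generated for one row at a time. So if we are going to use a PeriodIndex, each row has to be handled separately. Unfortunately, that means looping over the DataFrame: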

import pandas as pd
df = pd.DataFrame([('a1','b1','c1','1Jan1990','15Aug1990'),
                   ('a3','b3','c3','11May1986','11May1986'),
                   ('a5','b5','c5','1Dec1984','31Dec2017')],
                  columns=['A','B','C','Date1','Date2'])

result = []
for tup in df.itertuples():
    # One monthly Period per month from Date1 to Date2, inclusive.
    # pd.period_range replaces PeriodIndex(start=..., end=...), which recent pandas no longer accepts.
    index = pd.period_range(start=tup.Date1, end=tup.Date2, freq='M')
    # Repeat the row's A/B/C values once per month in the range.
    new_df = pd.DataFrame([(tup.A, tup.B, tup.C)] * len(index), index=index)
    new_df['Month'] = new_df.index.month
    new_df['Year'] = new_df.index.year
    result.append(new_df)
result = pd.concat(result, axis=0)
print(result)

yields

          0   1   2  Month  Year
1990-01  a1  b1  c1      1  1990    <--- Beginning of row 1
1990-02  a1  b1  c1      2  1990
1990-03  a1  b1  c1      3  1990
1990-04  a1  b1  c1      4  1990
1990-05  a1  b1  c1      5  1990
1990-06  a1  b1  c1      6  1990
1990-07  a1  b1  c1      7  1990
1990-08  a1  b1  c1      8  1990    <--- End of row 1
1986-05  a3  b3  c3      5  1986    <--- Beginning and End of row 2
1984-12  a5  b5  c5     12  1984    <--- Beginning of row 3
1985-01  a5  b5  c5      1  1985
1985-02  a5  b5  c5      2  1985
1985-03  a5  b5  c5      3  1985
1985-04  a5  b5  c5      4  1985
...      ..  ..  ..    ...   ...
2017-09  a5  b5  c5      9  2017
2017-10  a5  b5  c5     10  2017
2017-11  a5  b5  c5     11  2017
2017-12  a5  b5  c5     12  2017    <--- End of row 3

[406 rows x 5 columns]
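One possible way to avoid building a separate DataFrame for every input row is to compute each row's month count up front, repeat the rows with Index.repeat, and number the copies with groupby(...).cumcount(). This is a different technique from the PeriodIndex loop in this answer, sketched under the same assumptions (column names from the example, dates formatted like 1Jan1990); expand_rows_vectorized is a hypothetical name, and whether it is actually faster on the real 25,000-row input would need to be measured.

```python
import pandas as pd


def expand_rows_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per month between Date1 and Date2, no per-row loop."""
    # Assumption: dates look like '1Jan1990' (day, abbreviated month, year).
    start = pd.to_datetime(df["Date1"], format="%d%b%Y")
    end = pd.to_datetime(df["Date2"], format="%d%b%Y")
    # Encode each date's month as one integer: year * 12 + (month - 1).
    start_idx = start.dt.year * 12 + (start.dt.month - 1)
    end_idx = end.dt.year * 12 + (end.dt.month - 1)
    # Months covered by each row, counting both ends (same month -> 1).
    n_months = (end_idx - start_idx + 1).to_numpy()
    # Repeat every input row n_months times, keeping only the value columns.
    out = df.loc[df.index.repeat(n_months), ["A", "B", "C"]].copy()
    # 0, 1, 2, ... within each original row is the month offset from Date1.
    offset = out.groupby(level=0).cumcount().to_numpy()
    month_idx = start_idx.loc[out.index].to_numpy() + offset
    # Decode the integer back into calendar month and year.
    out["Month"] = month_idx % 12 + 1
    out["Year"] = month_idx // 12
    return out.reset_index(drop=True)


df = pd.DataFrame(
    [("a1", "b1", "c1", "1Jan1990", "15Aug1990"),
     ("a3", "b3", "c3", "11May1986", "11May1986"),
     ("a5", "b5", "c5", "1Dec1984", "31Dec2017")],
    columns=["A", "B", "C", "Date1", "Date2"],
)
expanded = expand_rows_vectorized(df)
print(expanded.shape)  # (406, 5)
```

The integer month encoding keeps the year rollover (December to January) as plain arithmetic, so no Period objects are created per output row.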

Note that you may not really need to define the Month and Year columns

new_df['Month'] = new_df.index.month
new_df['Year'] = new_df.index.year

since you already have the PeriodIndex, which makes computing the month and year very easy.
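For example, a minimal sketch that stores only the A, B and C columns and reads the month and year off the concatenated PeriodIndex when needed. It assumes two of the example rows, dates parsed with format %d%b%Y, and pd.period_range in place of the PeriodIndex constructor.

```python
import pandas as pd

df = pd.DataFrame(
    [("a1", "b1", "c1", "1Jan1990", "15Aug1990"),
     ("a3", "b3", "c3", "11May1986", "11May1986")],
    columns=["A", "B", "C", "Date1", "Date2"],
)

frames = []
for tup in df.itertuples(index=False):
    # Assumption: dates are day + abbreviated month + year, e.g. '1Jan1990'.
    index = pd.period_range(
        start=pd.to_datetime(tup.Date1, format="%d%b%Y"),
        end=pd.to_datetime(tup.Date2, format="%d%b%Y"),
        freq="M",
    )
    # One copy of the row's A/B/C values per month; no Month/Year columns stored.
    frames.append(pd.DataFrame([(tup.A, tup.B, tup.C)] * len(index),
                               index=index, columns=["A", "B", "C"]))
result = pd.concat(frames)

# Month and year come straight from the PeriodIndex when needed.
months = result.index.month
years = result.index.year
# The index also supports filtering without a stored Year column.
in_1990 = result[result.index.year == 1990]
print(months.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 5]
print(len(in_1990))     # 8
```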

Regarding the Pythonic way to populate rows with a date range, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53780270/
