python - Iterating over date ranges between two pandas DataFrames to get category counts

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:07:46

I have two pandas DataFrames (df1 and df2):

df1 has 12 columns, of which a1, a2, ..., a9 are empty. Here is a sample of df1:

Stock  Start_Date        End_Date          a1  a2  a3  a4  ...  a9
A      09-12-2015 20:04  10-12-2015 23:04
B      09-12-2015 10:04  09-12-2015 20:14
A      11-12-2015 00:22  11-12-2015 08:04
C      08-12-2015 06:56  10-12-2015 20:54

df2 has 4 columns. Here is a sample:

Stock  date_time         Opening  closing
A      09-12-2015 21:24  144.3    10
A      09-12-2015 21:27  225.51   24
B      09-12-2015 10:20  134.42   11
A      09-12-2015 20:04  231.22   17
B      09-12-2015 10:24  399.55   32
A      09-12-2015 20:04  246.77   21
B      09-12-2015 14:22  76.23    8
C      08-12-2015 09:44  232.22   15
C      09-12-2015 20:04  222.91   12
A      11-12-2015 02:06  93.21    7
B      09-12-2015 20:04  211.36   26
C      09-12-2015 20:04  111.21   8

Now I want the output df1 to look like this:

Stock  Start_Date        End_Date          a1  a2  a3  a4  ...  a9
A      09-12-2015 20:04  10-12-2015 23:04  0   2   2   0   0
B      09-12-2015 10:04  09-12-2015 20:14  1   1   2   0   0
A      11-12-2015 00:22  11-12-2015 08:04  1   0   0   0   0
C      08-12-2015 06:56  10-12-2015 20:54  0   0   0   1   0

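That is, for each Stock/Start_Date/End_Date combination in df1, the result should contain the number of df2 rows of that stock in each category whose date_time falls within that range.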
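To pin down what is being counted, here is a deliberately simple sketch. It works row by row, so it is slow. For each df1 interval it counts the df2 rows of the same Stock whose date_time lies inside the interval, and it ignores the categories. Several parts are assumptions: both ends of the interval are inclusive, the date strings are day-first, and `count_in_range` and `n_rows` are hypothetical names.

```python
import pandas as pd

# Small subset of the question's data; day-first parsing is an assumption
# (09-12-2015 is read as 9 December 2015).
df1 = pd.DataFrame({
    'Stock': ['A', 'B'],
    'Start_Date': pd.to_datetime(['09-12-2015 20:04', '09-12-2015 10:04'], dayfirst=True),
    'End_Date': pd.to_datetime(['10-12-2015 23:04', '09-12-2015 20:14'], dayfirst=True),
})
df2 = pd.DataFrame({
    'Stock': ['A', 'A', 'B', 'A'],
    'date_time': pd.to_datetime(['09-12-2015 21:24', '09-12-2015 21:27',
                                 '09-12-2015 10:20', '11-12-2015 02:06'], dayfirst=True),
})


def count_in_range(row):
    # Hypothetical helper: df2 rows of the same stock with
    # Start_Date <= date_time <= End_Date.
    mask = ((df2['Stock'] == row['Stock'])
            & (df2['date_time'] >= row['Start_Date'])
            & (df2['date_time'] <= row['End_Date']))
    return int(mask.sum())


df1['n_rows'] = df1.apply(count_in_range, axis=1)
print(df1[['Stock', 'n_rows']])
```

The A interval picks up the two rows on 9 December. The 11 December row falls outside it.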

In the final output, a1 = count[opening(0-100) & closing(0-10)], a2 = count[opening(101-200) & closing(11-20)], a3 = count[opening(201-400) & closing(21-50)], a4 = count[opening(0-100) & closing(11-20)], and so on, for 9 combinations in total.
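The nine categories appear to be three Opening bins crossed with three closing bins (3 × 3 = 9). One way to tag each df2 row with its bin pair is `pandas.cut`. This sketch makes a few assumptions. It treats the edges as right-closed intervals (0, 100], (100, 200], (200, 400] and (0, 10], (10, 20], (20, 50], with `include_lowest=True` so that an exact 0 lands in the first bin. The labels, such as `open_0_100|close_0_10`, are hypothetical names, because the question only spells out a1 to a4.

```python
import pandas as pd

# Bin edges from the question; right-closed intervals are an assumption.
OPEN_EDGES = [0, 100, 200, 400]
CLOSE_EDGES = [0, 10, 20, 50]
OPEN_LABELS = ['open_0_100', 'open_101_200', 'open_201_400']
CLOSE_LABELS = ['close_0_10', 'close_11_20', 'close_21_50']

# A few Opening/closing pairs taken from the question's df2 sample.
df2 = pd.DataFrame({
    'Opening': [144.3, 225.51, 134.42, 76.23, 93.21],
    'closing': [10, 24, 11, 8, 7],
})

open_bin = pd.cut(df2['Opening'], OPEN_EDGES, labels=OPEN_LABELS, include_lowest=True)
close_bin = pd.cut(df2['closing'], CLOSE_EDGES, labels=CLOSE_LABELS, include_lowest=True)

# One of the 9 bin pairs per row; a value outside every bin shows up as 'nan'.
df2['category'] = open_bin.astype(str) + '|' + close_bin.astype(str)
print(df2)
```

With these labels, a1 corresponds to `open_0_100|close_0_10`, a2 to `open_101_200|close_11_20`, a3 to `open_201_400|close_21_50` and a4 to `open_0_100|close_11_20`. The question does not say which of a5 to a9 goes with each of the remaining five pairs.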

I have R code for this, but it performs poorly on larger datasets. Can anyone show me how to do this in Python/pandas? Any help would be greatly appreciated!

Best answer

You can try this solution. I removed the empty columns from df1, but it works with them too:

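import pandas as pd

# Start_Date, End_Date and date_time must be datetimes, not strings, for the
# range comparison below. Default parsing reads 09-12-2015 as 12 September
# (month first), which is what the printed output shows; pass dayfirst=True
# to pd.to_datetime if the strings are day-first.
df1['Start_Date'] = pd.to_datetime(df1['Start_Date'])
df1['End_Date'] = pd.to_datetime(df1['End_Date'])
df2['date_time'] = pd.to_datetime(df2['date_time'])

# merge dataframes by Stock, select datetimes between start and end
# (note: a df1 interval with no df2 row in range is dropped by this filter)
df = df1.merge(df2, on='Stock', how='left')
df = df[(df.date_time >= df.Start_Date) & (df.date_time <= df.End_Date)]
# remove column date_time
df = df.drop(['date_time'], axis=1)
print(df)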
# Stock Start_Date End_Date Opening closing
#0 A 2015-09-12 20:04:00 2015-10-12 23:04:00 144.30 10
#1 A 2015-09-12 20:04:00 2015-10-12 23:04:00 225.51 24
#2 A 2015-09-12 20:04:00 2015-10-12 23:04:00 231.22 17
#3 A 2015-09-12 20:04:00 2015-10-12 23:04:00 246.77 21
#5 B 2015-09-12 10:04:00 2015-09-12 20:14:00 134.42 11
#6 B 2015-09-12 10:04:00 2015-09-12 20:14:00 399.55 32
#7 B 2015-09-12 10:04:00 2015-09-12 20:14:00 76.23 8
#8 B 2015-09-12 10:04:00 2015-09-12 20:14:00 211.36 26
#13 A 2015-11-12 00:22:00 2015-11-12 08:04:00 93.21 7
#14 C 2015-08-12 06:56:00 2015-10-12 20:54:00 232.22 15
#15 C 2015-08-12 06:56:00 2015-10-12 20:54:00 222.91 12
#16 C 2015-08-12 06:56:00 2015-10-12 20:54:00 111.21 8

#values to new columns by conditions - cast boolean to integers
df['a1'] = ((df.Opening.between(0,100)) & (df.closing.between(0,10))).astype(int)
df['a2'] = ((df.Opening.between(100,200)) & (df.closing.between(11,20))).astype(int)
#add other columns like a1 and a2
print(df)
# Stock Start_Date End_Date Opening closing a1 a2
#0 A 2015-09-12 20:04:00 2015-10-12 23:04:00 144.30 10 0 0
#1 A 2015-09-12 20:04:00 2015-10-12 23:04:00 225.51 24 0 0
#2 A 2015-09-12 20:04:00 2015-10-12 23:04:00 231.22 17 0 0
#3 A 2015-09-12 20:04:00 2015-10-12 23:04:00 246.77 21 0 0
#5 B 2015-09-12 10:04:00 2015-09-12 20:14:00 134.42 11 0 1
#6 B 2015-09-12 10:04:00 2015-09-12 20:14:00 399.55 32 0 0
#7 B 2015-09-12 10:04:00 2015-09-12 20:14:00 76.23 8 1 0
#8 B 2015-09-12 10:04:00 2015-09-12 20:14:00 211.36 26 0 0
#13 A 2015-11-12 00:22:00 2015-11-12 08:04:00 93.21 7 1 0
#14 C 2015-08-12 06:56:00 2015-10-12 20:54:00 232.22 15 0 0
#15 C 2015-08-12 06:56:00 2015-10-12 20:54:00 222.91 12 0 0
#16 C 2015-08-12 06:56:00 2015-10-12 20:54:00 111.21 8 0 0

# groupby and sum rows
df = df.groupby(['Stock', 'Start_Date', 'End_Date']).sum()
df = df.drop(['Opening', 'closing'], axis=1)
print(df.reset_index())
# Stock Start_Date End_Date a1 a2
#0 A 2015-09-12 20:04:00 2015-10-12 23:04:00 0 0
#1 A 2015-11-12 00:22:00 2015-11-12 08:04:00 1 0
#2 B 2015-09-12 10:04:00 2015-09-12 20:14:00 1 1
#3 C 2015-08-12 06:56:00 2015-10-12 20:54:00 0 0
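The merge above builds every (df1 row, df2 row) pair for the same Stock before filtering. That intermediate frame can get large when a stock has many intervals and many ticks. Also, an interval with no df2 row in range vanishes from the result instead of showing zeros. The sketch below is an alternative technique that is not part of the original answer. It sorts the timestamps once per (Stock, category) and counts the rows inside each interval with `np.searchsorted`.

It rests on several assumptions:

- The dates are parsed day-first.
- Category edges are right-closed, and only a1 to a4 are defined, taken from the question.
- Every row that fits none of them is tagged `'other'` and skipped.
- `interval_category_counts`, `parse` and the `category` column are hypothetical names.

```python
import numpy as np
import pandas as pd


def interval_category_counts(df1, df2, categories):
    """Count df2 rows per (df1 interval, category) without a merge on Stock.

    Sketch under assumptions: df2 has a string 'category' column, the date
    columns are datetime64, both interval ends are inclusive (as in the
    answer's filter), and df1 does not already contain the category columns.
    """
    positions = df1.groupby('Stock').indices  # Stock -> row positions in df1
    starts = df1['Start_Date'].to_numpy()
    ends = df1['End_Date'].to_numpy()
    col = {c: j for j, c in enumerate(categories)}
    # Every interval starts at zero, so intervals with no matching rows are kept.
    counts = np.zeros((len(df1), len(categories)), dtype=np.int64)
    for (stock, cat), grp in df2.groupby(['Stock', 'category']):
        if cat not in col or stock not in positions:
            continue  # category not requested, or stock absent from df1
        times = np.sort(grp['date_time'].to_numpy())
        pos = positions[stock]
        # rows in [start, end] = (#times <= end) - (#times < start)
        counts[pos, col[cat]] = (np.searchsorted(times, ends[pos], side='right')
                                 - np.searchsorted(times, starts[pos], side='left'))
    return df1.join(pd.DataFrame(counts, index=df1.index, columns=categories))


def parse(values):
    # Day-first parsing is an assumption about the question's date strings.
    return pd.to_datetime(values, dayfirst=True)


df1 = pd.DataFrame({
    'Stock': ['A', 'B', 'A', 'C'],
    'Start_Date': parse(['09-12-2015 20:04', '09-12-2015 10:04',
                         '11-12-2015 00:22', '08-12-2015 06:56']),
    'End_Date': parse(['10-12-2015 23:04', '09-12-2015 20:14',
                       '11-12-2015 08:04', '10-12-2015 20:54']),
})
df2 = pd.DataFrame({
    'Stock': list('AABABABCCABC'),
    'date_time': parse(['09-12-2015 21:24', '09-12-2015 21:27', '09-12-2015 10:20',
                        '09-12-2015 20:04', '09-12-2015 10:24', '09-12-2015 20:04',
                        '09-12-2015 14:22', '08-12-2015 09:44', '09-12-2015 20:04',
                        '11-12-2015 02:06', '09-12-2015 20:04', '09-12-2015 20:04']),
    'Opening': [144.3, 225.51, 134.42, 231.22, 399.55, 246.77,
                76.23, 232.22, 222.91, 93.21, 211.36, 111.21],
    'closing': [10, 24, 11, 17, 32, 21, 8, 15, 12, 7, 26, 8],
})

# a1..a4 as defined in the question; a5..a9 are not spelled out there.
op, cl = df2['Opening'], df2['closing']
conditions = [
    (op <= 100) & (cl <= 10),                            # a1
    (op > 100) & (op <= 200) & (cl > 10) & (cl <= 20),   # a2
    (op > 200) & (op <= 400) & (cl > 20) & (cl <= 50),   # a3
    (op <= 100) & (cl > 10) & (cl <= 20),                # a4
]
df2['category'] = np.select(conditions, ['a1', 'a2', 'a3', 'a4'], default='other')

result = interval_category_counts(df1, df2, ['a1', 'a2', 'a3', 'a4'])
print(result[['Stock', 'a1', 'a2', 'a3', 'a4']])
```

Design choice: each (Stock, category) group is sorted once, and each interval then costs two binary searches. Memory therefore stays roughly proportional to len(df1) × number of categories plus len(df2), rather than to the number of df1/df2 pairs per stock.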

For a similar question about python - iterating over date ranges between two pandas DataFrames to get category counts, see Stack Overflow: https://stackoverflow.com/questions/34433886/
