gpt4 book ai didi

python - 如何在一行中计算数据框中的并发事件?

转载 作者:太空宇宙 更新时间:2023-11-03 14:59:24 26 4
gpt4 key购买 nike

我有一个电话数据集。我想计算每条记录有多少个事件调用。我找到了这个 question但我想避免循环和函数。

每个通话都有一个日期、一个开始时间和一个结束时间

数据框:

      start       end        date
0 09:17:12 09:18:20 2016-08-10
1 09:15:58 09:17:42 2016-08-11
2 09:16:40 09:17:49 2016-08-11
3 09:17:05 09:18:03 2016-08-11
4 09:18:22 09:18:30 2016-08-11

我想要的:

      start       end        date  activecalls
0 09:17:12 09:18:20 2016-08-10 1
1 09:15:58 09:17:42 2016-08-11 1
2 09:16:40 09:17:49 2016-08-11 2
3 09:17:05 09:18:03 2016-08-11 3
4 09:18:22 09:18:30 2016-08-11 1

我的代码:

import pandas as pd

df = pd.read_clipboard(sep='\s\s+')

df['activecalls'] = df[(df['start'] <= df.loc[df.index]['start']) & \
(df['end'] > df.loc[df.index]['start']) & \
(df['date'] == df.loc[df.index]['date'])].count()

print(df)

我得到的:

      start       end        date  activecalls
0 09:17:12 09:18:20 2016-08-10 NaN
1 09:15:58 09:17:42 2016-08-11 NaN
2 09:16:40 09:17:49 2016-08-11 NaN
3 09:17:05 09:18:03 2016-08-11 NaN
4 09:18:22 09:18:30 2016-08-11 NaN

最佳答案

您可以使用:

#convert time and date to datetime
df['date_start'] = pd.to_datetime(df.start + ' ' + df.date)
df['date_end'] = pd.to_datetime(df.end + ' ' + df.date)
#remove columns
df = df.drop(['start','end','date'], axis=1)

循环解决方案:

active_events= []
for i in df.index:
active_events.append(len(df[(df["date_start"]<=df.loc[i,"date_start"]) &
(df["date_end"]> df.loc[i,"date_start"])]))
df['activecalls'] = pd.Series(active_events)
print (df)
date_start date_end activecalls
0 2016-08-10 09:17:12 2016-08-10 09:18:20 1
1 2016-08-11 09:15:58 2016-08-11 09:17:42 1
2 2016-08-11 09:16:40 2016-08-11 09:17:49 2
3 2016-08-11 09:17:05 2016-08-11 09:18:03 3
4 2016-08-11 09:18:22 2016-08-11 09:18:30 1

解决方案 merge

#cross join
df['tmp'] = 1
df1 = pd.merge(df,df.reset_index(),on=['tmp'])
df = df.drop('tmp', axis=1)
#print (df1)

#filtering by conditions
df1 = df1[(df1["date_start_x"]<=df1["date_start_y"])
(df1["date_end_x"]> df1["date_start_y"])]
print (df1)
date_start_x date_end_x activecalls_x tmp index \
0 2016-08-10 09:17:12 2016-08-10 09:18:20 1 1 0
6 2016-08-11 09:15:58 2016-08-11 09:17:42 1 1 1
7 2016-08-11 09:15:58 2016-08-11 09:17:42 1 1 2
8 2016-08-11 09:15:58 2016-08-11 09:17:42 1 1 3
12 2016-08-11 09:16:40 2016-08-11 09:17:49 2 1 2
13 2016-08-11 09:16:40 2016-08-11 09:17:49 2 1 3
18 2016-08-11 09:17:05 2016-08-11 09:18:03 3 1 3
24 2016-08-11 09:18:22 2016-08-11 09:18:30 1 1 4

date_start_y date_end_y activecalls_y
0 2016-08-10 09:17:12 2016-08-10 09:18:20 1
6 2016-08-11 09:15:58 2016-08-11 09:17:42 1
7 2016-08-11 09:16:40 2016-08-11 09:17:49 2
8 2016-08-11 09:17:05 2016-08-11 09:18:03 3
12 2016-08-11 09:16:40 2016-08-11 09:17:49 2
13 2016-08-11 09:17:05 2016-08-11 09:18:03 3
18 2016-08-11 09:17:05 2016-08-11 09:18:03 3
24 2016-08-11 09:18:22 2016-08-11 09:18:30 1
#get size - active calls
print (df1.groupby(['index'], sort=False).size())
index
0 1
1 1
2 2
3 3
4 1
dtype: int64

df['activecalls'] = df1.groupby('index').size()
print (df)
date_start date_end activecalls
0 2016-08-10 09:17:12 2016-08-10 09:18:20 1
1 2016-08-11 09:15:58 2016-08-11 09:17:42 1
2 2016-08-11 09:16:40 2016-08-11 09:17:49 2
3 2016-08-11 09:17:05 2016-08-11 09:18:03 3
4 2016-08-11 09:18:22 2016-08-11 09:18:30 1

时间:

def a(df):
active_events= []
for i in df.index:
active_events.append(len(df[(df["date_start"]<=df.loc[i,"date_start"]) & (df["date_end"]> df.loc[i,"date_start"])]))
df['activecalls'] = pd.Series(active_events)
return (df)

def b(df):
df['tmp'] = 1
df1 = pd.merge(df,df.reset_index(),on=['tmp'])
df = df.drop('tmp', axis=1)
df1 = df1[(df1["date_start_x"]<=df1["date_start_y"]) & (df1["date_end_x"]> df1["date_start_y"])]
df['activecalls'] = df1.groupby('index').size()
return (df)

print (a(df))
print (b(df))

In [160]: %timeit (a(df))
100 loops, best of 3: 6.76 ms per loop

In [161]: %timeit (b(df))
The slowest run took 4.42 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 4.61 ms per loop

关于python - 如何在一行中计算数据框中的并发事件?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39467341/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com