
python - Pandas filter CSV by date

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 15:23:18

How do I filter a CSV by date with pandas?

Example CSV

User    Dates       Hours   shift
User1 01.01.2012 5 aaa
User1 02.01.2012 5 aaa
User1 03.01.2012 2 bbb
User1 04.01.2012 3 aaa
.....
User1 12.03.2012 1 aaa
User1 13.03.2012 8 ccc
.....
User2 04.02.2012 4 aaa
User2 05.02.2012 3 bbb

and so on.

I can filter by user:
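For reference, a sample like the one above could be loaded roughly as follows. This is a minimal sketch that assumes the file is whitespace-separated and that the dates are day-first (`DD.MM.YYYY`). The inline text stands in for the real file, and only the rows visible in the sample are included.

```python
import io

import pandas as pd

# Assumption: the rows shown in the sample, whitespace-separated.
raw = """User Dates Hours shift
User1 01.01.2012 5 aaa
User1 02.01.2012 5 aaa
User1 03.01.2012 2 bbb
User1 04.01.2012 3 aaa
User1 12.03.2012 1 aaa
User1 13.03.2012 8 ccc
User2 04.02.2012 4 aaa
User2 05.02.2012 3 bbb
"""

# Split on any run of whitespace; Dates is still plain text at this point.
users = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Parse the day-first dates explicitly so 02.01.2012 means 2 January.
users["Dates"] = pd.to_datetime(users["Dates"], format="%d.%m.%Y")

# Indexing by User makes label lookups such as .loc["User1"] possible.
by_user = users.set_index("User")
use = by_user.loc["User1"]
```

With an explicit `format`, a malformed date raises an error instead of being silently parsed as month-first.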

use = users.loc["User1"]

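I can also sum all the hours:

print(use["Hours"].sum())

I can count his shifts:

counts = use.loc[use['shift'] == 'aaa', 'Hours'].value_counts()

But I don't know how to combine a date filter with the statements above, for example to count all of User2's shifts in March or to sum all the hours User1 worked in February.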

I have more or less managed to filter the table by date and user:

use['Date'] = pd.to_datetime(use['Date'], infer_datetime_format=True, exact=True)
mask = (use['Date'] > Start) & (use['Date'] <= End)
print(use.loc[mask])

But I don't know how to combine them. Desired output:

Overview March 2016
User1 made 3 aaa shifts
User1 worked 12h in March 2016

Update: I have made some progress.

print(use[use['Date'] > '02.01.2012']['hours'].sum())

works fine, but it is not quite what I want. With:

print(use[use['Date'] > '02.01.2012'] & (use[use['Date'] < '02.05.2012'],['hours'].sum()))

I get:

AttributeError: 'list' object has no attribute 'sum'

Best answer

I think you can use:

Start = '2012-01-01'
End = '2012-03-03'
use['Dates'] = pd.to_datetime(use['Dates'], dayfirst=True)
mask = (use['Dates'] > Start) & (use['Dates'] <= End) & (use['shift'] == 'aaa')
use1 = use.loc[mask]
print (use1)
    User      Dates  Hours shift
1  User1 2012-01-02      5   aaa
3  User1 2012-01-04      3   aaa
6  User2 2012-02-04      4   aaa

use1 = use.query('Dates > @Start and Dates <= @End and shift == "aaa"')
print (use1)
    User      Dates  Hours shift
1  User1 2012-01-02      5   aaa
3  User1 2012-01-04      3   aaa
6  User2 2012-02-04      4   aaa

print (mask.sum())
3
counts = use.loc[mask, 'Hours'].value_counts()
print (counts)
3    1
5    1
4    1
Name: Hours, dtype: int64
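Note that `value_counts()` on `Hours` counts how often each hours value occurs. If the goal is the number of shifts per shift type, one option is to count the `shift` column instead. The sketch below assumes the visible sample rows and the same date window as above, without the shift condition in the mask; `in_range`, `shift_counts` and `hours_per_shift` are illustrative names.

```python
import pandas as pd

# Assumption: the rows visible in the sample CSV.
use = pd.DataFrame({
    'User': ['User1'] * 6 + ['User2'] * 2,
    'Dates': ['01.01.2012', '02.01.2012', '03.01.2012', '04.01.2012',
              '12.03.2012', '13.03.2012', '04.02.2012', '05.02.2012'],
    'Hours': [5, 5, 2, 3, 1, 8, 4, 3],
    'shift': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ccc', 'aaa', 'bbb'],
})
use['Dates'] = pd.to_datetime(use['Dates'], format='%d.%m.%Y')

# Date window only; the shift type is what gets counted.
in_range = (use['Dates'] > '2012-01-01') & (use['Dates'] <= '2012-03-03')

# Number of shifts of each type inside the window.
shift_counts = use.loc[in_range, 'shift'].value_counts()

# Hours worked per shift type inside the window.
hours_per_shift = use.loc[in_range].groupby('shift')['Hours'].sum()
```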

Edit:

Start = '2012-01-01'
End = '2012-03-03'
use['Dates'] = pd.to_datetime(use['Dates'], dayfirst=True)
mask = (use['Dates'] > Start) & (use['Dates'] <= End)
use1 = use.loc[mask]
print (use1)
    User      Dates  Hours shift
1  User1 2012-01-02      5   aaa
2  User1 2012-01-03      2   bbb
3  User1 2012-01-04      3   aaa
6  User2 2012-02-04      4   aaa
7  User2 2012-02-05      3   bbb


# named aggregation: total hours (SUM) and number of shifts (COUNT) per user and shift type
counts = (use1.groupby(['User','shift'])['Hours']
              .agg(SUM='sum', COUNT='size')
              .reset_index())
print (counts)
    User shift  SUM  COUNT
0  User1   aaa    8      2
1  User1   bbb    2      1
2  User2   aaa    4      1
3  User2   bbb    3      1
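Since the question asks for per-month figures, the same aggregation can also be grouped by calendar month instead of filtering on a start and end date first. This is a sketch under the assumption that the data are the visible sample rows; the `Month` column and the `monthly` name are introduced here for illustration only.

```python
import pandas as pd

# Assumption: the rows visible in the sample CSV.
use = pd.DataFrame({
    'User': ['User1'] * 6 + ['User2'] * 2,
    'Dates': ['01.01.2012', '02.01.2012', '03.01.2012', '04.01.2012',
              '12.03.2012', '13.03.2012', '04.02.2012', '05.02.2012'],
    'Hours': [5, 5, 2, 3, 1, 8, 4, 3],
    'shift': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ccc', 'aaa', 'bbb'],
})
use['Dates'] = pd.to_datetime(use['Dates'], format='%d.%m.%Y')

# Reduce each date to its calendar month (a monthly Period such as 2012-01).
use['Month'] = use['Dates'].dt.to_period('M')

# Total hours and number of shifts per user, month and shift type.
monthly = (use.groupby(['User', 'Month', 'shift'])['Hours']
              .agg(SUM='sum', COUNT='size')
              .reset_index())
```

Selecting one month is then an ordinary filter on `monthly`, for example on `monthly['Month'].astype(str) == '2012-01'`.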

Edit 1:

If you need more conditions, use loc:

print(use.loc[(use['Date'] > '02.01.2012') & (use['Date'] < '02.05.2012'),'hours'].sum())
0
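Two details are worth spelling out here. The `AttributeError` in the question comes from the `,['hours'].sum()` part: the comma starts a new expression, so `.sum()` is called on the plain Python list `['hours']`. Also, as long as the `Date` column still holds text, the comparison is lexicographic, which is why the result is 0. The sketch below assumes the visible sample rows and reads both bounds as day-first dates (2 January and 2 May 2012), which is an interpretation rather than something the question states.

```python
import pandas as pd

# Assumption: the rows visible in the sample CSV, dates still stored as text.
use = pd.DataFrame({
    'User': ['User1'] * 6 + ['User2'] * 2,
    'Date': ['01.01.2012', '02.01.2012', '03.01.2012', '04.01.2012',
             '12.03.2012', '13.03.2012', '04.02.2012', '05.02.2012'],
    'hours': [5, 5, 2, 3, 1, 8, 4, 3],
    'shift': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ccc', 'aaa', 'bbb'],
})

# A plain list has no .sum(); this reproduces the reported error message.
try:
    ['hours'].sum()
except AttributeError as exc:
    message = str(exc)

# Text comparison: no string lies strictly between the two bounds.
as_text = use.loc[(use['Date'] > '02.01.2012') &
                  (use['Date'] < '02.05.2012'), 'hours'].sum()

# Datetime comparison, with the bounds read as day-first dates (assumption).
dates = pd.to_datetime(use['Date'], format='%d.%m.%Y')
lower = pd.Timestamp(2012, 1, 2)
upper = pd.Timestamp(2012, 5, 2)
as_dates = use.loc[(dates > lower) & (dates < upper), 'hours'].sum()
```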

All together:

import pandas as pd

use = pd.DataFrame({
    'User': ['User1', 'User1', 'User1', 'User1', 'User1', 'User1', 'User2', 'User2'],
    'Date': ['01.01.2012', '02.01.2012', '03.01.2012', '04.01.2012',
             '12.03.2012', '13.03.2012', '04.02.2012', '05.02.2012'],
    'hours': [5, 5, 2, 3, 1, 8, 4, 3],
    'shift': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ccc', 'aaa', 'bbb']})
print (use)

    User        Date  hours shift
0  User1  01.01.2012      5   aaa
1  User1  02.01.2012      5   aaa
2  User1  03.01.2012      2   bbb
3  User1  04.01.2012      3   aaa
4  User1  12.03.2012      1   aaa
5  User1  13.03.2012      8   ccc
6  User2  04.02.2012      4   aaa
7  User2  05.02.2012      3   bbb
Start = '2012-01-01'
End = '2012-01-30'
User = 'User1'
shift = 'aaa'

use['Date'] = pd.to_datetime(use['Date'], dayfirst=True)

# total hours within the date range (sum)
print(use.loc[(use['Date'] > Start) & (use['Date'] < End), 'hours'].sum())
10

# total hours within the date range for one user (sum)
print(use.loc[(use['Date'] > Start) & (use['Date'] < End) &
              (use['User'] == User), 'hours'].sum())
10

# number of rows (shifts) within the date range for one user (count)
print(((use['Date'] > Start) & (use['Date'] < End) &
       (use['User'] == User)).sum())
3

# number of rows (shifts) within the date range for one user and shift type (count)
print(((use['Date'] > Start) & (use['Date'] < End) &
       (use['User'] == User) & (use['shift'] == shift)).sum())
2
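To get closer to the overview format the question asks for, the conditions can be wrapped in a small function. This is only a sketch: `overview` is a hypothetical helper, not part of pandas, and it selects a whole calendar month by year and month number rather than by start and end strings. The English month name comes from `calendar.month_name`, which assumes the default locale.

```python
import calendar

import pandas as pd


def overview(df, user, shift, year, month):
    """Hypothetical helper: summarize one user's shifts and hours in one month."""
    in_month = (df['Date'].dt.year == year) & (df['Date'].dt.month == month)
    rows = df.loc[in_month & (df['User'] == user)]
    n_shifts = int((rows['shift'] == shift).sum())  # shifts of the given type
    total_hours = int(rows['hours'].sum())          # hours across all shift types
    label = f'{calendar.month_name[month]} {year}'
    return [
        f'Overview {label}',
        f'{user} made {n_shifts} {shift} shifts',
        f'{user} worked {total_hours}h in {label}',
    ]


# Assumption: the rows visible in the sample CSV.
use = pd.DataFrame({
    'User': ['User1'] * 6 + ['User2'] * 2,
    'Date': ['01.01.2012', '02.01.2012', '03.01.2012', '04.01.2012',
             '12.03.2012', '13.03.2012', '04.02.2012', '05.02.2012'],
    'hours': [5, 5, 2, 3, 1, 8, 4, 3],
    'shift': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ccc', 'aaa', 'bbb'],
})
use['Date'] = pd.to_datetime(use['Date'], format='%d.%m.%Y')

lines = overview(use, 'User1', 'aaa', 2012, 1)
print('\n'.join(lines))
```

Unlike the strict `>` comparison against `Start` above, filtering on year and month includes the first day of the month, so the January totals here also count the 01.01.2012 row.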

Regarding python - Pandas filter CSV by date, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43344656/
