python - Counting occurrences of a string in a DataFrame with a DateTimeIndex

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:40:26

I have a time-series DataFrame like this:

timestamp            IceCreamOrder  Location
2018-01-03 02:21:16  Chocolate      South
2018-01-03 12:41:12  Vanilla        North
2018-01-03 14:32:15  Strawberry     North
2018-01-03 15:32:15  Strawberry     North
2018-01-04 02:21:16  Strawberry     North
2018-01-04 02:21:16  Rasberry       North
2018-01-04 12:41:12  Vanilla        North
2018-01-05 15:32:15  Chocolate      North

I'd like to get counts like this:

timestamp  strawberry  chocolate
1/2/14     0           1
1/3/14     2           0
1/4/14     1           0
1/4/14     0           0
1/4/14     0           0
1/5/14     0           1

Since this is time-series data, I have been storing the timestamps in pandas DatetimeIndex format.
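As a sketch of that setup, assuming the timestamps start out as strings in a column named timestamp (the rows below are the sample rows shown above; the name indexed is illustrative), pd.to_datetime followed by set_index produces a DatetimeIndex:

```python
import pandas as pd

# Sample rows from the question; timestamps start out as plain strings
rows = [
    ('2018-01-03 02:21:16', 'Chocolate', 'South'),
    ('2018-01-03 12:41:12', 'Vanilla', 'North'),
    ('2018-01-03 14:32:15', 'Strawberry', 'North'),
    ('2018-01-03 15:32:15', 'Strawberry', 'North'),
    ('2018-01-04 02:21:16', 'Strawberry', 'North'),
    ('2018-01-04 02:21:16', 'Rasberry', 'North'),
    ('2018-01-04 12:41:12', 'Vanilla', 'North'),
    ('2018-01-05 15:32:15', 'Chocolate', 'North'),
]
inputdf = pd.DataFrame(rows, columns=['timestamp', 'IceCreamOrder', 'Location'])

# Parse the strings into datetime64 values, then use them as the index
inputdf['timestamp'] = pd.to_datetime(inputdf['timestamp'])
indexed = inputdf.set_index('timestamp')
print(type(indexed.index).__name__)  # DatetimeIndex
```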

I started by trying to get the count for 'Strawberry' and ended up with this code, which doesn't work:

mydf = (inputdf.set_index('timestamp').groupby(pd.Grouper(freq = 'D'))['IceCreamOrder'].count('Strawberry'))

It raises this error:

TypeError: count() takes 1 positional argument but 2 were given

Any help would be greatly appreciated.
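The TypeError arises because GroupBy.count() takes no value to match: it counts the non-null entries in each group, whatever they contain. A minimal sketch on a small hypothetical Series (the names orders, daily and per_day and the values are made up for illustration):

```python
import pandas as pd

# Hypothetical orders indexed by timestamp; one entry is missing (None)
orders = pd.Series(
    ['Chocolate', 'Strawberry', 'Strawberry', None],
    index=pd.to_datetime(['2018-01-03 02:21:16', '2018-01-03 14:32:15',
                          '2018-01-04 02:21:16', '2018-01-04 12:41:12']),
    name='IceCreamOrder',
)
daily = orders.groupby(pd.Grouper(freq='D'))

# count() returns the number of non-null entries per day, regardless of flavor
per_day = daily.count()
print(per_day)

# Passing a value to count() reproduces the TypeError from the question
try:
    daily.count('Strawberry')
    raised = False
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```

Here per_day holds 2 for 2018-01-03 and 1 for 2018-01-04, because the None entry is not counted and the flavors are never compared to anything.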

Best answer

Use eq (==) to compare the column against the string, then aggregate with sum to count the True values, since True is treated as 1:

import pandas as pd

# convert to datetimes if necessary (format='%m/%d/%y' matches strings such as '1/2/18')
inputdf['timestamp'] = pd.to_datetime(inputdf['timestamp'], format='%m/%d/%y')
print(inputdf)
   timestamp IceCreamOrder Location
0 2018-01-02     Chocolate    South
1 2018-01-03       Vanilla    North
2 2018-01-03    Strawberry    North
3 2018-01-03    Strawberry    North
4 2018-01-04    Strawberry    North
5 2018-01-04      Rasberry    North
6 2018-01-04       Vanilla    North
7 2018-01-05     Chocolate    North

mydf = (inputdf.set_index('timestamp')['IceCreamOrder']
               .eq('Strawberry')
               .groupby(pd.Grouper(freq='D'))
               .sum())
print(mydf)
timestamp
2018-01-02    0.0
2018-01-03    2.0
2018-01-04    1.0
2018-01-05    0.0
Freq: D, Name: IceCreamOrder, dtype: float64
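A self-contained sketch of the same eq/sum idea, run on the question's own rows (which include times of day) and extended to build the two columns the question asks for. The flavor list, the lowercase column names and the variable names orders and counts are illustrative choices, not part of the answer. pd.Grouper(freq='D') bins the times into calendar days, so no date-only conversion is needed:

```python
import pandas as pd

# Rows from the question, with full timestamps
rows = [
    ('2018-01-03 02:21:16', 'Chocolate', 'South'),
    ('2018-01-03 12:41:12', 'Vanilla', 'North'),
    ('2018-01-03 14:32:15', 'Strawberry', 'North'),
    ('2018-01-03 15:32:15', 'Strawberry', 'North'),
    ('2018-01-04 02:21:16', 'Strawberry', 'North'),
    ('2018-01-04 02:21:16', 'Rasberry', 'North'),
    ('2018-01-04 12:41:12', 'Vanilla', 'North'),
    ('2018-01-05 15:32:15', 'Chocolate', 'North'),
]
inputdf = pd.DataFrame(rows, columns=['timestamp', 'IceCreamOrder', 'Location'])
inputdf['timestamp'] = pd.to_datetime(inputdf['timestamp'])

orders = inputdf.set_index('timestamp')['IceCreamOrder']

# For each flavor: a boolean mask (True counts as 1) summed per calendar day
counts = pd.DataFrame({
    flavor.lower(): orders.eq(flavor).groupby(pd.Grouper(freq='D')).sum()
    for flavor in ['Strawberry', 'Chocolate']
})
print(counts)
```

With these rows, strawberry comes out as 2, 1, 0 and chocolate as 1, 0, 1 for 2018-01-03 through 2018-01-05.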

If you want to count every flavor, add the IceCreamOrder column to the groupby and aggregate with GroupBy.size:

mydf1 = (inputdf.set_index('timestamp')
                .groupby([pd.Grouper(freq='D'), 'IceCreamOrder'])
                .size())
print(mydf1)
timestamp   IceCreamOrder
2018-01-02  Chocolate        1
2018-01-03  Strawberry       2
            Vanilla          1
2018-01-04  Rasberry         1
            Strawberry       1
            Vanilla          1
2018-01-05  Chocolate        1
dtype: int64

mydf1 = (inputdf.set_index('timestamp')
                .groupby([pd.Grouper(freq='D'), 'IceCreamOrder'])
                .size()
                .unstack(fill_value=0))
print(mydf1)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0
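An alternative sketch that swaps in pd.crosstab for groupby/size/unstack: floor each timestamp to its day with Series.dt.floor('D'), cross-tabulate day against flavor, then reindex the columns to keep only the two flavors from the question's desired output. The reindex step and the names days, table and wanted are illustrative additions; fill_value=0 would cover a flavor that never appears:

```python
import pandas as pd

# Rows from the question, with full timestamps
rows = [
    ('2018-01-03 02:21:16', 'Chocolate', 'South'),
    ('2018-01-03 12:41:12', 'Vanilla', 'North'),
    ('2018-01-03 14:32:15', 'Strawberry', 'North'),
    ('2018-01-03 15:32:15', 'Strawberry', 'North'),
    ('2018-01-04 02:21:16', 'Strawberry', 'North'),
    ('2018-01-04 02:21:16', 'Rasberry', 'North'),
    ('2018-01-04 12:41:12', 'Vanilla', 'North'),
    ('2018-01-05 15:32:15', 'Chocolate', 'North'),
]
inputdf = pd.DataFrame(rows, columns=['timestamp', 'IceCreamOrder', 'Location'])
inputdf['timestamp'] = pd.to_datetime(inputdf['timestamp'])

# Bucket each timestamp into its calendar day, then count day x flavor pairs
days = inputdf['timestamp'].dt.floor('D')
table = pd.crosstab(days, inputdf['IceCreamOrder'])

# Keep only the flavors of interest; a missing flavor would be filled with 0
wanted = table.reindex(columns=['Strawberry', 'Chocolate'], fill_value=0)
print(wanted)
```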

If none of the datetimes has a time component (dates only), you can group by the timestamp column directly:

mydf1 = (inputdf.groupby(['timestamp', 'IceCreamOrder'])
                .size()
                .unstack(fill_value=0))
print(mydf1)
IceCreamOrder  Chocolate  Rasberry  Strawberry  Vanilla
timestamp
2018-01-02             1         0           0        0
2018-01-03             0         0           2        1
2018-01-04             0         1           1        1
2018-01-05             1         0           0        0
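When the timestamps do carry times of day, as in the question's sample rows, one way to reach the dates-only case is to set every time to midnight first with Series.dt.normalize(); the direct groupby then works unchanged. A sketch, where the name daily is illustrative:

```python
import pandas as pd

# Rows from the question, with full timestamps
rows = [
    ('2018-01-03 02:21:16', 'Chocolate', 'South'),
    ('2018-01-03 12:41:12', 'Vanilla', 'North'),
    ('2018-01-03 14:32:15', 'Strawberry', 'North'),
    ('2018-01-03 15:32:15', 'Strawberry', 'North'),
    ('2018-01-04 02:21:16', 'Strawberry', 'North'),
    ('2018-01-04 02:21:16', 'Rasberry', 'North'),
    ('2018-01-04 12:41:12', 'Vanilla', 'North'),
    ('2018-01-05 15:32:15', 'Chocolate', 'North'),
]
inputdf = pd.DataFrame(rows, columns=['timestamp', 'IceCreamOrder', 'Location'])
inputdf['timestamp'] = pd.to_datetime(inputdf['timestamp'])

# Drop the time of day so that all orders from one day share one key
daily = inputdf.assign(timestamp=inputdf['timestamp'].dt.normalize())

# Same groupby/size/unstack as above, now keyed on the date alone
mydf1 = (daily.groupby(['timestamp', 'IceCreamOrder'])
              .size()
              .unstack(fill_value=0))
print(mydf1)
```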

Regarding "python - Counting occurrences of a string in a DataFrame with a DateTimeIndex", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51924332/
