
python - Can pandas count over a string-type column on a RollingGroupby object?

Reposted. Author: 行者123. Updated: 2023-12-01 08:24:08

As the title says: can pandas count over a string-type column on a RollingGroupby object?

Here is my dataframe:

# Let's say my objective is to count the number of unique cars 
# over the last 1 day grouped by park

park | date | to_count
------------------------------
A | 2019-01-01 | Honda
A | 2019-01-03 | Lexus
A | 2019-01-05 | BMW
A | 2019-01-05 | Lexus
B | 2019-01-01 | BMW
B | 2019-01-08 | Lexus
B | 2019-01-08 | Lexus
B | 2019-01-10 | Ford

Here is what I want:

 park |    date    | unique_count
----------------------------------
A | 2019-01-01 | 1
A | 2019-01-03 | 1
A | 2019-01-05 | 2
B | 2019-01-01 | 1
B | 2019-01-08 | 1
B | 2019-01-10 | 1

# A bit of explanation:
# 2 types of cars came to park A over the last 1 day on 5th Jan, so the distinct count is 2.
# 2 cars of 1 type (Lexus) came to park B over the last 1 day on 8th Jan, so the distinct count is 1.

Here is what I have tried:

import pandas as pd
import numpy as np

# initiate dataframe
df = pd.DataFrame({
'park': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'date': ['2019-01-01', '2019-01-03', '2019-01-05', '2019-01-05',
'2019-01-01', '2019-01-08', '2019-01-08', '2019-01-10'],
'to_count': ['Honda', 'Lexus', 'BMW', 'Lexus', 'BMW', 'Lexus', 'Lexus', 'Ford']
})

# string to date
df['date'] = pd.to_datetime(df['date'])

# group. This is more intuitive to me but sadly this does not work.
unique_count = df.groupby('park').rolling('1d', on='date').to_count.nunique()

# factorize then group. This works (but why???)
df['factorized'] = pd.factorize(df.to_count)[0]
unique_count = df.groupby('park').rolling('1d', on='date').factorized.apply(lambda x: len(np.unique(x)) )

result = unique_count.reset_index().drop_duplicates(subset=['park', 'date'], keep='last')
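A note on why the factorized variant behaves differently: as far as I can tell, rolling aggregations in pandas are implemented for numeric data, so a string column is rejected while integer codes are accepted. `pd.factorize` replaces each distinct string with an integer code, and counting distinct codes is the same as counting distinct strings. A minimal sketch of what it returns (the `cars` series is only an illustration):

```python
import pandas as pd

# Illustrative input: four car makes, one of them repeated
cars = pd.Series(["Honda", "Lexus", "BMW", "Lexus"])

# factorize returns (integer codes, unique values in order of first appearance)
codes, uniques = pd.factorize(cars)

print(codes.tolist())   # [0, 1, 2, 1]
print(list(uniques))    # ['Honda', 'Lexus', 'BMW']
```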

Here is my environment:

  • Mac 10.12 High Sierra
  • python3.6
  • Pandas 0.22.0

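To stress: I need the rolling-window functionality to work. In this example the window happens to be 1 day, but I may want it to work for 3 days, 7 days, 2 hours, or 5 seconds.

Best answer

Try this:
- First, group the dataframe by park and date
- Then aggregate to_count by its number of unique values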

import pandas as pd

df = pd.DataFrame({
'park': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'date': ['2019-01-01', '2019-01-03', '2019-01-05', '2019-01-05',
'2019-01-01', '2019-01-08', '2019-01-08', '2019-01-10'],
'to_count': ['Honda', 'Lexus', 'BMW', 'Lexus', 'BMW', 'Lexus', 'Lexus', 'Ford']
})

agg_df = df.groupby(by=['park', 'date']).agg({'to_count': pd.Series.nunique}).reset_index()
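Note that the groupby above counts per calendar date, which matches the 1-day example but does not cover other window lengths. For an arbitrary time window, one possible approach is to combine the question's factorize idea with a time-based rolling apply. The sketch below assumes a reasonably recent pandas (the `raw` argument of `apply` is not available in 0.22); the function name `rolling_unique_count`, its parameter names, and the helper column `code` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

def rolling_unique_count(df, window, group_col="park",
                         time_col="date", value_col="to_count"):
    """Distinct count of value_col per group over a trailing time window."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    # Time-based rolling needs a monotonic datetime index; a stable sort
    # keeps the original order of rows that share a timestamp.
    out = out.sort_values(time_col, kind="mergesort")
    # Rolling works on numbers, so map each distinct string to an integer code.
    out["code"] = pd.factorize(out[value_col])[0]

    counts = (
        out.set_index(time_col)
           .groupby(group_col)["code"]
           .rolling(window)
           .apply(lambda x: len(np.unique(x)), raw=True)
    )

    result = counts.reset_index().rename(columns={"code": "unique_count"})
    result["unique_count"] = result["unique_count"].astype(int)
    # For rows sharing a timestamp, only the last one has seen all of them.
    result = result.drop_duplicates(subset=[group_col, time_col], keep="last")
    return result.reset_index(drop=True)

df = pd.DataFrame({
    "park": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": ["2019-01-01", "2019-01-03", "2019-01-05", "2019-01-05",
             "2019-01-01", "2019-01-08", "2019-01-08", "2019-01-10"],
    "to_count": ["Honda", "Lexus", "BMW", "Lexus",
                 "BMW", "Lexus", "Lexus", "Ford"],
})

print(rolling_unique_count(df, "1D"))
print(rolling_unique_count(df, "3D"))
```

With a 1-day window this should reproduce the counts requested in the question (1, 1, 2 for park A and 1, 1, 1 for park B). Other window lengths can be passed the same way, either as an offset string or as a `pd.Timedelta`.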

Regarding "python - Can pandas count over a string-type column on a RollingGroupby object?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54413686/
