
python-3.x - Pandas DataFrame: how to count the number of times a variable is repeated in 1 minute

Reposted. Author: 行者123. Updated: 2023-12-04 02:50:57

I have the following dataframe snippet:

Full dataframe:
                                  ip      time      cik  crawler
ts
2019-03-11 00:00:01   71.155.177.ide  00:00:01  1262327      0.0
2019-03-11 00:00:02   71.155.177.ide  00:00:02  1262329      0.0
2019-03-11 00:00:05   69.243.218.cah  00:00:05   751200      0.0
2019-03-11 00:00:08  172.173.121.efb  00:00:08   881890      0.0
2019-03-11 00:00:09   216.254.60.idd  00:00:09  1219169      0.0
2019-03-11 00:00:09    64.18.197.gjc  00:00:09  1261705      0.0
2019-03-11 00:00:09    64.18.197.gjc  00:00:09  1261734      0.0
2019-03-11 00:00:10    64.18.197.gjc  00:00:10  1263094      0.0
2019-03-11 00:00:10    64.18.197.gjc  00:00:10  1264242      0.0
2019-03-11 00:00:10    64.18.197.gjc  00:00:10  1264242      0.0

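I want to group by IP and then use some function to count:

1) how many unique CIKs each IP has within 1 minute

2) how many CIKs (in total) each IP has within 1 minute.

I have tried the resample function, but I don't know how to make it count the way I want. My code is as follows: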

import pandas as pd

# `path` is the directory holding the log file (its definition is not shown in the question)
dataframe = pd.read_csv(path + "log20060702.csv", usecols=['cik', 'ip', 'time', 'crawler'])
dataframe = dataframe[dataframe['crawler'] == 0]  # keep non-crawler requests only
dataframe['cik'] = pd.to_numeric(dataframe['cik'], downcast='integer')
dataframe['ts'] = pd.to_datetime(dataframe['time'])

dataframe = dataframe.set_index(['ts'])
print("Full dataframe: ", dataframe.head(10))

# Inspect the first five per-IP groups
df_dict = dataframe.groupby("ip")
counter = 0
for key, df_values in df_dict:
    counter += 1
    print("df values: ", df_values)
    # df_values = df_values.resample("5T").count()
    if counter == 5:
        break

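Alternatively, if someone can tell me how to group by IP and by every 1 minute, I can do the rest myself. I am not necessarily looking for a complete solution; some guidance would be much appreciated.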

Best answer

Use groupby together with DataFrameGroupBy.resample, and aggregate with SeriesGroupBy.nunique (unique CIKs) and DataFrameGroupBy.size (total CIKs) to count:

df = dataframe.groupby("ip").resample('1Min')['cik'].agg(['nunique','size'])
print(df)
                            nunique  size
ip              ts
172.173.121.efb 2019-03-11        1     1
216.254.60.idd  2019-03-11        1     1
64.18.197.gjc   2019-03-11        4     5
69.243.218.cah  2019-03-11        1     1
71.155.177.ide  2019-03-11        2     2
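
To try this without the original CSV, the ten rows printed in the question can be rebuilt by hand. The following is only a sketch: the names `rows`, `sample` and `per_minute` are made up for illustration, the `time` and `crawler` columns are left out because the aggregation does not use them, and the frequency is written as "1min", which should behave like '1Min' here.

```python
import pandas as pd

# Rows copied from the question's printed dataframe (timestamp, ip, cik).
rows = [
    ("2019-03-11 00:00:01", "71.155.177.ide", 1262327),
    ("2019-03-11 00:00:02", "71.155.177.ide", 1262329),
    ("2019-03-11 00:00:05", "69.243.218.cah", 751200),
    ("2019-03-11 00:00:08", "172.173.121.efb", 881890),
    ("2019-03-11 00:00:09", "216.254.60.idd", 1219169),
    ("2019-03-11 00:00:09", "64.18.197.gjc", 1261705),
    ("2019-03-11 00:00:09", "64.18.197.gjc", 1261734),
    ("2019-03-11 00:00:10", "64.18.197.gjc", 1263094),
    ("2019-03-11 00:00:10", "64.18.197.gjc", 1264242),
    ("2019-03-11 00:00:10", "64.18.197.gjc", 1264242),
]

# Hypothetical stand-in for the question's `dataframe`, indexed by timestamp.
sample = pd.DataFrame(rows, columns=["ts", "ip", "cik"])
sample["ts"] = pd.to_datetime(sample["ts"])
sample = sample.set_index("ts")

# Per IP and per 1-minute bin: distinct CIKs (nunique) and total rows (size).
per_minute = sample.groupby("ip").resample("1min")["cik"].agg(["nunique", "size"])
print(per_minute)
```

All ten rows fall in the same minute, so each IP ends up with a single row. For 64.18.197.gjc the CIK 1264242 appears twice, which is why its unique count is one lower than its total.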

Or use Grouper:

df = dataframe.groupby(["ip", pd.Grouper(freq='1Min')])['cik'].agg(['nunique','size'])
print(df)
                            nunique  size
ip              ts
172.173.121.efb 2019-03-11        1     1
216.254.60.idd  2019-03-11        1     1
64.18.197.gjc   2019-03-11        4     5
69.243.218.cah  2019-03-11        1     1
71.155.177.ide  2019-03-11        2     2
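
On the sample above both approaches print the same table, but they can differ once an IP has a quiet minute between two active ones. The sketch below uses a small made-up log (the `log` frame, the IPs "a" and "b", and the CIK values are all hypothetical) to show what I would expect: resample emits every 1-minute bin between an IP's first and last request, including empty ones, while Grouper only emits bins that contain rows.

```python
import pandas as pd

# Hypothetical log: IP "a" is active in minutes 00:00 and 00:02 but silent in 00:01.
log = pd.DataFrame(
    {"ip": ["a", "a", "b", "a"], "cik": [10, 10, 30, 20]},
    index=pd.to_datetime(
        [
            "2019-03-11 00:00:10",
            "2019-03-11 00:00:40",
            "2019-03-11 00:01:30",
            "2019-03-11 00:02:05",
        ]
    ),
)
log.index.name = "ts"

# groupby + resample: one row per IP for every minute between its first and last request.
with_resample = log.groupby("ip").resample("1min")["cik"].agg(["nunique", "size"])

# groupby + Grouper: one row per (IP, minute) pair that actually has data.
with_grouper = log.groupby(["ip", pd.Grouper(freq="1min")])["cik"].agg(["nunique", "size"])

print(with_resample)
print(with_grouper)
```

If zero-count minutes matter (for example, to see idle gaps), the resample form is probably the more convenient one; if only active minutes are of interest, the Grouper form avoids the extra rows.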

Regarding "python-3.x - Pandas DataFrame: how to count the number of times a variable is repeated in 1 minute", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55102153/
