
python - Visualizing MongoDB data statistics with matplotlib

Reposted. Author: 行者123. Updated: 2023-12-01 06:05:00

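I want to use matplotlib to visualize statistics from data in MongoDB, but the way I'm doing it now is really odd.

I query MongoDB 30 times to get the daily data, which is already slow and messy, especially when I fetch the results from somewhere other than the server. Is there a better, cleaner way to get hourly, daily, monthly, and yearly statistics?

Here is some of the code I'm using now (it gets the daily statistics):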

from datetime import datetime, date, time, timedelta
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from my_conn import my_mongodb

t1 = []
t2 = []
today = datetime.combine(date.today(), time())
with my_mongodb() as m:
    for i in range(30):
        day = today - timedelta(days = i)
        t1 = [m.data.find({"time": {"$gte": day, "$lt": day + timedelta(days = 1)}}).count()] + t1
        t2 = [m.data.find({"deleted": 0, "time": {"$gte": day, "$lt": day + timedelta(days = 1)}}).count()] + t2

x = range(30)
N = len(x)

def format_date(x, pos=None):
    day = today - timedelta(days = (N - x - 1))
    return day.strftime('%m/%d')

plt.bar(range(len(t1)), t1, align='center', color="#4788d2") #All
plt.bar(range(len(t2)), t2, align='center', color="#0c3688") #Not-deleted

plt.xticks(range(len(x)), [format_date(i) for i in x], size='small', rotation=30)
plt.grid(axis = "y")

plt.show()

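Best answer

Update:

I fundamentally misunderstood the question. Felix is querying MongoDB to find out how many items fall in each range, so my approach didn't work: I was trying to ask MongoDB for the items themselves. Felix has a lot of data, so fetching all of it is completely unreasonable.

Felix, here is an updated function that should do what you need: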

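def getDataFromLast(num, quantum):
    """ Count all and non-deleted documents in each of the last num buckets. """
    all_counts = []
    not_deleted = []
    today = datetime.combine(date.today(), time())
    with my_mongodb() as m: # used as a context manager, as in your code
        for i in reversed(range(num)): # start from oldest
            day = today - i*quantum
            time_query = {"$gte": day, "$lt": day+quantum}
            # count_documents() (PyMongo 3.7+) replaces Cursor.count(), which PyMongo 4 removed
            all_counts.append(m.data.count_documents({"time": time_query}))
            not_deleted.append(m.data.count_documents({"deleted": 0, "time": time_query}))
    return all_counts, not_deleted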

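quantum is the "step" to look back by. For example, to look at the last 12 hours, set quantum = timedelta(hours=1) and num = 12. Note that the function anchors on midnight today, so hourly buckets are counted back from midnight rather than from the current time. Updated example usage that fetches the last 30 days follows: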

from datetime import datetime, date, time, timedelta
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from my_conn import my_mongodb

#def getDataFromLast(num, quantum) as defined above

def format_date(x, N, pos=None):
    """ This is your format_date function. It now takes N
        (I still don't really understand what it is, though)
        as an argument instead of assuming that it's a global."""
    day = date.today() - timedelta(days=N-x-1)
    return day.strftime('%m/%d')

def plotBar(data, color):
    plt.bar(range(len(data)), data, align='center', color=color)


N = 30 # define the range that we want to look at

total, valid = getDataFromLast(N, timedelta(days=1)) # get the data

plotBar(total, "#4788d2") # plot both deleted and non-deleted data
plotBar(valid, "#0c3688") # plot only the valid data

plt.xticks(range(N), [format_date(i, N) for i in range(N)], size='small', rotation=30)
plt.grid(axis="y")
plt.show()
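
format_date above assumes one-day steps, while getDataFromLast accepts any quantum. As a rough sketch of how the bucket boundaries and matching tick labels line up for other step sizes, the helpers below (bucket_starts and bucket_labels are hypothetical names, not part of the answer) take the reference time as an argument so the result is reproducible. A timedelta cannot express a calendar month or year, so this only covers fixed-length steps such as hours and days.

```python
from datetime import datetime, timedelta

def bucket_starts(num, quantum, now):
    """Return the start of each of the last `num` buckets, oldest first."""
    return [now - i * quantum for i in reversed(range(num))]

def bucket_labels(num, quantum, now, fmt="%m/%d"):
    """Format each bucket start as a tick label."""
    return [start.strftime(fmt) for start in bucket_starts(num, quantum, now)]

# Fixed reference time (an arbitrary example value).
now = datetime(2011, 12, 19, 12, 0)
hourly = bucket_labels(3, timedelta(hours=1), now, fmt="%H:%M")
daily = bucket_labels(3, timedelta(days=1), now)
print(hourly)  # ['10:00', '11:00', '12:00']
print(daily)   # ['12/17', '12/18', '12/19']
```

Each bucket covers the half-open range from its start up to start + quantum, which matches the "$gte" / "$lt" pair used in the time query.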

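Original answer:

OK, here's my attempt at refactoring this for you. Blubber suggested learning JS and MapReduce. There's no need for that, as long as you follow his other advice: create an index on the time field, and reduce the number of queries. This is my best attempt at that, along with some light refactoring. I do have a lot of questions and comments, though.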
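
For the index part of that advice, a minimal sketch might look like the following. It assumes the object passed in is a PyMongo collection (m.data in the question appears to be one); ensure_time_index is a hypothetical helper name.

```python
def ensure_time_index(collection, field="time"):
    """Create an ascending index on `field` and return the index name.

    Assumes `collection` is a PyMongo Collection. Asking for an index
    that already exists with the same definition is a no-op.
    """
    # 1 means ascending (the same value as pymongo.ASCENDING).
    return collection.create_index([(field, 1)])

# Hypothetical usage with the question's connection helper:
# with my_mongodb() as m:
#     ensure_time_index(m.data)
```

With this index in place, range queries on the time field no longer need to scan the whole collection.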

Starting with:

with my_mongodb() as m:
    for i in range(30):
        day = today - timedelta(days = i)
        t1 = [m.data.find({"time": {"$gte": day, "$lt": day + timedelta(days = 1)}}).count()] + t1
        t2 = [m.data.find({"deleted": 0, "time": {"$gte": day, "$lt": day + timedelta(days = 1)}}).count()] + t2

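You're issuing a MongoDB request for each of the last 30 days to find all of that day's data. Why not use just one request? And once you have all the data, why not simply filter out the deleted items?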

today = datetime.combine(date.today(), time()) # midnight today; PyMongo can encode a datetime but not a plain date
start_date = today - timedelta(days=30)

with my_mongodb() as m:
    t1 = list(m.data.find({"time": {"$gte": start_date}})) # all data since start_date (30 days ago)
t2 = [x for x in t1 if x['deleted'] == 0] # all data since start_date that isn't deleted

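I'm really not sure why you were issuing 60 requests (30 * 2, one for all data and one for non-deleted data). Is there a particular reason you build up the data day by day?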
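
If per-bucket counts are all that is needed, one alternative that neither answer version uses is to let the server do the grouping with MongoDB's aggregation framework, so a single request returns every bucket. The sketch below only builds the pipeline; build_count_pipeline and BUCKET_FORMATS are hypothetical names, and the field names time and deleted are taken from the question. It assumes a server that supports the $dateToString operator.

```python
from datetime import datetime

# strftime-style formats passed to MongoDB's $dateToString operator.
BUCKET_FORMATS = {
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
    "year": "%Y",
}

def build_count_pipeline(start, unit="day"):
    """Build a pipeline that counts documents per hour, day, month or year."""
    bucket = {"$dateToString": {"format": BUCKET_FORMATS[unit], "date": "$time"}}
    return [
        # Keep only documents at or after the start time.
        {"$match": {"time": {"$gte": start}}},
        # One output row per bucket, with both counts computed together.
        {"$group": {
            "_id": bucket,
            "total": {"$sum": 1},
            "not_deleted": {"$sum": {"$cond": [{"$eq": ["$deleted", 0]}, 1, 0]}},
        }},
        # Bucket keys sort chronologically because they start with the year.
        {"$sort": {"_id": 1}},
    ]

pipeline = build_count_pipeline(datetime(2011, 12, 1), "day")
# Hypothetical usage, assuming m.data is a PyMongo collection:
# with my_mongodb() as m:
#     rows = list(m.data.aggregate(pipeline))
```

Two caveats with this approach: buckets that contain no documents do not appear in the result, so missing days have to be filled with zeros before plotting, and $dateToString formats dates in UTC unless told otherwise.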

Then, you have:

x = range(30)
N = len(x)

Why not:

N = 30
x = range(N)

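len(range(x)) equals x, but computing it is an extra step. The way you originally wrote it is a bit... odd.

Here's my crack at it, with the changes I'd suggest, written to be as general as possible.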

from collections import Counter
from datetime import datetime, date, time, timedelta
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from my_conn import my_mongodb

def getDataFromLast(delta):
    """ Delta is a timedelta for however long ago you want to look
        back. For instance, to find everything within the last month,
        delta should = timedelta(days=30). Counting starts at midnight
        today, so delta is best given in whole days."""
    # midnight today as a datetime; PyMongo cannot encode a plain date
    today = datetime.combine(date.today(), time())
    start_date = today - delta
    with my_mongodb() as m: # what exactly is this? I'm using it as a context manager, as you did.
        all_data = list(m.data.find({"time": {"$gte": start_date}}))
    valid_data = [doc for doc in all_data if doc['deleted'] == 0] # all data that isn't deleted
    return all_data, valid_data

def countPerDay(data, N):
    """ plt.bar needs numbers, not documents, so count the documents
        that fall on each of the last N days (oldest day first)."""
    per_day = Counter(doc['time'].date() for doc in data)
    return [per_day[date.today() - timedelta(days=N-i-1)] for i in range(N)]

def format_date(x, N, pos=None):
    """ This is your format_date function. It now takes N
        (I still don't really understand what it is, though)
        as an argument instead of assuming that it's a global."""
    day = date.today() - timedelta(days=N-x-1)
    return day.strftime('%m/%d')

def plotBar(data, color):
    plt.bar(range(len(data)), data, align='center', color=color)

N = 30 # define the range that we want to look at
all_data, valid_data = getDataFromLast(timedelta(days=N))
plotBar(countPerDay(all_data, N), "#4788d2") # plot both deleted and non-deleted data
plotBar(countPerDay(valid_data, N), "#0c3688") # plot only the valid data

plt.xticks(range(N), [format_date(i, N) for i in range(N)], size='small', rotation=30)
plt.grid(axis="y")
plt.show()

Regarding "python - Visualizing MongoDB data statistics with matplotlib", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8559080/
