gpt4 book ai didi

java - 从 SQL 表中进行统计

转载 作者:行者123 更新时间:2023-12-02 12:17:39 24 4
gpt4 key购买 nike

我的数据库中有一个表,我可以通过这种方式注册多个传感器的读数:

CREATE TABLE [test].[readings] (
[timestamp_utc] DATETIME2(0) NOT NULL, -- 48bits
[sensor_id] INT NOT NULL, -- 32 bits
[site_id] INT NOT NULL, -- 32 bits
[reading] REAL NOT NULL, -- 64 bits
PRIMARY KEY([timestamp_utc], [sensor_id], [site_id])
)

CREATE TABLE [test].[sensors] (
[sensor_id] int NOT NULL ,
[measurement_type_id] int NOT NULL,
[site_id] int NOT NULL ,
[description] varchar(255) NULL ,
PRIMARY KEY ([sensor_id], [site_id])
)

我想轻松地对所有这些读数进行统计。

我想做的一些查询:

Get me all readings for site_id = X between date_hour1 and date_hour2

Get me all readings for site_id = X and sensor_id in <list> between date_hour1 and date_hour2

Get me all readings for site_id = X and sensor measurement type = Z between date_hour1 and date_hour2

Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2

Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2 but in UTC+3 (这应该给出与之前查询不同的结果,因为现在日期的开始和结束移动了 3 小时)

Get me min, max, std, mean for all readings for site_id = X between date_hour1 and date_hour2

到目前为止,我一直在使用 Java 来查询数据库并在本地执行所有这些处理。但这最终会有点慢,并且代码的编写和维护变得一团糟(太多的循环、执行重复任务的通用函数、大/冗长的代码库等)...

更糟糕的是,表 readings是巨大的(因此主键的重要性,它也是一个性能指标),也许我应该为此使用 TimeSeries 数据库(有什么好的数据库吗?)。我正在使用 SQL Server。

执行此操作的最佳方法是什么?我觉得我正在重新发明轮子,因为所有这些都有点像分析应用程序......

我知道这些查询听起来很简单,但是当您尝试参数化所有这些时,您最终可能会得到这样的怪物:

-- Sums all device readings, returns timestamps in localtime according to utcOffset (if utcOffset = 00:00, then timestamps are in UTC)
CREATE PROCEDURE upranking.getSumOfReadingsForDevices
@facilityId int,
@deviceIds varchar(MAX),
@beginTS datetime2,
@endTS datetime2,
@utcOffset varchar(6),
@resolution varchar(6) -- NO, HOURS, DAYS, MONTHS, YEARS
AS BEGIN
SET NOCOUNT ON -- http://stackoverflow.com/questions/24428928/jdbc-sql-error-statement-did-not-return-a-result-set
DECLARE @deviceIdsList TABLE (
id int NOT NULL
);

DECLARE @beginBoundary datetime2,
@endBoundary datetime2;

SELECT @beginBoundary = DATEADD(day, -1, @beginTS);
SELECT @endBoundary = DATEADD(day, 1, @endTS);

-- We shift sign from the offset because we are going to convert the zone for the entire table and not beginTS endTS themselves
SELECT @utcOffset = CASE WHEN LEFT(@utcOffset, 1) = '+' THEN STUFF(@utcOffset, 1, 1, '-') ELSE STUFF(@utcOffset, 1, 1, '+') END

INSERT INTO @deviceIdsList
SELECT convert(int, value) FROM string_split(@deviceIds, ',');

SELECT SUM(reading) as reading,
timestamp_local
FROM (
SELECT reading,
upranking.add_timeoffset_to_datetime2(timestamp_utc, @utcOffset, @resolution) as timestamp_local
FROM upranking.readings
WHERE
device_id IN (SELECT id FROM @deviceIdsList)
AND facility_id = @facilityId
AND timestamp_utc BETWEEN @beginBoundary AND @endBoundary
) as innertbl
WHERE timestamp_local BETWEEN @beginTS AND @endTS
GROUP BY timestamp_local
ORDER BY timestamp_local
END
GO

This is a query that receives the site id (facilityId in this case), the list of sensor ids (the deviceIds in this case), the beginning and the ending timestamps, followed by their UTC offset in a string like "+xx:xx" or "-xx:xx", terminating with the resolution which will basically say how the result will be aggregated by SUM (taking the UTC offset into consideration).

由于我使用的是Java,乍一看我可以使用 Hibernate 之类的东西,但我觉得 Hibernate 并不是为这些类型的查询而设计的。

最佳答案

你的结构乍一看看起来不错,但看看你的查询,我觉得你可能想尝试一些调整。性能从来都不是一个简单的话题,找到“一刀切的答案”并不容易。这里有一些注意事项:

  1. 您想要更好的读取写入性能吗?如果您想要更好的读取性能,您需要重新考虑您的索引。当然,您有一个主键,但大多数查询都没有使用它(所有三个字段)。尝试为[sensor_id]、[site_id]创建索引。
  2. 可以使用缓存吗?如果某些搜索是重复出现的,并且您的应用是数据库的单一入口点,那么请评估您的用例是否会从缓存中受益。
  3. 如果表读数很大,则考虑使用某种分区策略。查看 MSSQL documentation
  4. 如果您不需要实时数据,请尝试某种搜索引擎,例如 Elastic Search

关于java - 从 SQL 表中进行统计,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/46044734/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com