gpt4 book ai didi

SQL 查询到 'collapse' 靠在一起的行

转载 作者:行者123 更新时间:2023-12-04 15:37:34 24 4
gpt4 key购买 nike

我们有一个应用程序可以捕获用户进行的搜索。由于我们搜索的性质(我们在几个字符后提供结果)和人们打字的速度,我们正在为每个搜索/字母获取一个日志条目。这看起来像这样:

Search log

(它看起来像一棵颠倒的圣诞树...)

我们内部需要此数据来计算搜索次数(又名 API 调用),但为了向我们的客户报告,报告“一半”查询并不是很好。

我正在寻找一种方法将这些行折叠成具有最长/最后一个搜索词的行。

有个问题:一个用户 (cid) 可以在一个 session 中进行超过 1 次搜索,但我猜如果我们查看时间戳,我们可以将其分开。

它必须是这样的:

1) 将间隔不超过 2 秒的行分组

2) Order by length (or last) 查询得到最终查询

3) 按术语分组以统计某个术语用于报告的频率

文本数据:

2019-12-09  2019-12-09 12:58:45 5dea585477c94502b52c43fb    92cd6cef-3ed8-4416-ac2d-cc347780b976    search  1   search  query   vacuum cleaner
2019-12-09 2019-12-09 12:58:45 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum cleane
2019-12-09 2019-12-09 12:58:44 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum clean
2019-12-09 2019-12-09 12:58:43 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum clea
2019-12-09 2019-12-09 12:58:43 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum cle
2019-12-09 2019-12-09 12:58:42 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum cl
2019-12-09 2019-12-09 12:58:41 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum c
2019-12-09 2019-12-09 12:58:40 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum
2019-12-09 2019-12-09 12:58:39 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuu
2019-12-09 2019-12-09 12:58:38 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacu
2019-12-09 2019-12-09 12:58:37 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vac
2019-12-09 2019-12-09 12:58:15 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue widget
2019-12-09 2019-12-09 12:58:14 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue widge
2019-12-09 2019-12-09 12:58:13 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue widg
2019-12-09 2019-12-09 12:58:12 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue wid
2019-12-09 2019-12-09 12:58:12 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue wi
2019-12-09 2019-12-09 12:58:11 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue w
2019-12-09 2019-12-09 12:58:10 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue
2019-12-09 2019-12-09 12:58:09 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blu
2019-12-09 2019-12-09 12:57:38 5dea585477c94502b52c43fb f96305d9-590b-4a10-95a2-2d49a9fc63a3 search 1 search query widget
2019-12-09 2019-12-09 12:57:37 5dea585477c94502b52c43fb f96305d9-590b-4a10-95a2-2d49a9fc63a3 search 1 search query widge
2019-12-09 2019-12-09 12:57:36 5dea585477c94502b52c43fb f96305d9-590b-4a10-95a2-2d49a9fc63a3 search 1 search query widg
2019-12-09 2019-12-09 12:57:35 5dea585477c94502b52c43fb f96305d9-590b-4a10-95a2-2d49a9fc63a3 search 1 search query wid

预期结果:

vacuum cleaner  1
blue widget 1
widget 1

最佳答案

一个看起来像倒置的圣诞树的日志文件;假设最后一个条目位于顶部可能是安全的,因此如果有人可以获取每个组的第一条记录,那么您的问题就会得到解决,包括该人拼写错误然后更正的情况。这使得日期和时间与找到解决方案几乎无关。

假设您的字段“cid”代表一次搜索,这里是一个基于您的数据的解决方案,它产生的结果完全符合您的预期。输入数据应位于名为“inputTable”的表中,创建方式为:

CREATE TABLE `inputTable` (
`date` datetime DEFAULT NULL,
`ts` datetime DEFAULT NULL,
`tid` varchar(255) DEFAULT NULL,
`cid` varchar(255) DEFAULT NULL,
`xid` varchar(255) DEFAULT NULL,
`xvar` int(11) DEFAULT NULL,
`ec` varchar(255) DEFAULT NULL,
`ea` varchar(255) DEFAULT NULL,
`ev` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

在不同的场景下运行以下命令以查找可能无法正常工作且需要微调的特殊情况。

DROP PROCEDURE IF EXISTS SP_cursor1;
DELIMITER $$
CREATE PROCEDURE `SP_cursor1`()
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE tempCid VARCHAR(255);

DECLARE _date DATETIME;
DECLARE _ts DATETIME;
DECLARE _tid VARCHAR(255);
DECLARE _cid VARCHAR(255);
DECLARE _xid VARCHAR(255);
DECLARE _xvar INT;
DECLARE _ec VARCHAR(255);
DECLARE _ea VARCHAR(255);
DECLARE _ev VARCHAR(255);

DECLARE cursor1 CURSOR for
SELECT * FROM inputTable;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

OPEN cursor1;
SET TempCid = "";
DROP TABLE IF EXISTS tempTbl;
CREATE TABLE tempTbl AS
SELECT * FROM inputTable WHERE 1 = 0;
read_loop: LOOP
fetch cursor1 into _date, _ts, _tid, _cid, _xid, _xvar, _ec, _ea, _ev;
IF done THEN
LEAVE read_loop;
END IF;
IF tempCid <> _cid then
INSERT INTO tempTbl
SELECT _date, _ts, _tid, _cid, _xid, _xvar, _ec, _ea, _ev;
SET tempCid = _cid;
END IF;
END LOOP;
CLOSE cursor1;
SELECT ev FROM tempTbl;
DROP TABLE tempTbl;
END$$

call SP_cursor1();

希望这对您有所帮助。

关于SQL 查询到 'collapse' 靠在一起的行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59249974/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com