gpt4 book ai didi

Search for multiple values within the hierarchy of the same specified folder in a flat representation of a tree structure in a Postgres jsonb table(在postgres jsonb表的树结构的平面表示中,在同一指定文件夹的层次结构中搜索多个值)

转载 作者:bug小助手 更新时间:2023-10-28 13:05:49 28 4
gpt4 key购买 nike



Data structure:
The table has an PK 'id' and a jsonb column 'data'. The 'data' contains an array of objects 'instances'. Each 'instance' has some values and a 'path' array. The 'path' array is a flat representation of a deeply nested tree like hierarchical structure. Each 'path' consists of objects that have a string 'id', which is not unique, and an integer 'index' which is only unique on the same level (the same parent structure),
but can repeat on different levels.

数据结构:该表有一个pk‘id’和一个jsonb列‘data’。‘data’包含一个对象‘实例’数组。每个“实例”都有一些值和一个“路径”数组。“路径”数组是深度嵌套的树状层次结构的平面表示。每个“路径”由具有字符串“id”和整数“index”的对象组成,前者不是唯一的,后者在同一级别(相同的父结构)上是唯一的,但可以在不同的级别上重复。


Example:

    "instances": [
{
"path": [
{"id": "root", "index": 2},
{"id": "folder1", "index": 0},
{"id": "folder2", "index": 0},
{"id": "folder3", "index": 0}
],
"pdf": "pdf in 1,2,3",
"info": "some other data"
},
...
]

I need to be able to search for multiple specific values within the hierarchy of the same specified folder in a Postgres jsonb table.

我需要能够在postgres jsonb表中同一指定文件夹的层次结构中搜索多个特定值。


For example, search for an item that has both "pdf in 1,2,3" AND "text in 1,2,3" values within the hierarchy of the same folder2 (meaning the folder2 within the same parent structure).

例如,搜索在相同文件夹2的层次结构中同时具有“pdf in 1,2,3”和“Text in 1,2,3”值的项目(指同一父结构中的文件夹2)。


Here's the query I came up with:

以下是我想出的查询:


    WITH indexed_paths AS ( 
SELECT id, instance -> 'index' as inst_idx,
MIN(CASE WHEN path_element @> '{"id": "folder2"}' THEN path_idx END)
OVER (PARTITION BY id, instance -> 'index') AS searched_index,
path_idx, path_element, instance
FROM "flat",
jsonb_array_elements(data -> 'instances') AS instance,
jsonb_array_elements(instance -> 'path') WITH ORDINALITY arr(path_element, path_idx)
WHERE
instance -> 'path' @> '[{"id": "folder2"}]'
ORDER BY id, inst_idx, path_idx
), combined_paths AS (
SELECT id, jsonb_agg(path_element) as path, instance
FROM indexed_paths
WHERE path_idx <= searched_index
GROUP BY id, inst_idx, instance
), combined_instances AS (
SELECT id, path, jsonb_agg(instance) as instances
FROM combined_paths
GROUP BY id, path
)
SELECT *
FROM "flat" f
WHERE EXISTS (
SELECT 1
FROM combined_instances ci
WHERE
ci.id = f.id
AND ci.instances @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb
);




    1. indexed_paths CTE expands each row's instances and each path of each instance
      into a separate row of 'path_element's, enumerating all the path_element's with indexes.
      If the path_element is the searched one, it takes it's index and writes it
      to a new column of all the rows with a matching id and instance index.
      if there are multiple occurrences of the searched path_element within the same id and instance index
      it takes the smallest one.

      INDEX_PATHS CTE将每行的实例和每个实例的每个路径扩展到单独的一行‘Path_Element’中,枚举具有索引的所有Path_Element。如果Path_Element是搜索到的路径元素,它将获取它的索引并将其写入具有匹配id和实例索引的所有行的新列中。如果搜索到的Path_Element在相同ID和实例索引中多次出现,它将采用最小的一个。



    2. combined_paths CTE aggregates path_elements grouping them by id's and instance index, checking if
      the element_path index is <= to the searched index, this way reconstructing element_paths back but only up to
      the searched path_element.

      组合路径CTE聚合路径元素,按id和实例索引对它们进行分组,检查元素路径索引是否<=到搜索到的索引,这样可以重新构建元素路径,但只返回到搜索到的路径_元素。



    3. combined_instances CTE aggregates the data of all instances by matching id's and reconstructed paths.

      组合实例CTE通过匹配id和重建的路径来聚合所有实例的数据。



    4. final SELECT statement is searching for the specified data inside the combined_instances and joins it
      with the original table by id's.*

      最后一个SELECT语句在combined_instances中搜索指定的数据,并通过id将其与原始表连接。*






It works exactly the way I want, but it's just too verbose and long. Is there any way to simplify it? Changing the data structure of the jsonb column is an option. Some other algotythm is also welcome. Basically any haelp would be gretly appreciated.

它完全按照我想要的方式工作,但它太冗长了。有什么方法可以简化它吗?更改jsonb列的数据结构是一种选择。一些其他的算法也是受欢迎的。基本上,任何帮助都会受到高度赞赏。


更多回答

I generally find using subqueries in a SELECT clause easier to parse/understand/reason about than using CTEs that first expand and then group items, and in my experience they've also been shorter, but maybe that's just me

我通常发现,在SELECT子句中使用子查询比使用先展开然后分组项的CTE更容易解析/理解/推理,而且根据我的经验,它们也更短,但可能这只是我的情况

So you don't know the path to the "folder2" yet? And there might even be multiple "folder2"s anywhere in the tree structure of each item, you just want to search for items that have any folder named "folder2" that contains both the searched values?

这么说你还不知道通向“文件夹2”的路?甚至可能在每个项目的树结构中的任何位置都有多个“Folder2”S,您只想搜索具有任何名为“Folder2”的文件夹的项目,该文件夹包含两个搜索值?

Yup, exactly that. Imagine that a user is trying to searh for a foledr2 on a hard drive that contains a pdf file and a txt file somewhere within it's hierarchy.

是的,就是这样。假设用户试图在硬盘上搜索包含pdf文件和txt文件的文件夹r2,该文件夹位于其层次结构中的某个位置。

Given you're searching for rows in the flat table, and each row contains many instances, wouldn't the analogy rather be to search for one of many hard drives, that contains such a folder? That's why I was a bit surprised

假设您要在平面表格中搜索行,并且每行包含许多实例,难道不应该从包含这样一个文件夹的多个硬盘驱动器中搜索一个吗?这就是为什么我有点惊讶的原因

Well yeah. You can think of each row as a hard drive, or a root folder that contains the rest of it. it's just an example so doesn't really matter

嗯,是的。您可以将每一行视为硬盘驱动器,或包含其其余部分的根文件夹。这只是一个例子,所以这并不重要

优秀答案推荐

I'd try to use subqueries and lateral joins more instead of expanding and re-grouping rows in multiple CTEs:

我会尝试更多地使用子查询和横向连接,而不是在多个CTE中扩展和重新分组行:


SELECT id, to_jsonb(ancestor_path) AS ancestor_path, instances
FROM "flat" f,
LATERAL (
SELECT element->>'id' AS ancestor_name, path[0:ancestor.idx] AS ancestor_path, jsonb_agg(instance) AS instances
FROM jsonb_array_elements(f.data -> 'instances') AS instance,
jsonb_to_record(instance) AS _i(path jsonb[]),
unnest(path) WITH ORDINALITY AS ancestor(element, idx)
GROUP BY path[0:ancestor.idx], element->>'id'
) AS data
WHERE ancestor_name = 'folder2'
AND instances @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb

(online demo of this and the below approaches)

(此方法和以下方法的在线演示)


or with EXISTS, if you want only the whole flat row (regardless how many matches there are within its data):

或WITH EXISTS,如果您只想要整个平整行(无论其数据中有多少匹配项):


SELECT *
FROM "flat" f
WHERE EXISTS (
SELECT 1
FROM jsonb_array_elements(f.data -> 'instances') AS instance,
jsonb_to_record(instance) AS _i(path jsonb[]),
unnest(path) WITH ORDINALITY AS ancestor(element, idx)
WHERE element->>'id' = 'folder2'
GROUP BY path[0:ancestor.idx]
HAVING jsonb_agg(instance) @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb
)

An alternative to the GROUP BY and searching in the aggregated instances array would be a self-join of a CTE relation:

在聚合实例数组中进行GROUP BY和搜索的替代方法是CTE关系的自联接:


SELECT *
FROM "flat" f
WHERE EXISTS (
WITH instances AS (
SELECT value, element->>'id' AS ancestor_name, path[0:ancestor.idx] AS ancestor_path
FROM jsonb_array_elements(f.data -> 'instances') AS el(value),
jsonb_to_record(value) AS _i(path jsonb[]),
unnest(path) WITH ORDINALITY AS ancestor(element, idx)
)
SELECT *
FROM instances a JOIN instances b USING (ancestor_path, ancestor_name)
WHERE ancestor_name = 'folder2'
AND a.value @> '{"pdf": "pdf in 1,2,3"}'
AND b.value @> '{"jpg": "jpg in 1,2,3"}'
)

Instead of returning ancestor_name as a separate column from the subquery, which makes the GROUP BY or JOIN … USING more ugly, you can also just access the last element of ancestor_path:

而不是将ANSTESTOR_NAME作为子查询的单独列返回,这会使GROUP BY或JOIN成为…使用MORE GUGLE,您还可以只访问ANSTESOR_PATH的最后一个元素:


SELECT *
FROM "flat" f,
LATERAL (
SELECT to_jsonb(path[0:ancestor.idx]) AS ancestor_path, jsonb_agg(instance) AS instances
FROM jsonb_array_elements(f.data -> 'instances') AS instance,
jsonb_to_record(instance) AS _i(path jsonb[]),
unnest(path) WITH ORDINALITY AS ancestor(element, idx)
GROUP BY path[0:ancestor.idx], element->>'id'
) AS data
WHERE ancestor_path->-1->>'id' = 'folder2'
AND instances @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb

I guess the main trick that makes these queries shorter than yours is the use of array slicing to generate the ancestor path(s) and the use of jsonb_to_record for converting to jsonb array to a postgres array. This probably could have been achieved in any number of ways, including your window function (which notably only finds the upper path if there are two nested "folder2"s):

我猜使这些查询比您的更短的主要技巧是使用数组切片来生成祖先路径(S),并使用jsonb_to_record将jsonb数组转换为postgres数组。这可能已经通过多种方式实现了,包括您的窗口函数(它只在有两个嵌套的“文件夹2”S的情况下才会找到上面的路径):


SELECT id, ancestor_path, instances
FROM "flat" f,
LATERAL (
SELECT (
SELECT jsonb_agg(element) FILTER (WHERE idx <= searched_index)
FROM (
SELECT *, MIN(idx) FILTER (WHERE element @> '{"id": "folder2"}') OVER () AS searched_index
FROM jsonb_array_elements(instance -> 'path') WITH ORDINALITY anc(element, idx)
) AS path
) AS ancestor_path, jsonb_agg(instance) AS instances
FROM jsonb_array_elements(f.data -> 'instances') AS instance
GROUP BY ancestor_path
) AS data
WHERE instances @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb

更多回答

This is just brilliant! I can't thank you enough. And here I was, thinking I came up with a good solution :/ Thank you very much.

这真是太棒了!我无法表达对您的感谢。于是我想到了一个好的解决方案:/非常感谢。

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com