Search for multiple values within the hierarchy of the same specified folder in a flat representation of a tree structure in a Postgres jsonb table

Reposted by: bug小助手 · Updated: 2023-10-28 13:05:49

Data structure:
The table has a primary key 'id' and a jsonb column 'data'. 'data' contains an array of objects, 'instances'. Each instance has some values and a 'path' array. The 'path' array is a flat representation of a deeply nested, tree-like hierarchy. Each 'path' consists of objects that have a string 'id', which is not unique, and an integer 'index', which is unique only on the same level (within the same parent structure) but can repeat on different levels.

Example:

    "instances": [
        {
            "path": [
                {"id": "root", "index": 2},
                {"id": "folder1", "index": 0},
                {"id": "folder2", "index": 0},
                {"id": "folder3", "index": 0}
            ],
            "pdf": "pdf in 1,2,3",
            "info": "some other data"
        },
        ...
    ]
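
To try the queries in this thread, a small test table can help. The following is only a sketch: the table name "flat" is taken from the queries below, but the column types, the row id, and the second instance (the one carrying the "jpg" value) are assumptions made for illustration.

```sql
-- Hypothetical setup; column types and sample rows are assumptions.
CREATE TABLE "flat" (
    id   integer PRIMARY KEY,
    data jsonb NOT NULL
);

INSERT INTO "flat" (id, data) VALUES
(1, '{
  "instances": [
    {
      "path": [
        {"id": "root",    "index": 2},
        {"id": "folder1", "index": 0},
        {"id": "folder2", "index": 0},
        {"id": "folder3", "index": 0}
      ],
      "pdf": "pdf in 1,2,3",
      "info": "some other data"
    },
    {
      "path": [
        {"id": "root",    "index": 2},
        {"id": "folder1", "index": 0},
        {"id": "folder2", "index": 0}
      ],
      "jpg": "jpg in 1,2,3"
    }
  ]
}');
```

Both sample instances share the path prefix root / folder1 / folder2, so under these assumptions a search for "folder2" containing both the pdf and the jpg value would be expected to match row 1.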

I need to be able to search for multiple specific values within the hierarchy of the same specified folder in a Postgres jsonb table.

For example, search for an item that has both "pdf in 1,2,3" AND "text in 1,2,3" values within the hierarchy of the same folder2 (meaning the folder2 within the same parent structure).

Here's the query I came up with:

    WITH indexed_paths AS (
        -- One row per path element; searched_index is the position of the
        -- first "folder2" within the same id and instance index.
        SELECT id, instance -> 'index' AS inst_idx,
               MIN(CASE WHEN path_element @> '{"id": "folder2"}' THEN path_idx END)
                   OVER (PARTITION BY id, instance -> 'index') AS searched_index,
               path_idx, path_element, instance
        FROM "flat",
             jsonb_array_elements(data -> 'instances') AS instance,
             jsonb_array_elements(instance -> 'path') WITH ORDINALITY arr(path_element, path_idx)
        WHERE instance -> 'path' @> '[{"id": "folder2"}]'
        ORDER BY id, inst_idx, path_idx
    ), combined_paths AS (
        -- Rebuild each path, truncated at the searched folder.
        SELECT id, jsonb_agg(path_element) AS path, instance
        FROM indexed_paths
        WHERE path_idx <= searched_index
        GROUP BY id, inst_idx, instance
    ), combined_instances AS (
        -- Collect all instances that share the same truncated path.
        SELECT id, path, jsonb_agg(instance) AS instances
        FROM combined_paths
        GROUP BY id, path
    )
    SELECT *
    FROM "flat" f
    WHERE EXISTS (
        SELECT 1
        FROM combined_instances ci
        WHERE ci.id = f.id
          AND ci.instances @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb
    );

1. The indexed_paths CTE expands each row's instances, and each instance's path,
  into one row per path_element, numbering the path_elements by position (path_idx).
  If a path_element is the searched one, its position is written to a new column
  (searched_index) on all rows with the same id and instance index.
  If the searched path_element occurs more than once within the same id and instance index,
  the smallest position is used.
2. The combined_paths CTE aggregates the path_elements again, grouped by id and instance index,
  keeping only those whose path_idx is <= searched_index. This reconstructs each path,
  but only up to the searched path_element.
3. The combined_instances CTE aggregates the data of all instances that share the same id and reconstructed path.
4. The final SELECT statement searches for the specified data inside combined_instances and joins
  the result with the original table by id.
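
Step 1 can be looked at in isolation. The following is a minimal sketch that runs the same expansion and window expression on a literal path instead of the table; the column alias element_id is made up for this illustration.

```sql
-- Expand one path into numbered rows and mark the position of "folder2".
SELECT path_idx,
       path_element ->> 'id' AS element_id,  -- illustrative alias
       MIN(CASE WHEN path_element @> '{"id": "folder2"}' THEN path_idx END)
           OVER () AS searched_index
FROM jsonb_array_elements(
         '[{"id": "root",    "index": 2},
           {"id": "folder1", "index": 0},
           {"id": "folder2", "index": 0},
           {"id": "folder3", "index": 0}]'::jsonb
     ) WITH ORDINALITY AS arr(path_element, path_idx)
ORDER BY path_idx;
```

This yields four rows with path_idx 1 to 4, each carrying searched_index = 3, so the filter path_idx <= searched_index in step 2 keeps root, folder1 and folder2 and drops folder3.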

It works exactly the way I want, but it's just too verbose and long. Is there any way to simplify it? Changing the data structure of the jsonb column is an option. A different algorithm is also welcome. Basically, any help would be greatly appreciated.

More replies

I generally find using subqueries in a SELECT clause easier to parse/understand/reason about than using CTEs that first expand and then group items, and in my experience they've also been shorter, but maybe that's just me

So you don't know the path to the "folder2" yet? And there might even be multiple "folder2"s anywhere in the tree structure of each item, you just want to search for items that have any folder named "folder2" that contains both the searched values?

Yup, exactly that. Imagine that a user is trying to search a hard drive for a folder2 that contains a pdf file and a txt file somewhere within its hierarchy.

Given you're searching for rows in the flat table, and each row contains many instances, wouldn't the analogy rather be to search for one of many hard drives, that contains such a folder? That's why I was a bit surprised

Well, yeah. You can think of each row as a hard drive, or as a root folder that contains the rest of it. It's just an example, so it doesn't really matter.

Recommended answer

I'd try to use subqueries and lateral joins more instead of expanding and re-grouping rows in multiple CTEs:

    SELECT id, to_jsonb(ancestor_path) AS ancestor_path, instances
    FROM "flat" f,
    LATERAL (
      SELECT element->>'id' AS ancestor_name, path[0:ancestor.idx] AS ancestor_path, jsonb_agg(instance) AS instances
      FROM jsonb_array_elements(f.data -> 'instances') AS instance,
           jsonb_to_record(instance) AS _i(path jsonb[]),
           unnest(path) WITH ORDINALITY AS ancestor(element, idx)
      GROUP BY path[0:ancestor.idx], element->>'id'
    ) AS data
    WHERE ancestor_name = 'folder2'
      AND instances @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb;

(online demo of this and the below approaches)

Or with EXISTS, if you only want the whole flat row (regardless of how many matches there are within its data):

    SELECT *
    FROM "flat" f
    WHERE EXISTS (
      SELECT 1
      FROM jsonb_array_elements(f.data -> 'instances') AS instance,
           jsonb_to_record(instance) AS _i(path jsonb[]),
           unnest(path) WITH ORDINALITY AS ancestor(element, idx)
      WHERE element->>'id' = 'folder2'
      GROUP BY path[0:ancestor.idx]
      HAVING jsonb_agg(instance) @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb
    );
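
The HAVING clause above relies on how jsonb containment treats arrays: each object on the right-hand side must be contained in some element of the left-hand array, while extra keys and extra elements on the left are ignored. A small sketch on literal values (the aliases both_found and only_pdf are made up for this illustration):

```sql
SELECT
  -- Both patterns are matched by some instance in the aggregated array.
  '[{"pdf": "pdf in 1,2,3", "info": "some other data"},
    {"jpg": "jpg in 1,2,3"}]'::jsonb
    @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb AS both_found,  -- true
  -- No instance carries the jpg value, so containment fails.
  '[{"pdf": "pdf in 1,2,3", "info": "some other data"}]'::jsonb
    @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb AS only_pdf;    -- false
```

Note that the two patterns do not have to match different instances: a single instance holding both the pdf and the jpg value would satisfy the check as well.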

An alternative to the GROUP BY and searching in the aggregated instances array would be a self-join of a CTE relation:

    SELECT *
    FROM "flat" f
    WHERE EXISTS (
      WITH instances AS (
        SELECT value, element->>'id' AS ancestor_name, path[0:ancestor.idx] AS ancestor_path
        FROM jsonb_array_elements(f.data -> 'instances') AS el(value),
             jsonb_to_record(value) AS _i(path jsonb[]),
             unnest(path) WITH ORDINALITY AS ancestor(element, idx)
      )
      SELECT *
      FROM instances a JOIN instances b USING (ancestor_path, ancestor_name)
      WHERE ancestor_name = 'folder2'
        AND a.value @> '{"pdf": "pdf in 1,2,3"}'
        AND b.value @> '{"jpg": "jpg in 1,2,3"}'
    );

Instead of returning ancestor_name as a separate column from the subquery, which makes the GROUP BY or JOIN … USING uglier, you can also just access the last element of ancestor_path:

    SELECT *
    FROM "flat" f,
    LATERAL (
      SELECT to_jsonb(path[0:ancestor.idx]) AS ancestor_path, jsonb_agg(instance) AS instances
      FROM jsonb_array_elements(f.data -> 'instances') AS instance,
           jsonb_to_record(instance) AS _i(path jsonb[]),
           unnest(path) WITH ORDINALITY AS ancestor(element, idx)
      -- The sliced path already ends in the current element, so it is the only grouping key needed.
      GROUP BY path[0:ancestor.idx]
    ) AS data
    WHERE ancestor_path->-1->>'id' = 'folder2'
      AND instances @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb;

I guess the main trick that makes these queries shorter than yours is the use of array slicing to generate the ancestor path(s) and the use of jsonb_to_record to convert the jsonb array to a Postgres array. The same result could probably be achieved in any number of ways, including with your window function (which, notably, only finds the uppermost path if there are two nested "folder2"s):

    SELECT id, ancestor_path, instances
    FROM "flat" f,
    LATERAL (
      SELECT (
        SELECT jsonb_agg(element) FILTER (WHERE idx <= searched_index)
        FROM (
          SELECT *, MIN(idx) FILTER (WHERE element @> '{"id": "folder2"}') OVER () AS searched_index
          FROM jsonb_array_elements(instance -> 'path') WITH ORDINALITY anc(element, idx)
        ) AS path
      ) AS ancestor_path, jsonb_agg(instance) AS instances
      FROM jsonb_array_elements(f.data -> 'instances') AS instance
      GROUP BY ancestor_path
    ) AS data
    WHERE instances @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb;
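
The array-slicing trick mentioned above can be seen on its own with a literal value. This is a minimal sketch, not part of the original answer: jsonb_to_record turns the JSON "path" array into a Postgres jsonb[] array, and slicing it with path[0:idx] gives the ancestor prefix for every position.

```sql
-- One row per path position, each with the prefix of the path up to that position.
SELECT ancestor.idx,
       ancestor.element ->> 'id' AS ancestor_name,
       to_jsonb(path[0:ancestor.idx]) AS ancestor_path
FROM jsonb_to_record(
         '{"path": [{"id": "root",    "index": 2},
                    {"id": "folder1", "index": 0},
                    {"id": "folder2", "index": 0}]}'::jsonb
     ) AS _i(path jsonb[]),
     unnest(path) WITH ORDINALITY AS ancestor(element, idx)
ORDER BY ancestor.idx;
```

This returns three rows: the prefix for idx 1 holds only root, the one for idx 2 holds root and folder1, and the one for idx 3 holds the full path. Postgres arrays are 1-based by default, so the lower slice bound 0 simply means "from the start". Grouping instances by such a prefix is what lets the queries above collect everything that lives under the same folder.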

More replies

This is just brilliant! I can't thank you enough. And here I was, thinking I came up with a good solution :/ Thank you very much.
