perl - 从复杂的数据库提取 (SQL) 构建树状哈希 (YAML)？-6ren

perl - 从复杂的数据库提取 (SQL) 构建树状哈希 (YAML)？

转载作者：行者123 更新时间：2023-12-04 12:53:21

介绍

考虑到这个问题末尾给出的表格，我想要一个算法或一个简单的解决方案，它从 YAML 返回嵌套树。描述。使用 yaml格式是可选的需要。事实上，我需要的输出是有序哈希数组，可能包含也可能不包含嵌套有序哈希或有序哈希数组。

简而言之，我说的是树状结构。

为了更好地理解我的问题，我将处理一个涵盖我所有需求的简单示例。实际上这个例子是我用来实现这个算法的例子。

由于我对 Perl 的了解有限，我决定在我自己的调查的同时提出这个问题。我不想挖错隧道，这就是我寻求帮助的原因。

我目前专注于 DBI 模块。我尝试查看其他模块，例如 DBIx::Tree::NestedSet ，但我不认为这是我需要的。

所以，让我们来看看我的例子的细节。

例子

最初的想法是写一个perl采用 yaml 的程序描述并输出提取的数据。

此输入描述遵循简单的规则:

查询是我们正在寻找的数据。它可以包含以下键

sql 是 SQL 查询

隐藏从最终输出中隐藏列。当仅在子查询中需要列但最终不需要时使用此字段。

子查询 是对父查询的每一行执行的嵌套查询

绑定(bind) 将列值绑定(bind)到 sql查询

哈希告诉程序将结果分组，而不是散列数组，而是散列散列。其实这可以直接给DBI::selectall_hashref .如果省略此字段，则输出被列为有序哈希数组。

关键是与父级结果同级列出的键名。我们会看到
稍后，键名可以屏蔽结果列。

列表告诉程序将结果列出到一个数组中。请注意，只能显示一列，即 array: name显示 names 的列表

连接是 DBI 连接字符串

格式是输出格式。可以是 XML , YAML或 JSON .我主要关注YAML格式，因为它可以很容易地翻译。省略时，默认输出为 YAML .

缩进一个标识有多少个空格。 tabs或 tab值也受支持。

此外，我们知道在 Perl 中哈希是没有顺序的。在这里，输出键的顺序很重要，应该像它们出现在 sql 查询中一样出现。

从这里我简单地使用 YAML模块 :(

总之，最后我们将只执行这个命令:

$ cat desc.yml | ./fetch > data.yml

desc.yml描述如下:

---
connect: "dbi:SQLite:dbname=einstein-puzzle.sqlite"
ident: 4
query:
   - sql: SELECT * from people
     hide:
        - pet_id
        - house_id
        - id
     subquery:
        - key: brevage
          bind: id
          sql: |
                SELECT name, calories, potassium FROM drink
                LEFT JOIN people_has_drink ON drink.id = people_has_drink.id_drink
                WHERE people_has_drink.id_people = 1       
          hash:
             - name
        - key: house
          sql: SELECT color as paint, size, id from house WHERE id = ?
          hide: id
          bind: paint
          subquery:
             - key: color
               sql: SELECT name, ral, hex from color WHERE short LIKE ?
               bind: color
        - key: pet
          sql: SELECT name from pet WHERE id = ?
          bind: pet_id
          list: name

预期产出

根据上面的描述，输出数据将是这样的:

---
- nationality: Norvegian
  smoke: Dunhill
  brevage:
      orange juice:
          calories: 45
          potassium: 200 mg
      water:
          calories: 0
          potassium: 3 mg
  house:
      color:
          name: Zinc yellow
          ral:  RAL 1018
          hex:  #F8F32B
      paint: yellow
      size: small
  pet:
      - cats
- nationality: Brit
  smoke: Pall Mall
  brevage:
      milk:
          calories: 42
          potassium: 150 mg
  house:
      color:
          name: Vermilion
          ral:  RAL 2002
          hex:  #CB2821
      paint: red
      size: big
  pet:
      - birds
      - phasmatodea

我在哪里

我仍然没有完全实现嵌套查询。我目前的状态在这里给出:

#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
use DBI;
use YAML;
use Data::Dumper;
use Tie::IxHash;

# Read configuration and databse connection
my %yml = %{ Load(do { local $/; <DATA>}) };
my $dbh = DBI->connect($yml{connect});

# Fill the bind values of the first query with command-line information
my %bind;
for(@ARGV) {
    next unless /--(\w+)=(.*)/;
    $bind{$1} = $2;
}

my $q0 = $yml{query}[0];
if ($q0->{bind} and keys %bind > 0) {
    $q0->{bind_values} = arrayref($q0->{bind});
    $q0->{bind_values}[$_] = $bind{$q0->{bind}[$_]} foreach (0 .. @{$q0->{bind}} - 1);
}

# Fetch all data from the database recursively
my $out = fetch($q0);

sub fetch {
    # As long we have a query, one processes it
    my $query = shift; 
    return undef unless $query;

    $query->{bind_values} = [] unless ref $query->{bind_values} eq 'ARRAY';
    # Execute SQL query
    my $sth = $dbh->prepare($query->{sql});
    $sth->execute(@{$query->{bind_values}});
    my @columns = @{$sth->{NAME}}; 

    # Fetch all the current level's data and preserve columns order 
    my @return;
    for my $row (@{$sth->fetchall_arrayref()}) { 
        my %data;
        tie %data, 'Tie::IxHash';
        $data{$columns[$_]} = $row->[$_] for (0 .. $#columns);
        for my $subquery (@{ $query->{subquery} }) { 
            my @bind;
            push @bind, $data{$_} for (@{ arrayref($subquery->{bind}) });
            $subquery->{bind_values} = \@bind;
            my $sub = fetch($subquery);

            # Present output as a list 
            if ($subquery->{list}) {
                #if ( map ( $query->{list} eq $_ , keys $sub ) )
                my @list;
                for (@$sub) {
                    push @list, $_->{$subquery->{list}};
                }
                $sub = \@list;
            }

            if ($subquery->{key}) {
                $data{$subquery->{key}} = $sub;
            } else {
                die "[Error] Key is missing for query '$subquery->{sql}'";
            }
        }

        # Remove unwanted columns from the output
        if ($query->{hide}) {
            delete $data{$_} for( @{ arrayref($query->{hide}) } );
        }        

        push @return, \%data;
    }

    \@return;    
}

DumpYaml($out);

sub arrayref {
   my $ref = shift;
   return (ref $ref ne 'ARRAY') ? [$ref] : $ref; 
}

sub DumpYaml {
    # I am not happy with this current dumper. I cannot specify the indent and it does 
    # not preserve the extraction order
    print Dump shift;
}

__DATA__
---
connect: "dbi:SQLite:dbname=einstein-puzzle.sqlite"
ident: 4
query:
   - sql: SELECT * from people
     hide:
        - pet_id
        - house_id
        - id
     subquery:
        - key: brevage
          bind: id
          sql: |
                SELECT name, calories, potassium FROM drink
                LEFT JOIN people_has_drink ON drink.id = people_has_drink.id_drink
                WHERE people_has_drink.id_people = ?       
          hash:
             - name
        - key: house
          sql: SELECT color as paint, size, id from house WHERE id = ?
          hide: id
          bind: house_id
          subquery:
             - key: color
               sql: SELECT short, ral, hex from color WHERE short LIKE ?
               bind: paint
        - key: pet
          sql: SELECT name from pet WHERE id = ?
          bind: pet_id
          list: name

这就是我得到的输出:

---
- brevage:
    - calories: 0
      name: water
      potassium: 3 mg
    - calories: 45
      name: orange juice
      potassium: 200 mg
  house:
    - color:
        - hex: '#F8F32B'
          ral: RAL 1018
          short: yellow
      paint: yellow
      size: small
  nationality: Norvegian
  pet:
    - cats
  smoke: Dunhill
- brevage:
    - calories: 42
      name: milk
      potassium: 150 mg
  house:
    - color:
        - hex: '#CB2821'
          ral: RAL 2002
          short: red
      paint: red
      size: big
  nationality: Brit
  pet:
    - birds
    - phasmatodea
  smoke: Pall Mall

数据库

我的测试数据库是 sqlite db 表如下:

餐 table 人

.----+-------------+----------+--------+-----------.
| id | nationality | house_id | pet_id | smoke     |
+----+-------------+----------+--------+-----------+
|  1 | Norvegian   |        4 |      3 | Dunhill   |
|  2 | Brit        |        1 |      2 | Pall Mall |
'----+-------------+----------+--------+-----------'

餐 table 饮品

.----+--------------+----------+-----------.
| id | name         | calories | potassium |
+----+--------------+----------+-----------+
|  1 | tea          |        1 | 18 mg     |
|  2 | coffee       |        0 | 49 mg     |
|  3 | milk         |       42 | 150 mg    |
|  4 | beer         |       43 | 27 mg     |
|  5 | water        |        0 | 3 mg      |
|  6 | orange juice |       45 | 200 mg    |
'----+--------------+----------+-----------'

表人有饮料

.-----------+----------.
| id_people | id_drink |
+-----------+----------+
|         1 |        5 |
|         1 |        6 |
|         2 |        3 |
'-----------+----------'

table 屋

+----+--------+--------+
| id | color  |  size  |
+----+--------+--------+
|  1 | red    | big    |
|  2 | green  | small  |
|  3 | white  | middle |
|  4 | yellow | small  |
|  5 | blue   | huge   |
+----+--------+--------+

表格颜色

.--------+-------------+----------+---------.
| short  |    color    |   ral    |   hex   |
+--------+-------------+----------+---------+
| red    | Vermilion   | RAL 2002 | #CB2821 |
| green  | Pale green  | RAL 6021 | #89AC76 |
| white  | Light grey  | RAL 7035 | #D7D7D7 |
| yellow | Zinc yellow | RAL 1018 | #F8F32B |
| blue   | Capri blue  | RAL 5019 | #1B5583 |
'--------+-------------+----------+---------'

餐 table 宠物

+----+-------------+
| id |    name     |
+----+-------------+
|  1 | dogs        |
|  2 | birds       |
|  3 | cats        |
|  4 | horses      |
|  5 | fishes      |
|  2 | phasmatodea |
+----+-------------+

数据库数据

如果您希望使用与我相同的数据，也可以为您提供所需的一切:

BEGIN TRANSACTION;
CREATE TABLE "pet" (
    `id`    INTEGER,
    `name`  TEXT
);
INSERT INTO `pet` VALUES (1,'dogs');
INSERT INTO `pet` VALUES (2,'birds');
INSERT INTO `pet` VALUES (3,'cats');
INSERT INTO `pet` VALUES (4,'horses');
INSERT INTO `pet` VALUES (5,'fishes');
INSERT INTO `pet` VALUES (2,'phasmatodea');
CREATE TABLE `people_has_drink` (
    `id_people` INTEGER NOT NULL,
    `id_drink`  INTEGER NOT NULL,
    PRIMARY KEY(id_people,id_drink)
);
INSERT INTO `people_has_drink` VALUES (1,5);
INSERT INTO `people_has_drink` VALUES (1,6);
INSERT INTO `people_has_drink` VALUES (2,3);
CREATE TABLE "people" (
    `id`    INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    `nationality`   VARCHAR(45),
    `house_id`  INT,
    `pet_id`    INT,
    `smoke` VARCHAR(45)
);
INSERT INTO `people` VALUES (1,'Norvegian',4,3,'Dunhill');
INSERT INTO `people` VALUES (2,'Brit',1,2,'Pall Mall');
CREATE TABLE "house" (
    `id`    INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    `color` TEXT,
    `size`  TEXT
);
INSERT INTO `house` VALUES (1,'red','big');
INSERT INTO `house` VALUES (2,'green','small');
INSERT INTO `house` VALUES (3,'white','middle');
INSERT INTO `house` VALUES (4,'yellow','small');
INSERT INTO `house` VALUES (5,'blue','huge');
CREATE TABLE `drink` (
    `id`    INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    `name`  TEXT,
    `calories`  INTEGER,
    `potassium` TEXT
);
INSERT INTO `drink` VALUES (1,'tea',1,'18 mg');
INSERT INTO `drink` VALUES (2,'coffee',0,'49 mg');
INSERT INTO `drink` VALUES (3,'milk',42,'150 mg');
INSERT INTO `drink` VALUES (4,'beer',43,'27 mg');
INSERT INTO `drink` VALUES (5,'water',0,'3 mg');
INSERT INTO `drink` VALUES (6,'orange juice',45,'200 mg');
CREATE TABLE `color` (
    `short` TEXT UNIQUE,
    `color` TEXT,
    `ral`   TEXT,
    `hex`   TEXT,
    PRIMARY KEY(short)
);
INSERT INTO `color` VALUES ('red','Vermilion','RAL 2002','#CB2821');
INSERT INTO `color` VALUES ('green','Pale green','RAL 6021','#89AC76');
INSERT INTO `color` VALUES ('white','Light grey','RAL 7035','#D7D7D7');
INSERT INTO `color` VALUES ('yellow','Zinc yellow','RAL 1018','#F8F32B');
INSERT INTO `color` VALUES ('blue','Capri blue','RAL 5019','#1B5583');
COMMIT;

最佳答案

Is my implementation good

这是一个相当广泛的问题，答案可能取决于您希望从代码中得到什么。例如:

它有效吗？它是否具有您需要的所有功能？它做你想做的事吗？它是否对您想要满足的所有输入范围(以及您不想要的输入)做出适当的响应？如果你不确定，写一些 tests .

速度够快吗？如果不是，那么慢的位是什么？使用 Devel::NYTProf找到他们。

如果它工作正常，您可能还想将您的代码转换为 module而不仅仅是一个脚本，所以你可以再次使用它。

and if not (I'm supposing that I am doing all wrong), what modules should I use to get the desired behavior?

听起来很像您正在尝试执行 DBIx::Class 之类的操作。 (又名 DBIC)当您向 prefetch 询问时会这样做;它将为您构建对象的数据结构。

如果您需要动态地执行此操作以响应任意数据库和 YAML，那并不是 DBIC 的设计初衷；这可能是可能的，但可能会涉及您动态创建包，这并不容易。

关于perl - 从复杂的数据库提取 (SQL) 构建树状哈希 (YAML)？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/30683972/

文章推荐： opengl - 使用 OpenGL/GLSL 的 HDR Bloom 效果渲染管线

文章推荐： karma-runner - 构建用于 karma 测试的 webpack 配置

java - 如何使用 Ruby、PHP 或 Java 解析/提取/提取 ASP.net 网站内容？
我正在做一个业余爱好项目，使用 Ruby、PHP 或 Java 来抓取 ASP.net 网站的内容。例如，如果网站 url“www.myaspnet.com/home.aspx”。我想从 home.a
r - 提取/之间的字符串
如果我有这些字符串： mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4",
regex - 提取 | 之间的最后一个单词|
我有以下数据集 > head(names$SAMPLE_ID) [1] "Bacteria|Proteobacteria|Gammaproteobacteria|Pseudomonadales|Mor
grails - 提取: 'join'被忽略
设置: 3个域类A，B和C。A和B在插件中。 C在依赖于此插件的应用程序中。 class A{ B b static mapping = { b fetch: 'joi
JAVA StAX 提取
我不知道如何提取 XML 文件中的开始标记元素名称。我很接近〜意味着没有错误，我正在获取标签名称，但我正在获取标签名称加上信息。我得到的是: {http://www.publishing.org}au
regex - 提取 "?"之后的文本
我有一个字符串 x <- "Name of the Student? Michael Sneider" 我想从中提取“Michael Sneider”。我用过: str_extract_all(x,
Java - 提取 [* ... *] 之间的所有内容
我有一个如下所示的文本文件: [* content I want *] [ more content ] 我想读取该文件并能够提取我想要的内容。我能做的最好的事情如下，但它会返回 [更多内容] 请注意
Twig 提取 FOR 循环变量
假设我有一个项目集合 $collection = array( 'item1' => array( 'post' => $post, 'ca
java - 读取一个文本文件并写入多个文本文件以进行过滤/提取
我正在寻找一种过滤文本文件的方法。我有许多文件夹名称，其中包含许多文本文件，文本文件有几个没有人员，每个人员有 10 个群集/组(我在这里只显示了 3 个)。但是每个组/簇可能包含几个原语(我在这里展
python - Unicode 提取
我已经编写了一个从某个网页中提取网址的代码，我面临的问题是它不会以网页上相同的方式提取网址，我的意思是如果该网址位于某些网页中法语，它不会按原样提取它。我该如何解决这个问题？ import reque
c# - 提取 ZipFile
如何在 C# 中提取 ZipFile？(ZipFile 是包含文件和目录) 最佳答案为此使用工具。类似于 SharpZip .据我所知 - .NET 不支持开箱即用的 ZIP 文件。来自 here
c++ - 提取[]之间内容的正则表达式
我有一个表达: [training_width]:lofmimics 我要提取[]之间的内容，在上面的例子中我要 training_width 我试过以下方法: QRegularExpression
bash - 提取 "$@"中最后一个参数之前的参数
我正在尝试创建一个 Bash 脚本，该脚本将从命令行给出的最后一个参数提取到一个变量中以供其他地方使用。这是我正在处理的脚本: #!/bin/bash # compact - archive and
Javascript 提取 *.com
我正在寻找一个 JavaScript 函数/正则表达式来从 URI 中提取 *.com...(在客户端完成) 它应该适用于以下情况: siphone.com = siphone.com qwr.sip
python - BeautifulSoup 提取
关闭。这个问题需要更多focused .它目前不接受答案。想改进这个问题吗？更新问题，使其只关注一个问题 editing this post . 关闭 8 年前。 Improve this qu
Python JSON 提取
编辑:添加了实际的 JSON 对象和代码以供审查我有这种格式的 JSON(只是这种层次结构，假设 JSON 正常工作) {u'kind': u'calendar#events', u'default
python - 提取标签的内容
我已经编写了代码来使用 BeautifulSoup 提取一本书的 url 和标题来自页面。但它并没有在 > 之间提取惊人的 super 科学故事 1930 年 4 月这本书的名字。和标签。如何提
Java，提取$符号之间的单词
使用 Java，我想提取美元符号 $ 之间的单词。例如: String = " this is first attribute $color$. this is the second attribu
string - 提取.txt文件中以00开头的数字
您好，我正在尝试找到一种方法来确定字符串中的常量，然后提取该常量左侧的一定数量的字符。例如-我有一个 .txt 文件，在那个文件的某处有数字 00nnn 数字的例子是 00234 00765 ...
php操作（删除,提取,增加）zip文件方法详解
php读取zip文件(删除文件,提取文件,增加文件)实例从zip压缩文件中提取文件复制代码代码如下: <?php /* php 从zip压缩文件

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

perl - 从复杂的数据库提取 (SQL) 构建树状哈希 (YAML)？