perl - Sorting a hash in Perl when the keys are dynamic

Reposted. Author: 行者123. Updated: 2023-12-05 01:16:28

I have a hash that looks like this:


my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);


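(The real structure is more complicated than this; I have simplified it here.)

I want to sort "globally" on the timestamp, and then on the keys of the inner hashes (one, two, three, adam). But the keys of the inner hashes are dynamic: I don't know what they will be until the data has been read from a file.

I want the sorted output for the hash above to be: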

00:09:30,C3,adam
00:09:30,B2,two
00:12:30,B2,one
00:13:45,C3,three


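I have looked at many questions and answers about sorting hashes by key and/or value, but I can't work it out for the case where the key names are not known in advance. (Or maybe I just don't understand them.)

What I do now takes two steps.

First, flatten the hash into an array: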

my @flattened;
for my $outer_key (keys %data) {
    for my $inner_key (keys %{$data{$outer_key}}) {
        push @flattened, [
            $data{$outer_key}{$inner_key}{timestamp}
            , $outer_key
            , $inner_key
        ];
    }
}


Then sort it:

for my $ary (sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] } @flattened) {
    print join ',' => @$ary;
    print "\n";
}


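Is there a more concise, elegant, or efficient way to do this?

Best answer

This type of question may be better suited to the Programmers Stack Exchange site or the Code Review site. Since it asks about implementation, I think it is fine to ask here. These sites tend to have some overlap.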



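As @DondiMichaelStroma pointed out, and as you already know, your code works great! However, there are several ways to do this. If this were a small script, I would probably leave it as it is and move on to the next part of the project. If it were part of a more professional code base, I would make a few changes.

When writing for a professional code base, I try to keep a few things in mind: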


Readability
Efficiency where it matters
No gold plating
Unit testing


So let's take a look at your code:

my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);


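The way the data is defined is fine, and it is nicely formatted. This is probably not how %data is built in your real code, but a unit test might well contain a hash like this.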

my @flattened;
for my $outer_key (keys %data) {
    for my $inner_key (keys %{$data{$outer_key}}) {
        push @flattened, [
            $data{$outer_key}{$inner_key}{timestamp}
            , $outer_key
            , $inner_key
        ];
    }
}
for my $ary (sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] } @flattened) {
    print join ',' => @$ary;
    print "\n";
}


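The variable names could be more descriptive, and the @flattened array contains some redundant data. Printing it with Data::Dumper, you can see that C3 and B2 each appear in more than one place: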

$VAR1 = [
          '00:13:45',
          'C3',
          'three'
        ];
$VAR2 = [
          '00:09:30',
          'C3',
          'adam'
        ];
$VAR3 = [
          '00:12:30',
          'B2',
          'one'
        ];
$VAR4 = [
          '00:09:30',
          'B2',
          'two'
        ];


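Maybe that is not a big deal, or maybe you want to keep the ability to get all of the data stored under the key B2.

Here is another way we could store the data: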

my %flattened = (
    'B2' => [['one',   '00:12:30'],
             ['two',   '00:09:30']],
    'C3' => [['three', '00:13:45'],
             ['adam',  '00:09:30']]
);


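This may make the sorting more complicated, but it makes the data structure simpler! Perhaps this is getting close to gold plating, or perhaps another part of your code would benefit from this data structure. My preference is to keep data structures simple and add extra code as needed when processing them. If you ever decide to dump %flattened to a log file, you probably won't want to see duplicated data.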
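As a rough illustration of that trade-off, assuming the %flattened layout shown above, everything stored under one outer key remains a single hash lookup away. The variable names below are illustrative only and are not part of the original answer.

```perl
use strict;
use warnings;

my %flattened = (
    'B2' => [ [ 'one',   '00:12:30' ], [ 'two',  '00:09:30' ] ],
    'C3' => [ [ 'three', '00:13:45' ], [ 'adam', '00:09:30' ] ],
);

# All names stored under one outer key: a single lookup, no scanning.
my @b2_names = map { $_->[0] } @{ $flattened{B2} };
print "B2: @b2_names\n";

# Earliest entry within one outer key. Fixed-width HH:MM:SS strings
# compare correctly with the string operator cmp.
my ($earliest_b2) = sort { $a->[1] cmp $b->[1] } @{ $flattened{B2} };
print "earliest in B2: $earliest_b2->[0] at $earliest_b2->[1]\n";
```

A global sort across all outer keys, by contrast, has to re-attach the outer key to each row first, which is the extra work the sorting function takes on.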



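Implementation

Design: I think we want to keep this as two distinct operations. That helps keep the code clear, and we can test each function separately. The first function converts between the data formats we want to use, and the second function sorts the data. These functions should live in a Perl module, which we can unit test with Test::More. I don't know where these functions are called from, so let's pretend we call them from main.pl and put them in a module called Helper.pm. These names should be more descriptive, but I'm not sure what the application is here! Great names lead to readable code.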



main.pl

This is what main.pl might look like. Descriptive names can make it self-documenting even without comments. These names could still be improved!

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use Utilities::Helper qw(sort_by_times_then_names convert_to_simple_format);

# populate_data() is a placeholder for however %data gets loaded
# (for example, from a file); it is not defined in this answer.
my %data = populate_data();

my @sorted_data = @{ sort_by_times_then_names( convert_to_simple_format( \%data ) ) };

print Dumper(@sorted_data);




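Utilities/Helper.pm

Is this readable and elegant? I think it could use some improvement. More descriptive variable names would help in this module too. It is, however, easy to test, and it keeps our main code clean and the data structures simple.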

package Utilities::Helper;
use strict;
use warnings;

use Exporter qw(import);
our @EXPORT_OK = qw(sort_by_times_then_names convert_to_simple_format);

# We could put a comment here explaining the expected input and output formats.
sub sort_by_times_then_names {

    my ( $data_ref ) = @_;

    # This resembles a Schwartzian Transform (map, then sort).
    # Normally, we would just be sorting an array. But here we
    # are converting the hash into an array and then sorting it.
    # Maybe that should be broken up into two steps to make it clearer!
    # A full transform would end with a final map { $_ }; we don't need it here.
    my @sorted = sort {
                        $a->[2] cmp $b->[2]   # sort by timestamp
                        ||
                        $a->[1] cmp $b->[1]   # then sort by name
                      }
                 map { my $outer_key = $_;    # convert $data_ref to an array of arrays
                       map {                  # first element is the outer_key
                             [$outer_key, @{$_}]   # second element is the name
                           }                  # third element is the timestamp
                       @{$data_ref->{$_}}
                     }
                 keys %{$data_ref};
    # If you want the elements in a different order in the array,
    # you could modify the above code or change it when you print it.
    return \@sorted;
}


# We could put a comment here explaining the expected input and output formats.
sub convert_to_simple_format {
    my ( $data_ref ) = @_;

    my %reformatted_data;

    # $outer_key and $inner_key could be renamed to describe more accurately the data they represent.
    # Are they names? IDs? Places? License plate numbers?
    # Maybe we want to keep it generic so this function can handle different kinds of data.
    # I still like the idea of using nested for loops for this logic, because it is clear and intuitive.
    for my $outer_key ( keys %{$data_ref} ) {
        for my $inner_key ( keys %{$data_ref->{$outer_key}} ) {
            push @{$reformatted_data{$outer_key}},
                 [$inner_key, $data_ref->{$outer_key}{$inner_key}{timestamp}];
        }
    }

    return \%reformatted_data;
}

1;
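
The comment in sort_by_times_then_names suggests that the nested map might be clearer as two steps. A minimal sketch of that split could look like the following; flatten_rows and sort_rows are hypothetical names that do not appear in the original module, and the input is assumed to be in the simple format produced by convert_to_simple_format.

```perl
use strict;
use warnings;

# Step 1: turn { outer => [ [name, timestamp], ... ] } into a flat list
# of [outer, name, timestamp] rows.
sub flatten_rows {
    my ($data_ref) = @_;
    my @rows;
    for my $outer_key ( keys %{$data_ref} ) {
        push @rows, map { [ $outer_key, @{$_} ] } @{ $data_ref->{$outer_key} };
    }
    return \@rows;
}

# Step 2: sort rows by timestamp (index 2), then by name (index 1).
sub sort_rows {
    my ($rows_ref) = @_;
    my @sorted = sort { $a->[2] cmp $b->[2] || $a->[1] cmp $b->[1] } @{$rows_ref};
    return \@sorted;
}

my %simple = (
    'B2' => [ [ 'one',   '00:12:30' ], [ 'two',  '00:09:30' ] ],
    'C3' => [ [ 'three', '00:13:45' ], [ 'adam', '00:09:30' ] ],
);

my $sorted = sort_rows( flatten_rows( \%simple ) );
print join( ',', @{$_} ), "\n" for @{$sorted};
```

Each step can then be tested on its own, at the cost of one more function to name and maintain.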




run_unit_tests.pl

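Finally, let's implement some unit tests. This may be more than you were looking for with this question, but I think clean seams for testing are part of elegant code, and I want to demonstrate that. Test::More is really great for this. I will even throw in a test harness and a formatter so that we get some elegant output. If TAP::Formatter::JUnit is not installed, you can use TAP::Formatter::Console instead.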

#!/usr/bin/env perl
use strict;
use warnings;
use TAP::Harness;

my $harness = TAP::Harness->new({
    formatter_class => 'TAP::Formatter::JUnit',
    merge           => 1,
    verbosity       => 1,
    normalize       => 1,
    color           => 1,
    timer           => 1,
});

$harness->runtests('t/helper.t');
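
If it is unclear whether the JUnit formatter is installed, one possible approach is to choose the formatter class at run time. This is only a sketch: pick_formatter_class is a hypothetical helper, and the check for the test file is an added assumption so that the script does nothing when t/helper.t is missing.

```perl
use strict;
use warnings;
use TAP::Harness;

# Prefer the JUnit formatter when it can be loaded; otherwise fall back
# to the console formatter that ships with Test::Harness.
sub pick_formatter_class {
    return eval { require TAP::Formatter::JUnit; 1 }
        ? 'TAP::Formatter::JUnit'
        : 'TAP::Formatter::Console';
}

my $harness = TAP::Harness->new({
    formatter_class => pick_formatter_class(),
    verbosity       => 1,
});

# Only run the test file if it is actually present.
$harness->runtests('t/helper.t') if -e 't/helper.t';
```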




t/helper.t

#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
use Utilities::Helper qw(sort_by_times_then_names convert_to_simple_format);

my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);

my %formatted_data = %{ convert_to_simple_format( \%data ) };

my %expected_formatted_data = (
    'B2' => [['one',   '00:12:30'],
             ['two',   '00:09:30']],
    'C3' => [['three', '00:13:45'],
             ['adam',  '00:09:30']]
);

is_deeply(\%formatted_data, \%expected_formatted_data, "convert_to_simple_format test");

my @sorted_data = @{ sort_by_times_then_names( \%formatted_data ) };

my @expected_sorted_data = ( ['C3', 'adam', '00:09:30'],
                             ['B2', 'two',  '00:09:30'],
                             ['B2', 'one',  '00:12:30'],
                             ['C3', 'thee', '00:13:45'] # intentional typo to demonstrate the failure output
                           );

is_deeply(\@sorted_data, \@expected_sorted_data, "sort_by_times_then_names test");

done_testing;




Test output

The nice thing about testing this way is that when a test fails, it tells you what went wrong.

<testsuites>
<testsuite failures="1"
errors="1"
time="0.0478239059448242"
tests="2"
name="helper_t">
<testcase time="0.0452120304107666"
name="1 - convert_to_simple_format test"></testcase>
<testcase time="0.000266075134277344"
name="2 - sort_by_times_then_names test">
<failure type="TestFailed"
message="not ok 2 - sort_by_times_then_names test"><![CDATA[not ok 2 - sort_by_times_then_names test

# Failed test 'sort_by_times_then_names test'
# at t/helper.t line 45.
# Structures begin differing at:
# $got->[3][1] = 'three'
# $expected->[3][1] = 'thee']]></failure>
</testcase>
<testcase time="0.00154280662536621" name="(teardown)" />
<system-out><![CDATA[ok 1 - convert_to_simple_format test
not ok 2 - sort_by_times_then_names test

# Failed test 'sort_by_times_then_names test'
# at t/helper.t line 45.
# Structures begin differing at:
# $got->[3][1] = 'three'
# $expected->[3][1] = 'thee'
1..2
]]></system-out>
<system-err><![CDATA[Dubious, test returned 1 (wstat 256, 0x100)
]]></system-err>
<error message="Dubious, test returned 1 (wstat 256, 0x100)" />
</testsuite>
</testsuites>


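In summary, I prefer readable and clear over concise. Sometimes you can write less efficient code that is easier to write and logically simpler. Putting ugly code inside a function is a great way to hide it! It is not worth complicating your code to save 15 ms at runtime. If your data set is large enough that performance becomes a problem, Perl may not be the right tool for the job. If you are really after some concise code, post a challenge on Code Golf Stack Exchange.

Regarding "perl - Sorting a hash in Perl when the keys are dynamic", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30677816/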
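
For comparison, a more compact version of the question's flatten-and-sort might look like the sketch below. It is not the approach recommended above, only an illustration of what concise could mean here. It assumes the %data layout from the question and fixed-width HH:MM:SS timestamps, which compare correctly as strings.

```perl
use strict;
use warnings;

my %data = (
    'B2' => { 'one'   => { timestamp => '00:12:30' }, 'two'  => { timestamp => '00:09:30' } },
    'C3' => { 'three' => { timestamp => '00:13:45' }, 'adam' => { timestamp => '00:09:30' } },
);

# Flatten to [timestamp, outer, inner] rows, sort by timestamp and then by
# inner key, and join each row into one output line.
my @lines =
    map  { join ',', @{$_} }
    sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] }
    map  {
        my $outer = $_;
        map { [ $data{$outer}{$_}{timestamp}, $outer, $_ ] } keys %{ $data{$outer} };
    } keys %data;

print "$_\n" for @lines;
```

Whether this reads better than the two explicit loops is a matter of taste; it does the same work in one statement.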
