gpt4 book ai didi

algorithm - 复杂排序选择数据结构的建议

转载 作者:塔克拉玛干 更新时间:2023-11-03 04:48:38 25 4
gpt4 key购买 nike

我正在寻求一些帮助来编写一些 Perl 代码来对日志文件进行排序。

总的来说,我是编码和 perl 的新手!

我需要尽可能只使用核心 perl 模块来编写我的代码,但如果这被证明是不可能的,我对 CPAN 模块持开放态度。日志文件包含记录的消息列表,需要按顺序重新排列。应该很简单,但是有很多陷阱,这让我在设计数据结构时遇到了麻烦。输入文件格式为 CSV,输出需要与按时间戳顺序排列的消息相同,并首先将第一条消息部分组合在一起。

陷阱

  1. 消息需要按时间戳排序。
  2. 如果消息被拆分成多行,那么在最后的字段“(消息引用 1 的 3 部分中的第 1 部分)”中将包含类似以下内容。对于特定的消息引用,所有部分都需要按顺序排列,因此第 1 部分,然后是第 2 部分,然后是第 3 部分,依此类推。
  3. 此字段开头的十六进制数字告诉我它是 8 位还是 16 位引用,具有相同引用编号的 8 位引用与具有相同编号的 16 位引用不匹配(作为副本) .所以我需要考虑到这一点。
  4. 消息部分可能会丢失,因此我们可能只得到第 1 部分和第 2 部分,共 3 部分。
  5. 可能存在重复的消息引用编号,因此每个消息引用都需要绑定(bind)到发件人字段以赋予其唯一标识。
  6. 即使使用 (3) 中的唯一标识,随着时间的推移仍然可能出现重复(因为在它们重置之前只有这么多消息引用号),所以我需要检查收到的最后一部分的时间与重复消息引用。如果消息部分之间的间隔超过 3 天,那么我可以将其视为一条新消息。
  7. 最后,日志文件中可能有数十万行需要重新排序,因此将其全部加载到内存中可能不是一种选择。

如果我只放置一些示例输入数据,然后它需要如何输出,这可能是最好的。

输入数据

#message uniqueID,From,To,Time,flag,content,IP,concatenation info   
1,"+1231231234","+15125562100","7 Sep 2012 22:08:33","","abcdefghijklmnopqrstuvwxyz",,
2,"+1231231234","+15125562100","7 Sep 2012 22:08:37","","abcdefghijklmnopqrstuvwxyz",,
3,"+1231231234","+15125562100","7 Sep 2012 22:08:41","","abcdefghijklmnopqrstuvwxyz",,
4,"+8888888888","+15125562100","7 Sep 2012 22:09:01","","SHORTUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wi",,"BQADAQMB (part 1 of 3 of message reference 1)"
5,"+8888888888","+15125562100","7 Sep 2012 22:09:04","","h my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall ",,"BQADAQMC (part 2 of 3 of message reference 1)"
6,"+8888888888","+15125562100","7 Sep 2012 22:09:05","","ress, ah, nevermore!",,"BQADAQMD (part 3 of 3 of message reference 1)"
7,"+8888888888","+15125562100","7 Sep 2012 22:09:06","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIDAQ== (part 1 of 3 of message reference 2)"
8,"+8888888888","+15125562100","7 Sep 2012 22:09:07",""," my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall p",,"BggEAAIDAg== (part 2 of 3 of message reference 2)"
10,"+1231231234","+15125562100","7 Sep 2012 22:09:46","","abcdefghijklmnopqrstuvwxyz",,
11,"+1231231234","+15125562100","7 Sep 2012 22:09:50","","abcdefghijklmnopqrstuvwxyz",,
12,"+1231231234","+15125562100","7 Sep 2012 22:09:55","","abcdefghijklmnopqrstuvwxyz",,
13,"+8888888888","+15125562100","13 Sep 2012 22:10:36","","SHORTUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wi",,"BQADAQMB (part 1 of 3 of message reference 1)"
14,"+8888888888","+15125562100","13 Sep 2012 22:10:38","","h my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall ",,"BQADAQMC (part 2 of 3 of message reference 1)"
15,"+8888888888","+15125562100","13 Sep 2012 22:10:39","","ress, ah, nevermore!",,"BQADAQMD (part 3 of 3 of message reference 1)"
16,"+8888888889","+15125562100","7 Sep 2012 22:09:06","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIDAQ== (part 1 of 3 of message reference 2)"
17,"+8888888889","+15125562100","7 Sep 2012 22:10:42",""," my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall p",,"BggEAAIDAg== (part 2 of 3 of message reference 2)"
18,"+8888888889","+15125562100","7 Sep 2012 22:10:43","","ess, ah, nevermore!",,"BggEAAIDAw== (part 3 of 3 of message reference 2)"
19,"+1231231234","+15125562100","13 Sep 2012 20:12:52","","Deposit SMS with readreceiptrequest = false #0",,
20,"+1231231234","+15125562100","13 Sep 2012 20:12:53","","Deposit SMS with readreceiptrequest = false #1",,
21,"+1231231234","+15125562100","13 Sep 2012 20:12:54","","Deposit SMS with readreceiptrequest = false #2",,
22,"+8888888888","+15125562100","13 Sep 2012 20:12:55","","Deposit SMS with readreceiptrequest = false #0: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms ",,"BQADAAMB (part 1 of 3 of message reference 0)"
23,"+8888888888","+15125562100","13 Sep 2012 20:12:57","","ore; This and more I sat divining, with my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with",,"BQADAAMC (part 2 of 3 of message reference 0)"
24,"+8888888888","+15125562100","13 Sep 2012 20:12:58","","the lamplight gloating oer She shall press, ah, nevermore!",,"BQADAAMD (part 3 of 3 of message reference 0)"
25,"+8888888888","+15125562100","7 Sep 2012 22:10:40","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIEAQ== (part 1 of 2 of message reference 3)"
26,"+8888888888","+15125562100","7 Sep 2012 22:10:42","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIEAQ== (part 1 of 2 of message reference 3)"
27,"+8888888888","+15125562100","7 Sep 2012 22:10:43","","ess, ah, nevermore!",,"BggEAAIEAw== (part 2 of 2 of message reference 3)"
28,"+8888888888","+15125562100","13 Sep 2012 20:13:02","","Deposit SMS with readreceiptrequest = false #2: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms ",,"BQADAgMB (part 1 of 3 of message reference 2)"
29,"+8888888888","+15125562100","13 Sep 2012 20:13:03","","ore; This and more I sat divining, with my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with",,"BQADAgMC (part 2 of 3 of message reference 2)"
30,"+8888888888","+15125562100","13 Sep 2012 20:13:04","","the lamplight gloating oer She shall press, ah, nevermore!",,"BQADAgMD (part 3 of 3 of message reference 2)"
31,"+1231231234","+15125562100","13 Sep 2012 20:13:08","","Deposit SMS with readreceiptrequest = true #0",

输出数据

#message uniqueID,From,To,Time,flag,content,IP,concatenation info   
1,"+1231231234","+15125562100","7 Sep 2012 22:08:33","","abcdefghijklmnopqrstuvwxyz",,
2,"+1231231234","+15125562100","7 Sep 2012 22:08:37","","abcdefghijklmnopqrstuvwxyz",,
3,"+1231231234","+15125562100","7 Sep 2012 22:08:41","","abcdefghijklmnopqrstuvwxyz",,
4,"+8888888888","+15125562100","7 Sep 2012 22:09:01","","SHORTUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wi",,"BQADAQMB (part 1 of 3 of message reference 1)"
5,"+8888888888","+15125562100","7 Sep 2012 22:09:04","","h my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall ",,"BQADAQMC (part 2 of 3 of message reference 1)"
6,"+8888888888","+15125562100","7 Sep 2012 22:09:05","","ress, ah, nevermore!",,"BQADAQMD (part 3 of 3 of message reference 1)"
16,"+8888888889","+15125562100","7 Sep 2012 22:09:06","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIDAQ== (part 1 of 3 of message reference 2)"
17,"+8888888889","+15125562100","7 Sep 2012 22:10:42",""," my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall p",,"BggEAAIDAg== (part 2 of 3 of message reference 2)"
18,"+8888888889","+15125562100","7 Sep 2012 22:10:43","","ess, ah, nevermore!",,"BggEAAIDAw== (part 3 of 3 of message reference 2)"
7,"+8888888888","+15125562100","7 Sep 2012 22:09:06","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIDAQ== (part 1 of 3 of message reference 2)"
8,"+8888888888","+15125562100","7 Sep 2012 22:09:07",""," my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall p",,"BggEAAIDAg== (part 2 of 3 of message reference 2)"
10,"+1231231234","+15125562100","7 Sep 2012 22:09:46","","abcdefghijklmnopqrstuvwxyz",,
11,"+1231231234","+15125562100","7 Sep 2012 22:09:50","","abcdefghijklmnopqrstuvwxyz",,
12,"+1231231234","+15125562100","7 Sep 2012 22:09:55","","abcdefghijklmnopqrstuvwxyz",,
25,"+8888888888","+15125562100","7 Sep 2012 22:10:40","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIEAQ== (part 1 of 2 of message reference 3)"
26,"+8888888888","+15125562100","7 Sep 2012 22:10:42","","LONGUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wit",,"BggEAAIEAQ== (part 1 of 2 of message reference 3)"
27,"+8888888888","+15125562100","7 Sep 2012 22:10:43","","ess, ah, nevermore!",,"BggEAAIEAw== (part 2 of 2 of message reference 3)"
19,"+1231231234","+15125562100","13 Sep 2012 20:12:52","","Deposit SMS with readreceiptrequest = false #0",,
20,"+1231231234","+15125562100","13 Sep 2012 20:12:53","","Deposit SMS with readreceiptrequest = false #1",,
21,"+1231231234","+15125562100","13 Sep 2012 20:12:54","","Deposit SMS with readreceiptrequest = false #2",,
22,"+8888888888","+15125562100","13 Sep 2012 20:12:55","","Deposit SMS with readreceiptrequest = false #0: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms ",,"BQADAAMB (part 1 of 3 of message reference 0)"
23,"+8888888888","+15125562100","13 Sep 2012 20:12:57","","ore; This and more I sat divining, with my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with",,"BQADAAMC (part 2 of 3 of message reference 0)"
24,"+8888888888","+15125562100","13 Sep 2012 20:12:58","","the lamplight gloating oer She shall press, ah, nevermore!",,"BQADAAMD (part 3 of 3 of message reference 0)"
28,"+8888888888","+15125562100","13 Sep 2012 20:13:02","","Deposit SMS with readreceiptrequest = false #2: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms ",,"BQADAgMB (part 1 of 3 of message reference 2)"
29,"+8888888888","+15125562100","13 Sep 2012 20:13:03","","ore; This and more I sat divining, with my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with",,"BQADAgMC (part 2 of 3 of message reference 2)"
30,"+8888888888","+15125562100","13 Sep 2012 20:13:04","","the lamplight gloating oer She shall press, ah, nevermore!",,"BQADAgMD (part 3 of 3 of message reference 2)"
31,"+1231231234","+15125562100","13 Sep 2012 20:13:08","","Deposit SMS with readreceiptrequest = true #0",
13,"+8888888888","+15125562100","13 Sep 2012 22:10:36","","SHORTUDH: Thus I sat engaged in guessing, but no syllable expressing To the fowl, whose fiery eyes now burned into my bosoms core; This and more I sat divining, wi",,"BQADAQMB (part 1 of 3 of message reference 1)"
14,"+8888888888","+15125562100","13 Sep 2012 22:10:38","","h my head at ease reclining On the cushions velvet lining that the lamplight gloated oer, But whose velvet violet lining with the lamplight gloating oer She shall ",,"BQADAQMC (part 2 of 3 of message reference 1)"
15,"+8888888888","+15125562100","13 Sep 2012 22:10:39","","ress, ah, nevermore!",,"BQADAQMD (part 3 of 3 of message reference 1)"

到目前为止我所做的事情是

  1. 将时间字段转换为纪元时间,以便于进行任何比较
  2. 可以读入(和写出文件)。
  3. 可以解析所有 CSV 列。
  4. 可以将串联信息拆分成多个部分,即 8 位或 16 位引用、部分编号、总数和引用 ID。

现在我一直在寻找有效过滤和排序数据的最佳方法。我试过使用哈希并首先将文件加载到内存中,这样我就可以对特定的消息引用进行排序,但我不确定这是否适用于大文件。

然后我考虑逐行阅读它,但我可能会遇到一个问题,即第二行包含串联短信的第一部分,我们可能要到文件末尾才能获得后续部分,所以我在想也许这也不是一个好主意。

我也想到了一个数据库,但我认为在需要运行的系统上设置它太复杂了。另一种选择可能是编写一个包并将复杂结构存储为一个对象?也许我把事情复杂化了?我的大脑肯定要糊涂了!

无论如何,我们将不胜感激任何想法或指导。

希望上面的内容是清楚的,但如果您有任何问题,请问我。

谢谢,将

最佳答案

如果分解得当,我认为这个问题不会太复杂。

在我看来,您的排序程序将包含以下阶段:

  1. 从每一行中提取相关信息(时间戳和连接信息)。
  2. 按消息引用对行进行分组,这可以通过缓存以高效的内存方式完成。
  3. 按时间戳对组进行排序。
  4. 将组展平为原始行。

施瓦兹变换

Schwartzian是在 Perl 中排序时的常见模式。它加速了排序,其中排序索引必须从要实际排序的数据中提取,通过一次提取此数据,而不是在每次比较时。也可以描述为装饰-排序-取消装饰。

示例:按长度对字符串进行排序。请注意,在这种情况下,朴素的实现实际上会更好。

my @words = qw( aaa b cccc );
my @sorted_words =
map { $_->[1] } # flatten
sort { $a->[0] <=> $b->[0] } # sort by first field (length)
map { [ length $_, $_ ] } # decorate: return arrayref with key and data
@words;
print "[@sorted_words]\n"; # prints "[b aaa cccc]"

在你的任务中记住这个模式会很好

1。提取

您已经做到了。对于每一行,我们输出一个包含以下字段的数组引用或类似内容:

0: timestamp (in epoch)
1: part no \
2: total parts | these are undef if no concat info is present
3: message reference /
4: The unmodifed line

对于 CSV 提取,您应该使用 Text::CSV , 要计算纪元,您应该查看 DateTime

2。分组

我们以散列形式定义缓存,将消息引用作为键,将组作为值。一个组是一个 arrayref 作为上面指定的提取格式,但可能包含位置 5 及以后的更多行(即每个标记行都是一个组)。

对于接收到的每个标记行,我们执行以下过程:

# pseudocode
# this is how I understood your requirements,
# but it may be wrong. The general principle still holds
# (you may need to choose a different key)
IF the line doesn't have part information, THEN
pass it on immediately.
ELSE
IF the hash has an entry for our message reference, THEN
IF the timestamp of the present group is too old, THEN
pass on the existing group.
Add our line for this key.
ELSE
Update the group with our line,
adding the original line (at position 3 + part no),
but not the metadata to the group.
IF the group is made complete, THEN
pass it on immediately,
delete this entry from the hash.
ELSE
Add the line as a group.
Make sure the content is at position 3 + part no, to allow easy updating.

在没有新行出现后,我们将散列中的每个剩余值传递到下一阶段。

要意识到的重要一点是,您不必在此处保留所有行,而只需保留不完整的组。

有趣的 Perl 函数是 exists $hash{element}delete $hash{element}delete 可能对节省内存很重要。

3。排序

我们简单地按时间戳对每个元素进行排序。如果总数据太多,系统无法处理,我们可以使用一个技巧:

  1. 对较小的数据 block 进行排序,将它们保存到文件中。
  2. 打开每个文件。
  3. 从每个文件中加载第一项
  4. 做 - 当至少有一个文件还剩项目时:
    1. 对所有加载的项目进行排序
    2. 传递第一个结果元素。
    3. 从当前第一个元素来自的文件中加载下一个项目
  5. 按正确顺序传递其他(已加载)项目

但是,这是非常耗时的。

4。扁平化

在这里,我们只接收排序和分组的项目。我们所要做的就是以正确的顺序输出包含的行。

关于algorithm - 复杂排序选择数据结构的建议,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/15126863/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com