ruby - Parsing Debian Packages.gz with Ruby's Treetop PEG

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:38:17

I'm trying to parse a Packages.gz file with Ruby's Treetop, but I'm having trouble making the keywords and values unambiguous. Here is my Treetop grammar:

    grammar Debian
      rule collection
        entry+
      end
      rule entry
        (tag space value)
      end

      rule package_details
        tag value &[^$]
      end
      rule tag
        [A-Za-z0-9\-]+ ":"
      end
      rule value
        (!tag value_line+ "\n")+
      end
      rule value_line
        ([A-Za-z0-9 <>@()=\.\-|/,_"':])+
      end
      rule space
        [ \t]+
      end
    end

Here is my sample input:

    Package: acct
    Priority: optional
    Section: admin
    Installed-Size: 352
    Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
    Original-Maintainer: Mathieu Trudel <mathieu.tl@gmail.com>
    Architecture: i386
    Version: 6.5.4-2ubuntu1
    Depends: dpkg (>= 1.15.4) | install-info, libc6 (>= 2.4)
    Filename: pool/main/a/acct/acct_6.5.4-2ubuntu1_i386.deb
    Size: 111226
    MD5sum: 10cba1458ace8c31169c0e9e915c9a0f
    SHA1: 6c2dcdc480144a9922329cd4fa22c7d1cb83fcbb
    SHA256: bf8d8bb8eef3939786a1cefc39f94079f43464b71099f4a59b61b24cafdbc010
    Description: The GNU Accounting utilities for process and login accounting
     GNU Accounting Utilities is a set of utilities which reports and summarizes
     data about user connect times and process execution statistics.
     .
     "Login accounting" provides summaries of system resource usage based on connect
     time, and "process accounting" provides summaries based on the commands
     executed on the system.
     .
     The 'last' command is provided by the sysvinit package and not included here.
    Homepage: http://www.gnu.org/software/acct/
    Bugs: https://bugs.launchpad.net/ubuntu/+filebug
    Origin: Ubuntu
    Supported: 18m

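This works almost 100%, but then it fails when it reaches the URL. The problem is that the URL contains a ":", and I can't seem to keep it from being treated as a tag. When I edit the sample's Homepage entry and replace the ":" with "_", it goes straight through.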
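A likely explanation is that `value` starts with the lookahead `!tag`, and that check also runs on the first line of the value itself. After `Homepage: ` the remaining text is `http://www.gnu.org/...`. Its prefix `http:` fits `[A-Za-z0-9\-]+ ":"`, so `!tag` fails and `entry` cannot match.

The sketch below reproduces that check. A plain Ruby regex stands in for the Treetop rule. The constant `TAG` and the helper `looks_like_tag?` are hypothetical and not part of Treetop.

```ruby
# Stand-in for the question's `tag` rule, [A-Za-z0-9\-]+ ":", anchored at
# the current position (\A) the way a PEG rule is tried.
TAG = /\A[A-Za-z0-9\-]+:/

# Hypothetical helper: true when `tag` would match at the start of `rest`,
# which is exactly when the `!tag` lookahead in `value` fails.
def looks_like_tag?(rest)
  TAG.match?(rest)
end

looks_like_tag?("Priority: optional")                 # => true, a real field
looks_like_tag?("http://www.gnu.org/software/acct/")  # => true, so !tag rejects the value
looks_like_tag?("http_//www.gnu.org/software/acct/")  # => false, so !tag lets it through
```

The `Bugs:` line (`https://bugs.launchpad.net/ubuntu/+filebug`) would trip the same check. Its `+` is also missing from the `value_line` character class.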

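This is my first PEG grammar, but I can tell I need to make it less ambiguous and more concise. Looking at the advanced documentation, I'd like to define the tag as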

    rule tag
      !(!'\n' .) [A-Za-z0-9\-]+ ":"
    end

but I don't fully understand what it's doing. The tag can't be (not a newline or anything)... I mean, (a newline or nothing). The subtlety escapes me...

Would switching to that form help? Does anyone know why it doesn't match?
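The proposed rule reads best from the inside out. `.` consumes any single character, so `!'\n' .` means "one character that is not a newline". Wrapping that in `!( ... )` turns it into a zero-width test. The test succeeds only when the next character is a newline or the input has ended, which matches the second reading above.

The plain-Ruby sketch below emulates those pieces, assuming `'\n'` is meant as a newline. The method names are hypothetical and are not Treetop API. The sketch also suggests why the rule, as written, would not match anything. Right after a successful lookahead, the next character is a newline (or there is none), and `[A-Za-z0-9\-]+` cannot match either of those.

```ruby
# `!'\n' .` : consume one character that is not a newline.
# Returns the position after it, or nil when the expression fails.
def non_newline_char(input, pos)
  return nil if pos >= input.length  # `.` fails at end of input
  return nil if input[pos] == "\n"   # `!'\n'` fails in front of a newline
  pos + 1
end

# `!(!'\n' .)` : zero-width; succeeds exactly where non_newline_char fails,
# i.e. in front of a newline or at end of input. Consumes nothing.
def at_line_end?(input, pos)
  non_newline_char(input, pos).nil?
end

# The proposed rule `!(!'\n' .) [A-Za-z0-9\-]+ ":"`.
# Returns the position after the tag, or nil if the rule fails.
def proposed_tag(input, pos)
  return nil unless at_line_end?(input, pos)
  m = /\A[A-Za-z0-9\-]+:/.match(input[pos..])
  m && pos + m[0].length
end

input = "Package: acct\nPriority: optional\n"
at_line_end?(input, 0)             # => false, "P" is an ordinary character
at_line_end?(input, 13)            # => true, input[13] is "\n"
at_line_end?(input, input.length)  # => true, end of input
proposed_tag(input, 14)            # => nil, the lookahead already fails at "Priority"
```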

Best answer

At this point I seem to have a working grammar:

    grammar Debian
      # The file is too big for us to emit a package_list. Look at parser.rb to see how I just split the string.
      #rule package_list
      #  (package "\n"?)+ <DebianSyntaxNode::PackageList>
      #end
      rule package
        (tag / value)+ <DebianSyntaxNode::Package>
      end

      rule tag
        tag_value tag_stop <DebianSyntaxNode::Tag>
      end
      rule tag_value
        [\w\-]+ <DebianSyntaxNode::TagValue>
      end
      rule tag_stop
        ": " <DebianSyntaxNode::TagStop>
      end

      rule value
        value_line value_stop <DebianSyntaxNode::Value>
      end
      rule value_line
        (!"\n" .)+ <DebianSyntaxNode::ValueLine>
        # ([\w \. " , \- ' : / < > @ ( ) = | \[ \] + ; ~ í á * % `])+ <DebianSyntaxNode::ValueLine>
      end
      rule value_stop
        "\n"? <DebianSyntaxNode::ValueStop>
      end
    end
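The comment at the top of this grammar says the file is split into one string per package before parsing. The real code lives in the project's parser.rb, which is not shown here.

Below is a minimal sketch of that idea. It assumes stanzas are separated by blank lines, as in the Debian control-file format, and streams the gzip file with Ruby's standard `zlib` library so the whole file is never one parser input. `each_stanza` is a hypothetical name, not taken from the project.

```ruby
require "zlib"

# Hypothetical helper: read a gzip-compressed Packages file from `io`
# and yield one String per package stanza. Stanzas are assumed to be
# separated by blank (or whitespace-only) lines. A block is required.
def each_stanza(io)
  gz = Zlib::GzipReader.new(io)
  buffer = +""
  gz.each_line do |line|
    if line.strip.empty?
      # A blank line ends the current stanza.
      yield buffer unless buffer.empty?
      buffer = +""
    else
      buffer << line
    end
  end
  # The last stanza may not be followed by a blank line.
  yield buffer unless buffer.empty?
end
```

Typical use would be `File.open("Packages.gz", "rb") { |f| each_stanza(f) { |stanza| ... } }`. Each stanza would then be handed to the generated parser, presumably a `DebianParser` if the grammar is loaded with `Treetop.load`.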

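The problem now is that value_line doesn't include the "\n" for multi-line entries. Also, I have to combine the multi-line entries myself in the parser.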
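One way to do that combining step outside the grammar is sketched here as an assumption, not as the project's actual code. In a real Packages file, continuation lines start with a space or tab. A stanza can therefore be folded into a Hash by appending such lines to the previous field's value.

Splitting each field line at its first colon only also sidesteps the URL problem from the question. `fields_of` is a hypothetical name.

```ruby
# Hypothetical post-processing step: turn one stanza into a Hash,
# folding continuation lines (leading space or tab) into the previous
# field's value, joined with "\n" and with one leading space dropped.
def fields_of(stanza)
  fields = {}
  current = nil
  stanza.each_line(chomp: true) do |line|
    if current && line.start_with?(" ", "\t")
      fields[current] << "\n" << line[1..]
    elsif (m = line.match(/\A([^:\s]+):[ \t]*(.*)\z/))
      # Only the first colon separates name from value, so URLs survive.
      current = m[1]
      fields[current] = m[2].dup
    else
      raise ArgumentError, "unexpected line: #{line.inspect}"
    end
  end
  fields
end
```

Lines consisting of a lone `.` are kept as-is here. In a Description they stand for paragraph breaks, which a caller may want to convert.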

If you want to see where this code is going, check out the small GitHub project I started: https://github.com/derdewey/Debian-Packages-Parser

A similar question about ruby - parsing Debian Packages.gz with Ruby's Treetop PEG can be found on Stack Overflow: https://stackoverflow.com/questions/4130022/
