parsing - How to handle 'line-continuation' with parser combinators

Reposted by: 行者123 Updated: 2023-12-02 15:40:00


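I'm trying to write a small parser using the Sprache parser combinator library. The parser should treat a backslash-newline at the end of a line (a line ending in a single \) as insignificant whitespace.

Question

How can I create a parser for the value after the = sign, where the value may contain the line-continuation character \? For example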

a = b\e,\
    c,\
    d

should parse to (KeyValuePair (Key, 'a'), (Value, 'b\e, c, d')).
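To make the target concrete, here is a minimal sketch in plain Python (no parser combinators) of the intended result. It assumes that a backslash-newline, together with the indentation of the following line, folds into a single space, as the expected value above suggests. `parse_key_value` is a hypothetical helper for illustration only and is not part of Sprache.

```python
import re


def parse_key_value(text):
    """Split 'key = value', folding backslash-newline continuations into one space."""
    # Assumption: a backslash immediately followed by a newline, plus the
    # indentation of the next line, counts as one insignificant space.
    # A backslash followed by anything else (as in b\e) is left untouched.
    folded = re.sub(r"\\\n[ \t]*", " ", text)
    key, _, value = folded.partition("=")
    return key.strip(), value.strip()


source = "a = b\\e,\\\n    c,\\\n    d"
print(parse_key_value(source))
```

The lone backslash in `b\e` survives because the pattern only matches a backslash that is directly followed by a newline.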

I'm new to this library and to parser combinators in general, so any pointers in the right direction would be greatly appreciated.

What I've tried

Test

using Sprache;
using Xunit;

public class ConfigurationFileGrammerTest
{
    [Theory]
    [InlineData("x\\\n  y", @"x y")]
    public void ValueIsAnyStringMayContinuedAccrossLinesWithLineContinuation(
        string input,
        string expectedValue)
    {
        // The backslash-newline and the indentation after it should collapse to one space.
        var value = ConfigurationFileGrammer.Value.Parse(input);
        Assert.Equal(expectedValue, value);
    }
}

Production

Attempt 1

    public static readonly Parser<string> Value =
        from leading in Parse.WhiteSpace.Many()
        from rest in Parse.AnyChar.Except(Parse.Char('\\')).Many()
            .Or(Parse.String("\\\n")
            .Then(chs => Parse.Return(chs))).Or(Parse.AnyChar.Except(Parse.LineEnd).Many())
        select new string(rest.ToArray()).TrimEnd();

Test output

Xunit.Sdk.EqualException: Assert.Equal() Failure
           ↓ (pos 1)
Expected: x y
Actual:   x\
           ↑ (pos 1)

Attempt 2

    public static readonly Parser<string> SingleLineValue =
        from leading in Parse.WhiteSpace.Many()
        from rest in Parse.AnyChar.Many().Where(chs => chs.Count() < 2 || !(string.Join(string.Empty, chs.Reverse().Take(2)).Equals("\\\n")))
        select new string(rest.ToArray()).TrimEnd();

    public static readonly Parser<string> ContinuedValueLines =
        from firsts in ContinuedValueLine.AtLeastOnce()
        from last in SingleLineValue
        select string.Join(" ", firsts) + " " + last;

    public static readonly Parser<string> Value = SingleLineValue.Once().XOr(ContinuedValueLines.Once()).Select(s => string.Join(" ", s));

Test output

Xunit.Sdk.EqualException: Assert.Equal() Failure
           ↓ (pos 1)
Expected: x y
Actual:   x\\n  y
           ↑ (pos 1)

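Best answer

The output must not contain the continuation. That is the only problem with the last unit test. When you parse the continuation \\\n, you have to drop it from the output and return the empty string instead. Sorry, I don't know how to do that in C#. Maybe something like: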

Parse.String("\\\n").Then(_ => Parse.Return(""))

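I solved this problem using the combinatorix Python library, which is a parser combinator library. Its API uses functions instead of chained methods, but the idea is the same.

Here is the full code with comments: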

# `apply` returns a parser that doesn't consume the input stream. It
# applies a function (or lambda) to the output of another parser.
# The following parser removes whitespace from the beginning and the
# end of what was parsed.
strip = apply(lambda x: x.strip())

# parse a single equal character
equal = char('=')

# Parse the key part of a configuration line. Since the API is
# functional, it reads "inside-out". Note the use of the special
# `unless(predicate, parser)` parser, which is sometimes missing from
# parser combinator libraries. It runs `parser` on the input stream
# only if the `predicate` parser fails, which makes parsing
# conditional. It's similar in spirit to negation in Prolog.
# Here it parses *anything until an equal sign*, "joins" the characters
# into a string and strips any whitespace at either end of the string.
key = strip(join(one_or_more(unless(equal, anything))))

# parse a single newline (line feed) character
eol = char('\n')

# Returns a parser that returns the empty string. This is a constant
# parser (i.e. it always outputs the same thing).
return_empty_space = apply(lambda x: '')
# This parses a full continuation, including the spaces that start
# the new line. It parses *the continuation string, then zero or more
# spaces* and returns the empty string.
continuation = return_empty_space(sequence(string('\\\n'), zero_or_more(char(' '))))

# `value` is the parser for the value part. Unless the current char
# is an `eol` (i.e. \n), it first tries to parse a continuation and
# falls back to parsing any single character. It does that at least
# once, i.e. the value cannot be empty. Then it "joins" all the chars
# into a single string and "strips" whitespace from both ends.
value = strip(join(one_or_more(unless(eol, either(continuation, anything)))))

# This drops the element at index 1 and keeps only the elements at
# indexes 0 and 2 of the result. See below.
kv_apply = apply(lambda x: (x[0], x[2]))

# This is the final parser for a given kv pair. A kv pair is:
#
# - a key part (see key parser)
# - an equal part (see equal parser)
# - a value part (see value parser)
#
# These are used to parse the input stream in sequence (one after the
# other). The sequence returns three values: the key, a '=' char and
# the value. `kv_apply` keeps only the key and value parts.
kv = kv_apply(sequence(key, equal, value))


# This is syntactic sugar: it turns the string into a stream of chars
# and executes the `kv` parser on it.
parser = lambda string: combinatorix(string, kv)


input = 'a = b\\e,\\\n    c,\\\n    d'
assert parser(input) == ('a', 'b\\e,c,d')
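For readers who want to run the idea without installing anything, the following is a minimal self-contained sketch of the same grammar. The helpers reuse the names from the answer (`char`, `unless`, `either`, and so on), but they are hand-rolled stand-ins written for this example and are not the combinatorix implementation. Here `apply` takes the parser as a second argument, and `many`, `make_kv` and its `replacement` parameter are hypothetical names. In this sketch a parser is a function `(text, pos)` that returns `(value, new_pos)`, or `None` on failure.

```python
# Hand-rolled stand-ins, NOT the combinatorix API.
# A parser is a function (text, pos) -> (value, new_pos), or None on failure.

def char(c):
    def parse(text, pos):
        if pos < len(text) and text[pos] == c:
            return c, pos + 1
        return None
    return parse


def anything(text, pos):
    # Consume exactly one character, whatever it is.
    return (text[pos], pos + 1) if pos < len(text) else None


def string(s):
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def sequence(*parsers):
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def either(*parsers):
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def unless(predicate, parser):
    def parse(text, pos):
        # Run `parser` only when `predicate` fails at this position.
        if predicate(text, pos) is not None:
            return None
        return parser(text, pos)
    return parse


def many(parser, minimum=0):
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                break
            value, pos = result
            values.append(value)
        return (values, pos) if len(values) >= minimum else None
    return parse


def apply(func, parser):
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, pos = result
        return func(value), pos
    return parse


def make_kv(replacement):
    """Build a key/value parser; each continuation yields `replacement`."""
    strip_join = lambda chars: "".join(chars).strip()
    equal, eol = char("="), char("\n")
    key = apply(strip_join, many(unless(equal, anything), 1))
    continuation = apply(lambda _: replacement,
                         sequence(string("\\\n"), many(char(" "))))
    value = apply(strip_join,
                  many(unless(eol, either(continuation, anything)), 1))
    return apply(lambda parts: (parts[0], parts[2]),
                 sequence(key, equal, value))


source = "a = b\\e,\\\n    c,\\\n    d"
print(make_kv("")(source, 0)[0])   # continuation dropped entirely
print(make_kv(" ")(source, 0)[0])  # continuation folded into one space
```

With `replacement=""` the result matches the answer's assertion (`'b\\e,c,d'`). With `replacement=" "` it matches the value the question asks for (`'b\e, c, d'`) and the `x y` expected by the unit test, so returning a single space from the continuation parser may be closer to what the asker wants.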

Regarding "parsing - How to handle 'line-continuation' with parser combinators", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45105022/
