regex - proto token candidate ordering - 6ren


Reposted by: 行者123 Updated: 2023-12-03 23:44:57

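How does perl6 decide which proto token to match against first?

The code below works as expected: it matches the string 1234, and Grammar::Tracer shows that the first token matched is s:sym<d>, which makes sense since it is the longest token.

However, if I change a literal into a token call, for example changing token three from '3' to <digit>, the match fails and Grammar::Tracer shows s:sym<b> being matched first.

Moving s:sym<d> to the top matches the string in both cases, but what is the explanation for that behavior?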

#!/usr/bin/env perl6
no precompilation;
use Grammar::Tracer;

grammar G {

  token TOP { <s> }

  proto token s { * }

  token s:sym<a> { <one> }
  token s:sym<b> { <one> <two> }
  token s:sym<c> { <one> <two> <three> }
  token s:sym<d> { <one> <two> <three> <four> }

  token one   { '1' }
  token two   { '2' }
  token three { '3' }
  token four  { '4' }
}

my $g = G.new;

say $g.parse: '1234';

# Output: Match
# token three { '3' }

TOP
|  s
|  |  s:sym<d>
|  |  |  one

# Output No Match
# token three { <digit> }

TOP
|  s
|  |  s:sym<b>
|  |  |  one

Best answer

How does perl6 decide which proto token to match against first?

It uses "Longest alternation" logic. In your (very well presented!) case, the relevant deciding factors are as follows.

First, select the branch which has the longest declarative prefix.

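So the first thing to note is that what matters is not the "longest token" but the longest declarative prefix: the start of a pattern, made up of nothing but consecutive "declarative" "atoms".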
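The same logic drives the plain `|` alternation, which makes for a smaller illustration than a proto token. The following is a minimal sketch; the variable names are made up for the example. Both branches are literals, so each branch is declarative in full, and `||` is shown only as a contrast because it tries branches strictly in the order written.

```raku
# Both branches consist only of literal text, so they are fully declarative.
my $ltm     = 'food' ~~ / foo | food /;    # | prefers the longest declarative prefix
my $ordered = 'food' ~~ / foo || food /;   # || tries branches left to right

say ~$ltm;        # food
say ~$ordered;    # foo
```

The textual order of the branches is the same in both patterns; only the alternation operator differs.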

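A 3 is a declarative atom.

A <foo> may or may not be; it depends on what it contains.

I haven't found clear official documentation identifying which built-in patterns are declarative and which aren't, but it looks like all the ones written with a backslash, e.g. \d, are declarative, whereas all the ones written in the form <foo>, e.g. <digit>, are not.
(Note in particular that the built-in <ws> pattern is not declarative. Given that whitespace after an atom in a rule is converted to <ws>, this means that the first such whitespace terminates the declarative prefix of that rule.)

So a <digit> atom is not part of a declarative prefix; it terminates the prefix instead.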
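If the observation about backslash forms holds, writing token three with \d instead of <digit> should keep the whole of s:sym<d> declarative, so it would again be the candidate with the longest declarative prefix. The following is a sketch under that assumption, reusing the grammar from the question; the `$match` variable is added only for the example.

```raku
grammar G {
    token TOP { <s> }

    proto token s { * }

    token s:sym<a> { <one> }
    token s:sym<b> { <one> <two> }
    token s:sym<c> { <one> <two> <three> }
    token s:sym<d> { <one> <two> <three> <four> }

    token one   { '1' }
    token two   { '2' }
    token three { \d }     # backslash form, assumed to be declarative
    token four  { '4' }
}

my $match = G.parse('1234');
say $match;
```

Adding `use Grammar::Tracer;` at the top, as in the question, shows which candidate is tried first.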

Moving s:sym<d> to the top, matches the string in both cases, but what is the explanation for that behavior?

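Because by changing <three> to call <digit>, you have turned your rules into three rules that tie for the longest declarative prefix (<one> <two>). So other tie-breaking rules are used.

If all the other tie-breaking rules fail to pick a winner, the last resort is to choose the "left-most" rule, which, ignoring inheritance, means the rule that comes first lexically. That is why s:sym<b> is tried first in your original ordering, and why s:sym<d> is tried first once you move it to the top.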
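The reordering described in the question can be written out as follows. This is a sketch of the question's grammar with token three calling <digit> and s:sym<d> moved to the top, so that it comes first lexically among the candidates tied on the prefix <one> <two>; the `$match` variable is added only for the example.

```raku
grammar G {
    token TOP { <s> }

    proto token s { * }

    # s:sym<d> now comes first lexically, so it wins the final tiebreak
    token s:sym<d> { <one> <two> <three> <four> }
    token s:sym<a> { <one> }
    token s:sym<b> { <one> <two> }
    token s:sym<c> { <one> <two> <three> }

    token one   { '1' }
    token two   { '2' }
    token three { <digit> }   # ends the declarative prefix, per the answer
    token four  { '4' }
}

my $match = G.parse('1234');
say $match;
```

Relying on lexical order is fragile, since inserting another tied candidate above s:sym<d> would change which one is tried first.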

Regarding regex - proto token candidate ordering, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56157835/
