c - Is it possible to set priorities for rules to avoid the "longest-earliest" matching pattern?

Reposted by: 太空狗 · Updated: 2023-10-29 16:34:09

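Another simple question: is there a way to tell flex to prefer a rule that matches something shorter over a rule that matches something longer? I can't find any good documentation about this.

Here is why I need it: I am parsing a file written in a pseudo-language that contains keywords corresponding to control instructions. I want these keywords to have absolute priority so that they are never parsed as part of an expression. I need this priority mechanism because I do not want to write a full grammar for my project (that would be overkill: I only perform structural analysis on the parsed program and do not need the details), so I cannot rely on fine grammar tuning to make sure these blocks are not parsed as expressions.

Any help would be appreciated.

Here is an example of a file to parse: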

If a > 0 Then read(b); Endif
c := "If I were...";
While d > 5 Do d := d + 1 Endwhile

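I only want to collect information about the Ifs, Thens, Endifs, and so on; the rest does not matter to me. That is why I would like the rules for If, Then, etc. to take priority without having to write a grammar.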

Best answer

From the Dragon Book, 2nd edition, Section 3.5.3, "Conflict Resolution in Lex":

We have alluded to the two rules that Lex uses to decide on the proper lexeme
to select, when several prefixes of the input match one or more patterns:
    1. Always prefer a longer prefix to a shorter prefix.
    2. If the longest possible prefix matches two or more patterns, prefer the
       pattern listed first in the Lex program.

The rules above also apply to Flex. Here is what the Flex manual says (Chapter 7: How the Input Is Matched):

When the generated scanner is run, it analyzes its input looking for strings 
which match any of its patterns. If it finds more than one match, it takes the 
one matching the most text (for trailing context rules, this includes the length 
of the trailing part, even though it will then be returned to the input). If it 
finds two or more matches of the same length, the rule listed first in the flex 
input file is chosen.
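
The two disambiguation rules quoted above can be sketched in plain C++. This is only a simulation under stated assumptions, not how flex is implemented (flex compiles all patterns into a single automaton): `Rule`, `Match`, `pick_rule` and `sample_rules` are hypothetical names introduced for illustration, and `std::regex` stands in for flex patterns.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One scanner rule: a token name plus its pattern (hypothetical type).
struct Rule {
    std::string name;
    std::regex pattern;
};

// Result of rule selection: winning rule name (empty if none) and match length.
struct Match {
    std::string name;
    std::size_t length;
};

// Choose the rule for the text starting at `pos`.
// Rule 1: the longest match wins. Rule 2: on a tie, the rule listed first wins.
Match pick_rule(const std::vector<Rule>& rules, const std::string& input, std::size_t pos) {
    assert(pos <= input.size());
    Match best{"", 0};
    for (const Rule& rule : rules) {
        std::smatch m;
        // match_continuous anchors the match at `pos`, as a scanner would.
        if (std::regex_search(input.begin() + pos, input.end(), m, rule.pattern,
                              std::regex_constants::match_continuous)) {
            std::size_t len = static_cast<std::size_t>(m.length(0));
            // Strict '>' keeps the earlier rule when two lengths are equal.
            if (len > best.length) {
                best = Match{rule.name, len};
            }
        }
    }
    return best;
}

// A reduced rule table: keywords are listed ahead of the identifier rule.
std::vector<Rule> sample_rules() {
    return {
        {"IF", std::regex("If")},
        {"THEN", std::regex("Then")},
        {"IDENTIFIER", std::regex("[a-zA-Z_][a-zA-Z0-9_]*")},
    };
}
```

With this table, `pick_rule(sample_rules(), "If x", 0)` selects `IF` with length 2, because the keyword and identifier patterns tie and the keyword is listed first. `pick_rule(sample_rules(), "If_this > 0", 0)` selects `IDENTIFIER` with length 7, because the longer match wins regardless of rule order.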

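If I understand correctly, your lexer treats keywords such as Endif as identifiers, so they are later treated as part of an expression. If that is your problem, simply put the keyword rules at the top of your specification, for example (assuming each uppercase word is a predefined enum value corresponding to a token):

"If"                      { return IF;         }
"Then"                    { return THEN;       }
"Endif"                   { return ENDIF;      }
"While"                   { return WHILE;      }
"Do"                      { return DO;         }
"Endwhile"                { return ENDWHILE;   }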
\"(\\.|[^\\"])*\"         { return STRING;     }
[a-zA-Z_][a-zA-Z0-9_]*    { return IDENTIFIER; }

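The keywords will then always be matched ahead of identifiers because of rule 2: a keyword and the identifier pattern match the same length, so the rule listed first wins.

Edit:

Thanks for your comment, kol. I forgot to add the string rule, but I do not think my solution is wrong. For example, for an identifier named If_this_is_an_identifier, rule 1 applies, so the identifier rule takes effect (because it matches the longest string). I wrote a simple test case and found no problem with my solution. Here is my lex.l file: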

%{
  #include <iostream>
  using namespace std;
%}

ID       [a-zA-Z_][a-zA-Z0-9_]*

%option noyywrap
%%

"If"                      { cout << "IF: " << yytext << endl;         }
"Then"                    { cout << "THEN: " << yytext << endl;       }
"Endif"                   { cout << "ENDIF: " << yytext << endl;      }
"While"                   { cout << "WHILE: " << yytext << endl;      }
"Do"                      { cout << "DO: " << yytext << endl;         }
"Endwhile"                { cout << "ENDWHILE: " << yytext << endl;   }
\"(\\.|[^\\"])*\"         { cout << "STRING: " << yytext << endl;     }
{ID}                      { cout << "IDENTIFIER: " << yytext << endl; }
.                         { cout << "Ignore token: " << yytext << endl; }

%%

int main(int argc, char* argv[]) {
  ++argv, --argc;  /* skip over program name */
  if ( argc > 0 )
    yyin = fopen( argv[0], "r" );
  else
    yyin = stdin;

  yylex();
}

I tested my solution with the following test case:

If If_this_is_an_identifier > 0 Then read(b); Endif
    c := "If I were...";
While While_this_is_also_an_identifier > 5 Do d := d + 1 Endwhile

It gave me the following output (other output unrelated to the problem you mentioned is omitted):

IF: If
IDENTIFIER: If_this_is_an_identifier
......
STRING: "If I were..."
......
WHILE: While
IDENTIFIER: While_this_is_also_an_identifier
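
Since the question only needs the control keywords, the same rule order can be used while reporting nothing for strings and identifiers. The sketch below simulates that idea in plain C++ rather than running flex: `collect_keywords` is a hypothetical helper, `std::regex` stands in for the flex patterns, and the loop terminator is spelled `Endwhile` as in the sample input. Strings and identifiers are still matched so that they consume their text, but only keyword matches are recorded.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Scan `input` and return only the keyword tokens, in order of appearance.
std::vector<std::string> collect_keywords(const std::string& input) {
    // Rule order mirrors lex.l: keywords, then strings, then identifiers.
    // `report` marks the rules whose matches are recorded; others are skipped.
    struct Rule {
        std::string name;
        std::regex pattern;
        bool report;
    };
    static const std::vector<Rule> rules = {
        {"IF", std::regex("If"), true},
        {"THEN", std::regex("Then"), true},
        {"ENDIF", std::regex("Endif"), true},
        {"WHILE", std::regex("While"), true},
        {"DO", std::regex("Do"), true},
        {"ENDWHILE", std::regex("Endwhile"), true},
        {"STRING", std::regex(R"re("(\\.|[^\\"])*")re"), false},
        {"IDENTIFIER", std::regex("[a-zA-Z_][a-zA-Z0-9_]*"), false},
    };

    std::vector<std::string> found;
    std::size_t pos = 0;
    while (pos < input.size()) {
        const Rule* best = nullptr;
        std::size_t best_len = 0;
        for (const Rule& rule : rules) {
            std::smatch m;
            if (std::regex_search(input.begin() + pos, input.end(), m, rule.pattern,
                                  std::regex_constants::match_continuous)) {
                std::size_t len = static_cast<std::size_t>(m.length(0));
                // Longest match wins; strict '>' keeps the earlier rule on a tie.
                if (len > best_len) {
                    best = &rule;
                    best_len = len;
                }
            }
        }
        if (best == nullptr) {
            ++pos;  // no rule matched: drop one character, like the '.' catch-all
            continue;
        }
        assert(best_len > 0);  // every pattern consumes at least one character
        if (best->report) {
            found.push_back(best->name);
        }
        pos += best_len;
    }
    return found;
}
```

Run on the three-line test case above, this returns IF, THEN, ENDIF, WHILE, DO, ENDWHILE. The `If` inside the string literal is not reported because the string rule matches the whole quoted text starting at the opening quote, and `If_this_is_an_identifier` is not reported because the identifier rule matches more characters than `If` does.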

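The lex.l program is adapted from an example in the flex manual, which uses the same approach to tell keywords apart from identifiers.

See also the ANSI C grammar, Lex specification.

I also use this approach in a personal project and have not found any problems so far.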

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8379299/