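I need to parse regular expressions into their components in PHP. I have no problem creating or executing regular expressions, but I want to display information about a regular expression (for example, list its capture groups, or attach repetition characters to their targets). The overall project is a WordPress plugin that gives information about rewrite rules, which are regexes with substitution patterns and can be cryptic to read.
I have written a simple implementation myself, which seems to handle the simple regexes I throw at it and converts them to syntax trees. Before I extend this example to support more of the regex syntax, I would like to know whether there are other good implementations I can look at. The implementation language doesn't really matter. I assume most parsers are written to optimize matching speed, but that is not important to me and may even hinder clarity.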
Best answer
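I'm the creator of Debuggex, whose requirements are very similar to yours: optimize for the amount of information that can be shown.
Below is a heavily modified (for readability) snippet from the parser that Debuggex uses. It doesn't work as-is; it is meant to demonstrate how the code is organized. Most of the error handling has been removed, and so has a lot of logic that was straightforward but verbose.
Note that recursive descent is used. This is what you did in your parser, except that yours is flattened into a single function. I used roughly this grammar: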
Regex -> Alt
Alt -> Cat ('|' Cat)*
Cat -> Empty | (Repeat)+
Repeat -> Base (('*' | '+' | '?' | CustomRepeatAmount) '?'?)?
Base -> '(' Alt ')' | Charset | Literal
Charset -> '[' (Char | Range | EscapeSeq)* ']'
Literal -> Char | EscapeSeq
CustomRepeatAmount -> '{' Number (',' Number?)? '}'
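Each nonterminal in the grammar above ends up as one kind of node in the parse tree. The listing further down constructs such nodes (ReTree, AltTree, CatTree, EmptyTree, RepeatTree, MatchTree, CharsetTree, LiteralTree) without defining them, so here is a minimal sketch of what they might look like. The constructor names come from the listing; the `type` and `children` fields and the `listGroups` helper are assumptions made for illustration, not Debuggex's actual data model. Once every node exposes its children the same way, collecting display information such as the capture groups is a short tree walk.

```javascript
// Hypothetical node constructors. The names match the listing; the fields are assumed.
function ReTree(child) { this.type = 'Re'; this.children = [child]; }
function AltTree(children) { this.type = 'Alt'; this.children = children; }
function CatTree(children) { this.type = 'Cat'; this.children = children; }
function EmptyTree() { this.type = 'Empty'; this.children = []; }
function RepeatTree(child, opts) {
  this.type = 'Repeat';
  this.children = [child];
  this.min = opts.min;
  this.max = opts.max;
  this.greedy = opts.greedy;
}
function MatchTree(child, group) {
  this.type = 'Match';
  this.children = [child];
  this.group = group;
}
function CharsetTree(opts) { this.type = 'Charset'; this.children = []; this.opts = opts; }
function LiteralTree(charInfo) { this.type = 'Literal'; this.children = []; this.charInfo = charInfo; }

// Walks the tree depth-first and collects the numbers of all capturing groups.
function listGroups(node, out) {
  out = out || [];
  if (node.type === 'Match') {
    out.push(node.group);
  }
  node.children.forEach(function (child) {
    listGroups(child, out);
  });
  return out;
}

// Hand-built tree for the regex (a)|(b)+
var groupA = new MatchTree(
  new AltTree([new CatTree([new LiteralTree({c: 'a', special: false})])]), 1);
var groupB = new MatchTree(
  new AltTree([new CatTree([new LiteralTree({c: 'b', special: false})])]), 2);
var tree = new ReTree(new AltTree([
  new CatTree([groupA]),
  new CatTree([new RepeatTree(groupB, {min: 1, max: Infinity, greedy: true})])
]));

console.log(listGroups(tree)); // [ 1, 2 ]
```

The same walk can be adapted to print repeat bounds next to their targets, since a RepeatTree keeps its target as its only child.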
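You'll notice that a lot of my code just deals with the peculiarities of the JavaScript flavor of regexes. You can find more information about them in this reference. For PHP, this has all the information you need. I think you are well on your way with your parser; all that remains is implementing the rest of the operators and getting the edge cases right.
:) Enjoy: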
var Parser = function(s) {
  this.s = s; // This is the regex string.
  this.k = 0; // This is the index of the character being parsed.
  this.group = 1; // This is a counter for assigning to capturing groups.
};
// These are convenience methods to make reading and maintaining the code
// easier.
// Returns true if there is more string left, false otherwise.
Parser.prototype.more = function() {
  return this.k < this.s.length;
};
// Returns the char at the current index.
Parser.prototype.peek = function() { // exercise
};
// Returns the char at the current index, then advances the index.
Parser.prototype.next = function() { // exercise
};
// Ensures c is the char at the current index, then advances the index.
Parser.prototype.eat = function(c) { // exercise
};
// We use a recursive descent parser.
// This returns the root node of our tree.
Parser.prototype.parseRe = function() {
  // It has exactly one child.
  var tree = new ReTree(this.parseAlt());
  // We expect to be at the end of the string when we finish parsing.
  // If not, something went wrong.
  if (this.more()) {
    throw new Error();
  }
  return tree;
};
// This parses several subexpressions divided by |s, and returns a tree
// with the corresponding trees as children.
Parser.prototype.parseAlt = function() {
  var alts = [this.parseCat()];
  // Keep parsing as long as we have more pipes.
  while (this.more() && this.peek() === '|') {
    this.next();
    // Recursive descent happens here.
    alts.push(this.parseCat());
  }
  // Here, we allow an AltTree with a single child.
  // Alternatively, we can return the child if there is only one.
  return new AltTree(alts);
};
// This parses several concatenated repeat-subexpressions, and returns
// a tree with the corresponding trees as children.
Parser.prototype.parseCat = function() {
  var cats = [];
  // If we reach a pipe or close paren, we stop. This is because that
  // means we are in a subexpression, and the subexpression is over.
  while (this.more() && ')|'.indexOf(this.peek()) === -1) {
    // Recursive descent happens here.
    cats.push(this.parseRepeat());
  }
  // This is where we choose to handle the empty string case.
  // It's easiest to handle it here because of the implicit concatenation
  // operator in our grammar.
  return (cats.length >= 1) ? new CatTree(cats) : new EmptyTree();
};
// This parses a single repeat-subexpression, and returns a tree
// with the child that is being repeated.
Parser.prototype.parseRepeat = function() {
  // Recursive descent happens here.
  var repeat = this.parseBase();
  // If we reached the end after parsing the base expression, we just return
  // it. Likewise if we don't have a repeat operator that follows.
  if (!this.more() || '*?+{'.indexOf(this.peek()) === -1) {
    return repeat;
  }
  // These are properties that vary with the different repeat operators.
  // They aren't necessary for parsing, but are used to give meaning to
  // what was parsed.
  var min = 0;
  var max = Infinity;
  var greedy = true;
  if (this.peek() === '*') { // exercise
  } else if (this.peek() === '?') { // exercise
  } else if (this.peek() === '+') {
    // For +, we advance the index, and set the minimum to 1, because
    // a + means we repeat the previous subexpression between 1 and infinity
    // times.
    this.next();
    min = 1;
  } else if (this.peek() === '{') { /* challenging exercise */ }
  if (this.more() && this.peek() === '?') {
    // By default (in Javascript at least), repetition is greedy. Appending
    // a ? to a repeat operator makes it reluctant.
    this.next();
    greedy = false;
  }
  return new RepeatTree(repeat, {min: min, max: max, greedy: greedy});
};
// This parses a "base" subexpression. We defined this as being a
// literal, a character set, or a parenthesized subexpression.
Parser.prototype.parseBase = function() {
  var c = this.peek();
  // If any of these characters are spotted, something went wrong.
  // The ) should have been eaten by a previous call to parseBase().
  // The *, ?, or + should have been eaten by a previous call to parseRepeat().
  if (c === ')' || '*?+'.indexOf(c) !== -1) {
    throw new Error();
  }
  if (c === '(') {
    // Parse a parenthesized subexpression. This is either a lookahead,
    // a capturing group, or a non-capturing group.
    this.next(); // Eat the (.
    var ret = null;
    if (this.peek() === '?') { // exercise
      // Parse lookaheads and non-capturing groups.
    } else {
      // This is why the group counter exists. We use it to enumerate the
      // group appropriately.
      var group = this.group++;
      // Recursive descent happens here. Note that this calls parseAlt(),
      // which is what was initially called by parseRe(), creating
      // a mutual recursion. This is where the name recursive descent
      // comes from.
      ret = new MatchTree(this.parseAlt(), group);
    }
    // This MUST be a ) or something went wrong.
    this.eat(')');
    return ret;
  } else if (c === '[') {
    this.next(); // Eat the [.
    // Parse a charset. A CharsetTree has no children, but it does contain
    // (pseudo)chars and ranges, and possibly a negation flag. These are
    // collectively returned by parseCharset().
    // This piece can be structured differently depending on your
    // implementation of parseCharset().
    var opts = this.parseCharset();
    // This MUST be a ] or something went wrong.
    this.eat(']');
    return new CharsetTree(opts);
  } else {
    // Parse a literal. Like a CharsetTree, a LiteralTree doesn't have
    // children. Instead, it contains a single (pseudo)char.
    var literal = this.parseLiteral();
    return new LiteralTree(literal);
  }
};
// This parses the inside of a charset and returns all the information
// necessary to describe that charset. This includes the literals and
// ranges that are accepted, as well as whether the charset is negated.
Parser.prototype.parseCharset = function() {
  // challenging exercise
};
// This parses a single (pseudo)char and returns it for use in a LiteralTree.
Parser.prototype.parseLiteral = function() {
  var c = this.next();
  if (c === '.' || c === '^' || c === '$') {
    // These are special chars. Their meaning is different than their
    // literal symbol, so we set the 'special' flag.
    return new CharInfo(c, true);
  } else if (c === '\\') {
    // If we come across a \, we need to parse the escaped character.
    // Since parsing escaped characters is similar between literals and
    // charsets, we extracted it to a separate function. The reason we
    // pass a flag is because \b has different meanings inside charsets
    // vs outside them.
    return this.parseEscaped({inCharset: false});
  }
  // If neither case above was hit, we just return the exact char.
  return new CharInfo(c);
};
// This parses a single escaped (pseudo)char and returns it for use in
// either a LiteralTree or a CharsetTree.
Parser.prototype.parseEscaped = function(opts) {
  // Here we instantiate some default options.
  opts = opts || {};
  var inCharset = opts.inCharset || false;
  var c = this.peek();
  // Here are a bunch of escape sequences that require reading further
  // into the string. They are all fairly similar.
  // isDigit() and isHexa() are helper predicates that are not shown here.
  if (c === 'c') { // exercises
  } else if (c === '0') {
  } else if (isDigit(c)) {
  } else if (c === 'x') {
  } else if (c === 'u') {
    // Use this as an example for implementing the ones above.
    // A regex may be used for this portion, but I think this is clearer.
    // We make sure that there are exactly four hexadecimal digits after
    // the u. Modify this for the escape sequences that your regex flavor
    // uses.
    var r = '';
    this.next();
    for (var i = 0; i < 4; ++i) {
      c = this.peek();
      if (!isHexa(c)) {
        throw new Error();
      }
      r += c;
      this.next();
    }
    // Return a single CharInfo despite having read multiple characters.
    // This is why I used "pseudo" previously.
    return new CharInfo(String.fromCharCode(parseInt(r, 16)));
  } else { // No special parsing required after the first escaped char.
    this.next();
    if (inCharset && c === 'b') {
      // Within a charset, \b means backspace.
      return new CharInfo('\b');
    } else if (!inCharset && (c === 'b' || c === 'B')) {
      // Outside a charset, \b is a word boundary (and \B is the complement
      // of that). We mark it as special since the character is not
      // to be taken literally.
      return new CharInfo('\\' + c, true);
    } else if (c === 'f') { // these are left as exercises
    } else if (c === 'n') {
    } else if (c === 'r') {
    } else if (c === 't') {
    } else if (c === 'v') {
    } else if ('dDsSwW'.indexOf(c) !== -1) {
    } else {
      // If we got to here, the character after \ should be taken literally,
      // so we don't mark it as special.
      return new CharInfo(c);
    }
  }
};
// This represents the smallest meaningful character unit, or pseudochar.
// For example, an escaped sequence with multiple physical characters is
// exactly one character when used in CharInfo.
var CharInfo = function(c, special) {
  this.c = c;
  this.special = special || false;
};
// Calling this will return the parse tree for the regex string s.
var parse = function(s) { return (new Parser(s)).parseRe(); };
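For readers who want to compare notes on the exercises, here is one possible way to fill in the simple helpers (peek, next, eat) and the `{n,m}` quantifier. It is a minimal sketch, not the Debuggex code: `parseNumber` and `parseCustomRepeatAmount` are hypothetical method names, the error messages are invented, and the block redefines a stripped-down Parser so that it runs on its own. It also assumes a malformed quantifier should raise an error, which is a simplification: JavaScript (without the `u` flag) and PCRE both treat a `{` that does not start a valid quantifier as a literal character.

```javascript
// Stripped-down stand-in for the Parser in the listing, so this block is self-contained.
var Parser = function (s) {
  this.s = s; // The regex string.
  this.k = 0; // Index of the character being parsed.
};

Parser.prototype.more = function () {
  return this.k < this.s.length;
};

// Returns the char at the current index (undefined at the end of input).
Parser.prototype.peek = function () {
  return this.s[this.k];
};

// Returns the char at the current index, then advances the index.
Parser.prototype.next = function () {
  if (!this.more()) {
    throw new Error('Unexpected end of pattern');
  }
  return this.s[this.k++];
};

// Ensures c is the char at the current index, then advances the index.
Parser.prototype.eat = function (c) {
  if (this.peek() !== c) {
    throw new Error('Expected "' + c + '" at index ' + this.k);
  }
  this.k++;
};

// Helper predicates of the kind used by parseEscaped() in the listing.
var isDigit = function (c) { return /^[0-9]$/.test(c); };
var isHexa = function (c) { return /^[0-9a-fA-F]$/.test(c); };

// Hypothetical helper: reads a run of decimal digits and returns it as a number.
Parser.prototype.parseNumber = function () {
  var digits = '';
  while (this.more() && isDigit(this.peek())) {
    digits += this.next();
  }
  if (digits === '') {
    throw new Error('Expected a number at index ' + this.k);
  }
  return parseInt(digits, 10);
};

// Hypothetical helper: parses {n}, {n,} or {n,m} and returns the bounds.
Parser.prototype.parseCustomRepeatAmount = function () {
  this.eat('{');
  var min = this.parseNumber();
  var max = min; // {n} means exactly n times.
  if (this.peek() === ',') {
    this.next();
    // {n,} has no upper bound; {n,m} has an explicit one.
    max = (this.peek() === '}') ? Infinity : this.parseNumber();
  }
  this.eat('}');
  if (max < min) {
    throw new Error('Quantifier range is out of order');
  }
  return {min: min, max: max};
};

var p = new Parser('{2,5}?');
console.log(p.parseCustomRepeatAmount()); // { min: 2, max: 5 }
console.log(p.peek()); // ?
```

In parseRepeat(), the `{` branch could assign the returned `min` and `max` and then fall through to the existing check for a trailing `?`, which is why the sketch leaves that character unconsumed.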
Regarding php - Parser for regular expressions in PHP?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4594135/