regex - Elisp mechanism for converting PCRE regexps to emacs regexps


Reposted by: 行者123 Updated: 2023-12-03 10:40:46

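I admit a significant bias toward liking PCRE regexps much better than Emacs ones, if for no other reason than that when I type a '(' I pretty much always want a grouping operator. And, of course, \w and similar are so much more convenient than the other equivalents.

It would be crazy to expect to change the internals of Emacs, of course. But I would think it should be possible to convert from a PCRE expression to an Emacs expression, doing all the needed conversions, so that I can write: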

(defun my-super-regexp-function ...
   (re-search-forward (pcre-convert "__\\w: \\d+")))

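(or similar).

Does anyone know of an elisp library that can do this?

Edit: Selecting a response from the answers below...

Wow, I love coming back from 4 days of vacation to find a slew of interesting answers to sort through! I love the work that went into the solutions of both types.

In the end, it looks like both the exec-a-script and straight elisp versions of the solutions would work, but from a pure speed and "correctness" approach the elisp version is certainly the one that people would prefer (myself included).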

Best answer

https://github.com/joddie/pcre2el is the up-to-date version of this answer.

pcre2el or rxt (RegeXp Translator or RegeXp Tools) is a utility for working with regular expressions in Emacs, based on a recursive-descent parser for regexp syntax. In addition to converting (a subset of) PCRE syntax into its Emacs equivalent, it can do the following:

convert Emacs syntax to PCRE

convert either syntax to rx, an S-expression based regexp syntax

untangle complex regexps by showing the parse tree in rx form and highlighting the corresponding chunks of code

show the complete list of strings (productions) matching a regexp, provided the list is finite

provide live font-locking of regexp syntax (so far only for Elisp buffers – other modes on the TODO list)
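
As a rough illustration of the conversions listed above, the sketch below calls pcre2el's translation functions. It assumes the package is installed and that `rxt-pcre-to-elisp`, `rxt-elisp-to-pcre` and `rxt-pcre-to-rx` are available under those names, as described in the project's README. The exact strings returned may differ between versions, so no output is shown.

```elisp
;; Sketch only: assumes the third-party package pcre2el is installed.
;; The (require ... nil t) form returns nil instead of signalling an
;; error when the package is missing, so nothing runs in that case.
(when (require 'pcre2el nil t)
  ;; PCRE -> Emacs regexp string
  (message "%s" (rxt-pcre-to-elisp "(abc|def)\\w+\\d+"))
  ;; Emacs regexp -> PCRE string
  (message "%s" (rxt-elisp-to-pcre "\\(abc\\|def\\)\\w+[0-9]+"))
  ;; PCRE -> rx form (printed with %S because it is a Lisp list)
  (message "%S" (rxt-pcre-to-rx "(abc|def)\\w+\\d+")))
```

Note that each PCRE backslash is doubled here only because the pattern is written as a Lisp string literal.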

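The text of the original answer follows...

Here's a quick and ugly Emacs Lisp solution (EDIT: now located more permanently here). It's based mostly on the description in the pcrepattern man page, and works token by token, converting only the following constructions: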

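parenthesis grouping ( .. )

alternation |

numerical repeats {M,N}

string quoting \Q .. \E

simple character escapes: \a, \c, \e, \f, \n, \r, \t, \x, and \ + octal digits

character classes: \d, \D, \h, \H, \s, \S, \v, \V

\w and \W left as they are (using Emacs' own idea of word and non-word characters)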
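
To give a feel for what these conversions amount to, here is a small hand-written table pairing a few PCRE constructs with Emacs regexps that should be equivalent. It is only a sketch: the names `my-pcre-emacs-examples` and `my-matches-whole-p` are hypothetical, and the Emacs forms were written by hand rather than captured from the converter.

```elisp
;; Each entry is (PCRE EMACS SAMPLE): a PCRE construct, a hand-written
;; Emacs equivalent, and a sample string the Emacs form should match.
;; Illustrative only; not output captured from the converter.
(defvar my-pcre-emacs-examples
  '(("(ab|cd)"   "\\(ab\\|cd\\)" "cd")    ; grouping and alternation
    ("x{2,3}"    "x\\{2,3\\}"    "xxx")   ; numerical repeat
    ("\\Qa.b\\E" "a\\.b"         "a.b")   ; quoted literal string
    ("\\d+"      "[0-9]+"        "2024")  ; digit class
    ("\\w+"      "\\w+"          "word")) ; word class, left unchanged
  "Hypothetical PCRE constructs paired with Emacs equivalents.")

(defun my-matches-whole-p (regexp string)
  "Return t if the Emacs REGEXP matches all of STRING."
  (let ((case-fold-search nil))
    (and (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string) t)))

;; Check every Emacs form against its sample string.
(dolist (entry my-pcre-emacs-examples)
  (unless (my-matches-whole-p (nth 1 entry) (nth 2 entry))
    (error "No match for %s" (nth 0 entry))))
```

The main difference to keep in mind is that in Emacs syntax the grouping, alternation and repeat operators are the backslashed forms, while bare `(`, `|` and `{` are literal characters.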

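It doesn't do anything with more complicated PCRE assertions, but it does try to convert escapes inside character classes. In the case of character classes including something like \D, this is done by converting into a non-capturing group with alternation.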
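
To make the last point concrete: a negated set such as \D cannot be spliced into an Emacs bracket expression, so a PCRE class like [a-c\D] has to become an alternation between an ordinary bracket expression and a negated one. The regexp below is a hand-written sketch of that shape, and the variable name is hypothetical.

```elisp
;; PCRE:  [a-c\D]   (a, b, c, or any non-digit)
;; Emacs: a shy (non-capturing) group that alternates between the
;;        ordinary part of the class and the negated digit class.
(defvar my-class-with-non-digit "\\(?:[a-c]\\|[^0-9]\\)"
  "Hand-written Emacs equivalent of the PCRE class [a-c\\D].")

(string-match-p my-class-with-non-digit "b")  ; matches via [a-c]
(string-match-p my-class-with-non-digit "x")  ; matches via [^0-9]
(string-match-p my-class-with-non-digit "5")  ; nil: a digit outside a-c
```

A shy group is used so that the rewritten class does not shift the numbering of capture groups elsewhere in the pattern.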

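It passes the tests I wrote for it, but there are certainly bugs, and the method of scanning token by token is probably slow. In other words, no warranty. But perhaps it will do the simpler part of the job for some purposes. Interested parties are invited to improve it ;-)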

(eval-when-compile (require 'cl))

(defvar pcre-horizontal-whitespace-chars
  (mapconcat 'char-to-string
             '(#x0009 #x0020 #x00A0 #x1680 #x180E #x2000 #x2001 #x2002 #x2003
                      #x2004 #x2005 #x2006 #x2007 #x2008 #x2009 #x200A #x202F
                      #x205F #x3000)
             ""))

(defvar pcre-vertical-whitespace-chars
  (mapconcat 'char-to-string
             '(#x000A #x000B #x000C #x000D #x0085 #x2028 #x2029) ""))

(defvar pcre-whitespace-chars
  (mapconcat 'char-to-string '(9 10 12 13 32) ""))

(defvar pcre-horizontal-whitespace
  (concat "[" pcre-horizontal-whitespace-chars "]"))

(defvar pcre-non-horizontal-whitespace
  (concat "[^" pcre-horizontal-whitespace-chars "]"))

(defvar pcre-vertical-whitespace
  (concat "[" pcre-vertical-whitespace-chars "]"))

(defvar pcre-non-vertical-whitespace
  (concat "[^" pcre-vertical-whitespace-chars "]"))

(defvar pcre-whitespace (concat "[" pcre-whitespace-chars "]"))

(defvar pcre-non-whitespace (concat "[^" pcre-whitespace-chars "]"))

(eval-when-compile
  (defmacro pcre-token-case (&rest cases)
    "Consume a token at point and evaluate corresponding forms.

CASES is a list of `cond'-like clauses, (REGEXP FORMS
...). Considering CASES in order, if the text at point matches
REGEXP then moves point over the matched string and returns the
value of FORMS. Returns `nil' if none of the CASES matches."
    (declare (debug (&rest (sexp &rest form))))
    `(cond
      ,@(mapcar
         (lambda (case)
           (let ((token (car case))
                 (action (cdr case)))
             `((looking-at ,token)
               (goto-char (match-end 0))
               ,@action)))
         cases)
      (t nil))))

(defun pcre-to-elisp (pcre)
  "Convert PCRE, a regexp in PCRE notation, into Elisp string form."
  (with-temp-buffer
    (insert pcre)
    (goto-char (point-min))
    (let ((capture-count 0) (accum '())
          (case-fold-search nil))
      (while (not (eobp))
        (let ((translated
               (or
                ;; Handle tokens that are treated the same in
                ;; character classes
                (pcre-re-or-class-token-to-elisp)   

                ;; Other tokens
                (pcre-token-case
                 ("|" "\\|")
                 ("(" (setq capture-count (1+ capture-count)) "\\(")
                 (")" "\\)")
                 ("{" "\\{")
                 ("}" "\\}")

                 ;; Character class
                 ("\\[" (pcre-char-class-to-elisp))

                 ;; Backslash + digits => backreference or octal char?
                 ("\\\\\\([0-9]+\\)"
                  (let* ((digits (match-string 1))
                         (dec (string-to-number digits)))
                    ;; from "man pcrepattern": If the number is
                    ;; less than 10, or if there have been at
                    ;; least that many previous capturing left
                    ;; parentheses in the expression, the entire
                    ;; sequence is taken as a back reference.   
                    (cond ((< dec 10) (concat "\\" digits))
                          ((>= capture-count dec)
                           (error "backreference \\%s can't be used in Emacs regexps"
                                  digits))
                          (t
                           ;; from "man pcrepattern": if the
                           ;; decimal number is greater than 9 and
                           ;; there have not been that many
                           ;; capturing subpatterns, PCRE re-reads
                           ;; up to three octal digits following
                           ;; the backslash, and uses them to
                           ;; generate a data character. Any
                           ;; subsequent digits stand for
                           ;; themselves.
                           (goto-char (match-beginning 1))
                           (re-search-forward "[0-7]\\{0,3\\}")
                           (char-to-string (string-to-number (match-string 0) 8))))))

                 ;; Regexp quoting.
                 ("\\\\Q"
                  (let ((beginning (point)))
                    (search-forward "\\E")
                    (regexp-quote (buffer-substring beginning (match-beginning 0)))))

                 ;; Various character classes
                 ("\\\\d" "[0-9]")
                 ("\\\\D" "[^0-9]")
                 ("\\\\h" pcre-horizontal-whitespace)
                 ("\\\\H" pcre-non-horizontal-whitespace)
                 ("\\\\s" pcre-whitespace)
                 ("\\\\S" pcre-non-whitespace)
                 ("\\\\v" pcre-vertical-whitespace)
                 ("\\\\V" pcre-non-vertical-whitespace)

                 ;; Use Emacs' native notion of word characters
                 ("\\\\[Ww]" (match-string 0))

                 ;; Any other escaped character
                 ("\\\\\\(.\\)" (regexp-quote (match-string 1)))

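                 ;; Any normal character. `.' does not match a newline,
                 ;; so accept one explicitly; otherwise point would
                 ;; never advance past it.
                 ("\\(?:.\\|\n\\)" (match-string 0))))))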
          (push translated accum)))
      (apply 'concat (reverse accum)))))

(defun pcre-re-or-class-token-to-elisp ()
  "Consume the PCRE token at point and return its Elisp equivalent.

Handles only tokens which have the same meaning in character
classes as outside them."
  (pcre-token-case
   ("\\\\a" (char-to-string #x07))  ; bell
   ("\\\\c\\(.\\)"                  ; control character
    ;; PCRE upcases the character, then inverts bit 6 (#x40).
    (char-to-string
     (logxor (string-to-char (upcase (match-string 1))) #x40)))
   ("\\\\e" (char-to-string #x1b))  ; escape
   ("\\\\f" (char-to-string #x0c))  ; formfeed
   ("\\\\n" (char-to-string #x0a))  ; linefeed
   ("\\\\r" (char-to-string #x0d))  ; carriage return
   ("\\\\t" (char-to-string #x09))  ; tab
   ("\\\\x\\([A-Za-z0-9]\\{2\\}\\)"
    (char-to-string (string-to-number (match-string 1) 16)))
   ("\\\\x{\\([A-Za-z0-9]*\\)}"
    (char-to-string (string-to-number (match-string 1) 16)))))

(defun pcre-char-class-to-elisp ()
  "Consume the remaining PCRE character class at point and return its Elisp equivalent.

Point should be after the opening \"[\" when this is called, and
will be just after the closing \"]\" when it returns."
  (let ((accum '("["))
        (pcre-char-class-alternatives '())
        (negated nil))
    (when (looking-at "\\^")
      (setq negated t)
      (push "^" accum)
      (forward-char))
    (when (looking-at "\\]") (push "]" accum) (forward-char))

    (while (not (looking-at "\\]"))
      (let ((translated
             (or
              (pcre-re-or-class-token-to-elisp)
              (pcre-token-case              
               ;; Backslash + digits => always an octal char
               ("\\\\\\([0-7]\\{1,3\\}\\)"    
                (char-to-string (string-to-number (match-string 1) 8)))

               ;; Various character classes. To implement negative char classes,
               ;; we cons them onto the list `pcre-char-class-alternatives' and
               ;; transform the char class into a shy group with alternation
               ("\\\\d" "0-9")
               ("\\\\D" (push (if negated "[0-9]" "[^0-9]")
                              pcre-char-class-alternatives) "")
               ("\\\\h" pcre-horizontal-whitespace-chars)
               ("\\\\H" (push (if negated
                                  pcre-horizontal-whitespace
                                pcre-non-horizontal-whitespace)
                              pcre-char-class-alternatives) "")
               ("\\\\s" pcre-whitespace-chars)
               ("\\\\S" (push (if negated
                                  pcre-whitespace
                                pcre-non-whitespace)
                              pcre-char-class-alternatives) "")
               ("\\\\v" pcre-vertical-whitespace-chars)
               ("\\\\V" (push (if negated
                                  pcre-vertical-whitespace
                                pcre-non-vertical-whitespace)
                              pcre-char-class-alternatives) "")
               ("\\\\w" (push (if negated "\\W" "\\w") 
                              pcre-char-class-alternatives) "")
               ("\\\\W" (push (if negated "\\w" "\\W") 
                              pcre-char-class-alternatives) "")

               ;; Leave POSIX syntax unchanged
               ("\\[:[a-z]*:\\]" (match-string 0))

               ;; Ignore other escapes
               ("\\\\\\(.\\)" (match-string 0))

               ;; Copy everything else, including a literal newline
               ("\\(?:.\\|\n\\)" (match-string 0))))))
        (push translated accum)))
    (push "]" accum)
    (forward-char)
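    (let ((class
           (apply 'concat (reverse accum))))
      ;; An empty class contributes nothing; drop it so that it does
      ;; not become an empty alternative in the group below.
      (when (or (equal class "[]")
                (equal class "[^]"))
        (setq class nil))
      (if (not pcre-char-class-alternatives)
          (or class "")
        (concat "\\(?:"
                (mapconcat 'identity
                           (if class
                               (cons class pcre-char-class-alternatives)
                             pcre-char-class-alternatives)
                           "\\|")
                "\\)")))))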

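Assuming the definitions above have been evaluated in the current session, a call might look like the sketch below, which adapts the pattern from the question. The wrapper `my-search-pcre-forward` is a hypothetical name introduced here, not part of the answer. It uses `re-search-forward`, the regexp-aware search, because `search-forward` would treat the translated string as literal text.

```elisp
;; Requires `pcre-to-elisp' from the listing above.
(defun my-search-pcre-forward (pcre)
  "Search forward from point for PCRE, a regexp in PCRE notation.
Hypothetical wrapper: it translates PCRE with `pcre-to-elisp' and
returns the new position of point, or nil if nothing matches."
  (re-search-forward (pcre-to-elisp pcre) nil t))

;; In Lisp source each PCRE backslash is doubled, so the PCRE
;; pattern __\w: \d+ is written "__\\w: \\d+".
(when (fboundp 'pcre-to-elisp)
  (with-temp-buffer
    (insert "id __x: 42 end")
    (goto-char (point-min))
    (when (my-search-pcre-forward "__\\w: \\d+")
      (match-string 0))))
```

If the converter behaves as described in the answer, the \d+ part is rewritten as [0-9]+ while \w is passed through unchanged, so the search should stop after the text `__x: 42`.
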
Regarding regex - Elisp mechanism for converting PCRE regexps to emacs regexps, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9118183/
