
regex - How can I convert this regular expression into a Megaparsec parser without making a mess of it?

Reposted. Author: 行者123. Updated: 2023-12-02 04:26:55

Consider this regular expression:

^foo/[^=]+/baz=(.*),[^,]*$

If I run it on foo/bar/baz=one,two, it matches and the subgroup captures one. If I run it on foo/bar/baz/bar/baz=three,four,five, it matches and the subgroup captures three,four.

I know how to turn this into a regex-applicative parser or a ReadP parser:

import Text.Regex.Applicative
match (string "foo/" *> some (psym (/= '=')) *> string "/baz=" *> many anySym <* sym ',' <* many (psym (/= ','))) <$> ["foo/bar/baz=one,two", "foo/bar/baz/bar/baz=three,four,five"]
-- [Just "one",Just "three,four"]
import Text.ParserCombinators.ReadP
readP_to_S (string "foo/" *> many1 (satisfy (/= '=')) *> string "/baz=" *> many get <* char ',' <* many (satisfy (/= ',')) <* eof) <$> ["foo/bar/baz=one,two", "foo/bar/baz/bar/baz=three,four,five"]
-- [[("one","")],[("three,four","")]]

Both of these work the way I want. But when I try to transliterate it directly into Megaparsec, things go badly:

import Text.Megaparsec
parse (chunk "foo/" *> some (anySingleBut '=') *> chunk "/baz=" *> many anySingle <* single ',' <* many (anySingleBut ',') <* eof) "" <$> ["foo/bar/baz=one,two", "foo/bar/baz/bar/baz=three,four,five"]
-- [Left (ParseErrorBundle {bundleErrors = TrivialError 11 (Just (Tokens ('=' :| "one,"))) (fromList [Tokens ('/' :| "baz=")]) :| [], bundlePosState = PosState {pstateInput = "foo/bar/baz=one,two", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}}),Left (ParseErrorBundle {bundleErrors = TrivialError 19 (Just (Tokens ('=' :| "thre"))) (fromList [Tokens ('/' :| "baz=")]) :| [], bundlePosState = PosState {pstateInput = "foo/bar/baz/bar/baz=three,four,five", pstateOffset = 0, pstateSourcePos = SourcePos {sourceName = "", sourceLine = Pos 1, sourceColumn = Pos 1}, pstateTabWidth = Pos 8, pstateLinePrefix = ""}})]

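I know this stems from Megaparsec not backtracking by default. I tried to fix it by sticking try in a number of different places, but I couldn't get that to work. Eventually, I got this monstrosity working with notFollowedBy: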

import Text.Megaparsec
parse (chunk "foo/" *> some (noneOf "=/" <|> try (single '/' <* notFollowedBy (chunk "baz="))) *> chunk "/baz=" *> many (try (anySingle <* notFollowedBy (many (anySingleBut ',') <* eof))) <* single ',' <* many (anySingleBut ',') <* eof) "" <$> ["foo/bar/baz=one,two", "foo/bar/baz/bar/baz=three,four,five"]
-- [Right "one",Right "three,four"]

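But this looks like a mess! In particular, I don't like that I effectively have to specify most of the pattern twice. And technically, isn't this equivalent to the regex ^foo/(?:[^=/]|/(?!baz=))+/baz=((?:.(?![^,]*$))*),[^,]*$ rather than to my original regex? There must be a better way to write this parser. How should I do it?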


Edit: I also tried it this way, which at first seemed to work as well (but no, it incorrectly accepts foo//baz=,):

import Text.Megaparsec
parse (chunk "foo/" *> (some . try $ many (noneOf "=/") *> single '/') *> chunk "baz=" *> ((++) <$> many (anySingleBut ',') <*> (concat <$> manyTill ((:) <$> single ',' <*> many (anySingleBut ',')) (try $ single ',' *> many (anySingleBut ',') *> eof)))) "" <$> ["foo/bar/baz=one,two", "foo/bar/baz/bar/baz=three,four,five"]
-- [Right "one",Right "three,four"]

It looks just as convoluted, though, and the manyTill means it no longer really maps onto any regex.
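As a small illustration of the non-backtracking behaviour described in the question, the sketch below isolates the `(.*),` fragment. It assumes Megaparsec 7 or later and a `Parser` alias over `String` input; the names `greedy`, `withTry`, `upToComma` and `accepts` are made up for this example and are not part of the library.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, parse, many, try, anySingle, anySingleBut, single)

-- Assumed alias for this sketch: no custom errors, String input.
type Parser = Parsec Void String

-- Literal transliteration of "(.*),": `many anySingle` consumes the whole
-- input, so `single ','` is then attempted at end of input and fails.
greedy :: Parser String
greedy = many anySingle <* single ','

-- Wrapping the repetition in `try` does not help: `try` only rewinds when
-- the wrapped parser itself fails, and `many anySingle` always succeeds.
withTry :: Parser String
withTry = try (many anySingle) <* single ','

-- Restricting the token class makes the repetition stop on its own,
-- so no backtracking is needed to reach the comma.
upToComma :: Parser String
upToComma = many (anySingleBut ',') <* single ','

-- Hypothetical helper: does the parser succeed on a prefix of the input?
accepts :: Parser a -> String -> Bool
accepts p = either (const False) (const True) . parse p ""
```

With these definitions, `accepts greedy "one,two"` and `accepts withTry "one,two"` should both be `False`, while `accepts upToComma "one,two"` should be `True`. ReadP and regex-applicative explore shorter matches for the repetition, which is why the same shape works there.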

Best answer

Without having read it too carefully, I'd guess the part giving you trouble is this:

(.*),[^,]*

If so, consider using

sepBy (many (noneOf ",")) (string ",")

which will parse a comma-separated list. Then, in pure code, reinsert the commas between all but the last element of that list (e.g. with a well-placed fmap).
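One way the sepBy suggestion could be spelled out is sketched below. It assumes Megaparsec 7 or later; the names `capture` and `runCapture` are hypothetical, and the explicit check for at least two fields is an addition of this sketch, there to mirror the mandatory comma in `(.*),[^,]*`.

```haskell
import Data.List (intercalate)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parse, many, sepBy, noneOf, eof)
import Text.Megaparsec.Char (string)

-- Assumed alias for this sketch: no custom errors, String input.
type Parser = Parsec Void String

-- Parse comma-separated fields, then rebuild the captured group in pure
-- code: every field except the last, joined by commas again.
capture :: Parser String
capture = do
  fields <- many (noneOf ",") `sepBy` string ","
  case fields of
    -- Two or more fields means at least one comma was present.
    (_ : _ : _) -> pure (intercalate "," (init fields))
    _           -> fail "expected at least one comma"

-- Hypothetical helper: run `capture` on a whole string.
runCapture :: String -> Maybe String
runCapture = either (const Nothing) Just . parse (capture <* eof) ""
```

Here `runCapture "one,two"` should give `Just "one"` and `runCapture "three,four,five"` should give `Just "three,four"`, while an input without a comma is rejected. Each field parser stops at a comma by itself, so no `try` or `notFollowedBy` is involved.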

From the comments, it seems you're also having some trouble with this part:

/[^=]+/baz=

You might consider a translation like this:

slashPath = string "/" <++> path
path = string "baz=" <|> (many (noneOf "=/") <++> slashPath)
(<++>) = liftA2 (++)
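Putting the two suggestions together might look like the following sketch. It assumes Megaparsec 7 or later; the type signatures, the `fooBaz` parser and the `runFooBaz` helper are additions for illustration and do not come from the answer. It is also slightly more permissive than the original regex: it accepts foo/baz=a,b and foo//baz=a,b, which `[^=]+` would reject.

```haskell
import Control.Applicative (liftA2)
import Data.List (intercalate)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parse, many, sepBy, noneOf, eof, (<|>))
import Text.Megaparsec.Char (string)

-- Assumed alias for this sketch: no custom errors, String input.
type Parser = Parsec Void String

-- Run two parsers in sequence and concatenate their results.
(<++>) :: Parser String -> Parser String -> Parser String
(<++>) = liftA2 (++)

-- A slash followed by the rest of the path.
slashPath :: Parser String
slashPath = string "/" <++> path

-- Either the terminating "baz=", or one more segment and another slash.
-- `string` does not consume input when it fails, so no `try` is needed.
path :: Parser String
path = string "baz=" <|> (many (noneOf "=/") <++> slashPath)

-- Whole pattern: prefix, path up to "baz=", then the comma-separated tail.
fooBaz :: Parser String
fooBaz = do
  _ <- string "foo" *> slashPath
  fields <- many (noneOf ",") `sepBy` string ","
  eof
  case fields of
    (_ : _ : _) -> pure (intercalate "," (init fields))
    _           -> fail "expected at least one comma after baz="

-- Hypothetical helper: run `fooBaz` on a whole string.
runFooBaz :: String -> Maybe String
runFooBaz = either (const Nothing) Just . parse fooBaz ""
```

On the two inputs from the question, `runFooBaz "foo/bar/baz=one,two"` should give `Just "one"` and `runFooBaz "foo/bar/baz/bar/baz=three,four,five"` should give `Just "three,four"`. The decision between "baz=" and another segment is made at each slash with one alternative, so the pattern is written only once.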

A similar question about "regex - How can I convert this regular expression into a Megaparsec parser without making a mess of it?" can be found on Stack Overflow: https://stackoverflow.com/questions/59654051/
