
haskell - Choosing among multiple correct parsers on a single input

Reposted. Author: 行者123. Updated: 2023-12-02 13:32:39

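I'd like to know the best way to parse input when more than one parser can succeed. I've outlined my first, failed attempt and an inelegant solution that I hope can be made more idiomatic.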

For example, I'd like to lex "the", "quick", and "fox" from the following sentence into their own data constructors:

"the quick brown fox jumped over the lazy dog".

So, given the following data types:

data InterestingWord = Quick | The | Fox deriving Show
data Snippet = Word InterestingWord | Rest String deriving Show

I'd like the output of the parse to be:

[Word The,
Rest " ", Word Quick,
Rest " brown ", Word Fox,
Rest " jumped over ", Word The,
Rest " lazy dog"]

Here are the two solutions:

import Text.Parsec
import Data.Maybe
import Data.Ord
import Data.List

data InterestingWord = Quick | The | Fox deriving Show
data Snippet = Word InterestingWord | Rest String deriving Show

testCase = "the quick brown fox jumped over the lazy dog"
-- Expected output:
-- [Word The,
-- Rest " ", Word Quick,
-- Rest " brown ", Word Fox,
-- Rest " jumped over ", Word The,
-- Rest " lazy dog"]

toString Quick = "quick"
toString The = "the"
toString Fox = "fox"

-- First attempt

-- Return the characters up to the intended word along
-- with the word itself
upto word = do
    -- try keeps a partial match of the word from consuming input
    pre <- manyTill anyChar $ lookAhead $ try $ string (toString word)
    _ <- string (toString word)
    return [Rest pre, Word word]

-- Parsers for the interesting words
parsers = [upto Quick,
           upto The,
           upto Fox]

-- Try each parser and return its results with the
-- rest of the input.
-- An incorrect result is produced because "choice"
-- picks the first successful parse result.
wordParser = do
    snippets <- many $ try $ choice parsers
    leftOver <- many anyChar
    return $ concat $ snippets ++ [[Rest leftOver]]

-- [Rest "the ",Word Quick,Rest " brown fox jumped over the lazy dog"]
test1 = parseTest wordParser testCase

-- Correct

-- In addition to the characters leading up to the
-- word and the word, the position is also returned
upto' word = do
    result <- upto word
    pos <- getPosition
    return (pos, result)

-- The new parsers
parsers' = [upto' Quick,
            upto' The,
            upto' Fox]

-- Try each of the given parsers without consuming
-- input, returning each parser that succeeds
-- together with its result.
tryAll = mapM (\p -> do
    r <- optionMaybe $ try (lookAhead p)
    case r of
        Just result -> return $ Just (p, result)
        Nothing     -> return Nothing)

-- Pick the parser that has consumed the least.
firstSuccess ps = do
    successes <- tryAll ps >>= return . catMaybes
    if not (null successes)
        then return $ Just (fst $ head (sortBy (comparing (\(_,(pos,_)) -> pos)) successes))
        else return Nothing

-- Return the parse results for the parser that
-- has consumed the least
wordParser' = do
    parser <- firstSuccess parsers'
    case parser of
        Just p -> do
            (_,snippet) <- p
            return snippet
        Nothing -> parserZero

-- Returns the right result
test2 = parseTest (many wordParser' >>= return . concat) testCase

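The first attempt, "test1", doesn't produce the desired output because "choice" returns the first parser that succeeds, whereas what I really want is the first parser that succeeds while consuming the fewest characters. That is what I tried next: record the source position after parsing the input, and use the parser with the lowest source position.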
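The behaviour of "choice" described here can be reproduced in isolation. The following is a minimal sketch, assuming the parsec package; "skipTo", "firstListed", and "swapped" are hypothetical names introduced only for this illustration and are not part of the code above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: skip ahead to the keyword, consume it, and
-- return the text that preceded it.
skipTo :: String -> Parser String
skipTo kw = manyTill anyChar (try (string kw))

-- "choice" commits to the first alternative that succeeds, even though
-- the second alternative would have stopped earlier in the input.
firstListed :: Either ParseError String
firstListed =
    parse (choice [try (skipTo "quick"), try (skipTo "the")]) "" "the quick brown fox"
-- Right "the "

-- Listing the alternatives in the other order changes the result.
swapped :: Either ParseError String
swapped =
    parse (choice [try (skipTo "the"), try (skipTo "quick")]) "" "the quick brown fox"
-- Right ""
```

The outcome depends only on the order of the list, not on how much input each alternative consumes, which is why a separate selection step is needed.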

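This seems like a common enough situation that I feel I'm missing some obvious combinator incantation. Can anyone offer a better suggestion?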

Thanks!

-深

Best answer

This isn't a particularly common need, but here is an implementation:

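import Text.Parsec
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Despite its name, this selects the parser whose successful parse ends
-- earliest, i.e. the one that consumes the least input.
longestParse :: [Parsec String () a] -> Parsec String () a
longestParse parsers = do
    -- Trial-run every parser under lookAhead so that no input is consumed,
    -- recording the position at which each successful parse would end.
    endPositions <- sequence [ lookAhead $ optionMaybe $ try (parse >> getPosition)
                             | parse <- parsers ]
    -- endPositions :: [Maybe SourcePos]
    case [ (pos, parse) | (Just pos, parse) <- zip endPositions parsers ] of
        [] -> fail "No valid parse" -- maybe we can do something better?
        successes ->
            -- lookAhead rewinds the input as well as the position, so run
            -- the winning parser again to actually consume its input.
            snd $ minimumBy (comparing fst) successes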
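
To see this selection strategy applied to the sentence from the question, here is a self-contained sketch, assuming the parsec package. The names "earliestEnding", "snippets", and "lexSentence" are hypothetical. It re-runs the winning parser after the trial runs so that input is actually consumed, and it omits empty "Rest" snippets. Treat it as one possible arrangement rather than a definitive implementation.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Data.List (minimumBy)
import Data.Ord (comparing)

data InterestingWord = Quick | The | Fox deriving (Show, Eq)
data Snippet = Word InterestingWord | Rest String deriving (Show, Eq)

toString :: InterestingWord -> String
toString Quick = "quick"
toString The   = "the"
toString Fox   = "fox"

-- Text up to the next occurrence of the word, then the word itself.
-- An empty prefix is dropped rather than emitted as Rest "".
upto :: InterestingWord -> Parser [Snippet]
upto w = do
    pre <- manyTill anyChar (try (string (toString w)))
    return ([Rest pre | not (null pre)] ++ [Word w])

-- Trial-run every parser without consuming input, then run the one
-- whose successful parse would end at the earliest position.
earliestEnding :: [Parser a] -> Parser a
earliestEnding ps = do
    ends <- sequence [lookAhead (optionMaybe (try (p >> getPosition))) | p <- ps]
    case [(pos, p) | (Just pos, p) <- zip ends ps] of
        []        -> parserZero
        successes -> snd (minimumBy (comparing fst) successes)

-- Repeat the selection until no interesting word remains, then keep
-- whatever input is left as a final Rest.
snippets :: Parser [Snippet]
snippets = do
    found    <- many (try (earliestEnding (map upto [Quick, The, Fox])))
    leftOver <- many anyChar
    return (concat found ++ [Rest leftOver | not (null leftOver)])

lexSentence :: String -> Either ParseError [Snippet]
lexSentence = parse snippets ""

-- lexSentence "the quick brown fox jumped over the lazy dog"
-- Right [Word The,Rest " ",Word Quick,Rest " brown ",Word Fox,
--        Rest " jumped over ",Word The,Rest " lazy dog"]
```

Each trial run rescans the remaining input once per word, so this trades efficiency for simplicity. That is acceptable for short inputs like the example sentence.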

Regarding "haskell - Choosing among multiple correct parsers on a single input", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9232101/
