
haskell - Parsing identifiers that don't end with certain characters in attoparsec

Reposted. Author: 行者123. Updated: 2023-12-02 10:11:58

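I have been writing an attoparsec parser for what the Unified Code for Units of Measure calls an <ATOM-SYMBOL>. It is defined as the longest sequence of characters from a certain class (a class that includes all the digits 0-9) that does not end with a digit.

So given the input foo27 I want to consume and return foo, for 237bar26 I want to consume and return 237bar, and for 19 I want to fail without consuming anything.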
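The requirement above can be restated as a pure function over String, which is handy as a reference when checking a parser. This is only a sketch: the names isAtomChar and atomSymbolSpec are made up for illustration, and the character class is transcribed by hand from the class string used in the parser attempt in this question.

    import Data.Char (isDigit)
    import Data.List (dropWhileEnd)

    -- | Hypothetical helper: membership in the class "!#-'*,0-<>-Z\\^-z|~".
    isAtomChar :: Char -> Bool
    isAtomChar c =
      c `elem` "!*,\\|~"
        || between '#' '\''
        || between '0' '<'
        || between '>' 'Z'
        || between '^' 'z'
      where
        between lo hi = c >= lo && c <= hi

    -- | Hypothetical reference function: returns the atom symbol and the
    -- unconsumed rest of the input, or Nothing if no symbol can be formed.
    atomSymbolSpec :: String -> Maybe (String, String)
    atomSymbolSpec input
      | null symbol = Nothing
      | otherwise   = Just (symbol, drop (length symbol) input)
      where
        candidate = takeWhile isAtomChar input      -- longest run in the class
        symbol    = dropWhileEnd isDigit candidate  -- strip trailing digits

For the three inputs above this gives Just ("foo","27"), Just ("237bar","26") and Nothing.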

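I can't figure out how to build this out of takeWhile1, takeTill, or scan, but I am probably missing something obvious.

Update: my best attempt so far manages to exclude sequences that consist entirely of digits: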

    atomSymbol :: Parser Text
    atomSymbol = do
        r <- core
        if (P.all (inClass "0-9") . T.unpack $ r)
          then fail "Expected an atom symbol but all characters were digits."
          else return r
      where
        core = A.takeWhile1 $ inClass "!#-'*,0-<>-Z\\^-z|~"

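I tried changing it to test whether the last character is a digit, rather than whether all of them are, but the parser does not seem to backtrack one character at a time.

Update 2:

The whole file is at https://github.com/dmcclean/dimensional-attoparsec/blob/master/src/Numeric/Units/Dimensional/Parsing/Attoparsec.hs . It only builds against the prefixes branch of https://github.com/dmcclean/dimensional.

Best answer

You should reformulate the problem and handle spans of digits ( 0-9 ) and spans of non-digit characters ( !#-'*,:-<>-Z\\^-z|~ ) separately. The syntactic element of interest can then be described as

  • an optional digit span, followed by
  • a non-digit span, followed by
  • zero or more {digit span followed by a non-digit span}.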

    {-# LANGUAGE OverloadedStrings #-}

    module Main where

    import Control.Applicative ((<|>), many)
    import Data.Char (isDigit)

    import Data.Attoparsec.Combinator (option)
    import Data.Attoparsec.Text (Parser)
    import qualified Data.Attoparsec.Text as A
    import Data.Text (Text)
    import qualified Data.Text as T

    -- Longest run of atom-symbol characters that does not end with a digit.
    atomSymbol :: Parser Text
    atomSymbol = f <$> (option "" digitSpan)                        -- optional leading digits
                   <*> (nonDigitSpan <|> fail errorMsg)             -- mandatory non-digit span
                   <*> many (g <$> digitSpan <*> nonDigitSpan)      -- digits kept only if non-digits follow
      where
        nonDigitSpan = A.takeWhile1 $ A.inClass "!#-'*,:-<>-Z\\^-z|~"
        digitSpan    = A.takeWhile1 isDigit
        f x y xss    = T.concat $ x : y : concat xss                -- glue all spans back together
        g x y        = [x,y]
        errorMsg     = "Expected an atom symbol but all characters (if any) were digits."

Test

[...] given the input foo27 I want to consume and return foo, for 237bar26 I want to consume and return 237bar, for 19 I want to fail without consuming anything.

λ> A.parseOnly atomSymbol "foo26"
Right "foo"

λ> A.parseOnly atomSymbol "237bar26"
Right "237bar"

λ> A.parseOnly atomSymbol "19"
Left "Failed reading: Expected an atom symbol but all characters (if any) were digits."
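A different way to get the same behaviour, shown here only as a sketch and not as part of the answer above, is to peek at the whole candidate span with lookAhead, trim its trailing digits, and then consume exactly that many characters with A.take. The name atomSymbolLA is made up for illustration; the class string is the one from the question, and the error message is reused from the parser above.

    module AtomSymbolLookAhead where

    import Data.Char (isDigit)

    import Data.Attoparsec.Combinator (lookAhead)
    import Data.Attoparsec.Text (Parser)
    import qualified Data.Attoparsec.Text as A
    import Data.Text (Text)
    import qualified Data.Text as T

    -- | Hypothetical alternative: look ahead, trim, then consume.
    atomSymbolLA :: Parser Text
    atomSymbolLA = do
        -- Inspect the longest run of class characters without consuming it.
        candidate <- lookAhead (A.takeWhile isAtomChar)
        -- Drop trailing digits so the symbol cannot end with one.
        let symbol = T.dropWhileEnd isDigit candidate
        if T.null symbol
          then fail "Expected an atom symbol but all characters (if any) were digits."
          else A.take (T.length symbol)  -- consume only the trimmed symbol
      where
        isAtomChar = A.inClass "!#-'*,0-<>-Z\\^-z|~"

With A.parseOnly, the inputs foo27 and 237bar26 (built with T.pack) should yield Right "foo" and Right "237bar", and 19 should yield a Left. The trade-off is that the candidate span is scanned twice, once by lookAhead and once by A.take, in exchange for not having to split the character class into digit and non-digit parts.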

Regarding "haskell - Parsing identifiers that don't end with certain characters in attoparsec", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34081807/
