Parsing chemical compounds in Haskell

Reposted. Author: 行者123. Updated: 2023-12-02 10:37:13
I am trying to write a parser for chemical compounds as an exercise for myself, but I have gotten stuck.

This is the data type I am trying to use:

data Compound = Monoatomic String Int | Poliatomic [Compound] Int

Given a string like "Ca(OH)2", I want to get something like this:

Poliatomic [Monoatomic "Ca" 1, Poliatomic [Monoatomic "O" 1, Monoatomic "H" 1] 2 ] 1

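The Monoatomic constructor is for single atoms and the Poliatomic constructor is for groups of atoms. In this example, (OH)2 is an inner polyatomic structure, represented as Poliatomic [Monoatomic "O" 1, Monoatomic "H" 1] 2. The number 2 means there are two of these polyatomic groups.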
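To make the target representation concrete, here is a minimal sketch that builds the value for "Ca(OH)2" by hand and walks it. The helper `totalAtoms` and the name `calciumHydroxide` are made up for illustration and are not part of the question; the sketch only assumes that the counts multiply through nested groups.

```haskell
-- Same shape as the type in the question, plus Show and Eq for printing and comparing.
data Compound
  = Monoatomic String Int
  | Poliatomic [Compound] Int
  deriving (Show, Eq)

-- Hand-built value for "Ca(OH)2".
calciumHydroxide :: Compound
calciumHydroxide =
  Poliatomic
    [ Monoatomic "Ca" 1
    , Poliatomic [Monoatomic "O" 1, Monoatomic "H" 1] 2
    ]
    1

-- Hypothetical helper: total number of atoms, multiplying through nested counts.
totalAtoms :: Compound -> Int
totalAtoms (Monoatomic _ n)     = n
totalAtoms (Poliatomic parts n) = n * sum (map totalAtoms parts)
```

Loaded into GHCi, `totalAtoms calciumHydroxide` evaluates to 5: one Ca plus two copies of the (O, H) group.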

This is as far as I got:

import Data.Char (isUpper)
data Compound = Monoatomic String Int | Poliatomic [Compound] Int

instance Functor Compound where
    fmap f (Monoatomic s i) = Monoatomic (f s) i
    fmap f (Poliatomic xs i) = Poliatomic (fmap f xs) i

-- Change number of a compound
changeNumber :: Compound -> Int -> Compound
changeNumber (Monoatomic xs _) n = Monoatomic xs n
changeNumber (Poliatomic xs _) n = Poliatomic xs n

-- Take a partial compound and the next character, return a partial compound
parseCompound :: Compound -> Char -> Compound
parseCompound (Poliatomic x:xs n) c
    | isUpper c = Poliatomic ((Monoatomic [c] 1):x:xs) n -- add new atom to compound
    | isLower c = Poliatomic

-- I want to do foldl parseCompound (Poliatomic [] 1) inputstring

But things got too complicated and I could not continue.

This seems like it should be a fairly simple problem, but I am very new to Haskell and cannot work out how to finish this function.

I have these questions:

  • Is my approach correct so far?
  • How can I finish this?

Best answer

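I have written the parser you are looking for with Parsec, to give you an idea of what Parsec parsers look like, since you said you have little experience with it.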

It should be fairly readable even with very little Haskell experience. I have added comments on the parts that need particular attention.

import Text.Read (readMaybe)
import Data.Maybe (fromMaybe)
import Text.Parsec (parse, many, many1, digit, char, string, (<|>), choice, try)
import Text.Parsec.String (Parser)


data Compound
    = Monoatomic String Int
    | Poliatomic [Compound] Int
    deriving Show


-- Run the substance parser on "Ca(OH)2" and print the result which is
-- Right (Poliatomic [Monoatomic "Ca" 1,Poliatomic [Monoatomic "O" 1,Monoatomic "H" 1] 2] 1)
main :: IO ()
main = print (parse substance "" "Ca(OH)2")


-- parse the many parts which make up the top-level polyatomic compound
--
-- "many1" means "at least one"
substance :: Parser Compound
substance = do
    topLevel <- many1 part
    return (Poliatomic topLevel 1)


-- a single part in a substance is either a poliatomic compound or a monoatomic compound
part :: Parser Compound
part = poliatomic <|> monoatomic


-- a poliatomic compound starts with a '(', then has many parts inside, then
-- ends with ')' and has a number after it which indicates how many of it there
-- are.
poliatomic :: Parser Compound
poliatomic = do
    char '('
    inner <- many1 part
    char ')'
    amount <- many1 digit
    return (Poliatomic inner (read amount))


-- a monoatomic compound is one of the many element names, followed by an
-- optional number. if omitted, the amount defaults to 1.
--
-- "try" is a little special, and required in this case. it means "if a parser
-- fails, try the next one from where you started, not from where the last one
-- failed."
--
-- "choice" means "try all parsers in this list, stop when one matches"
--
-- "many" means "zero or more"
monoatomic :: Parser Compound
monoatomic = do
    name <- choice [try nameParser | nameParser <- atomstrings]
    amount <- many digit
    return (Monoatomic name (fromMaybe 1 (readMaybe amount)))


-- a list of parsers for atom names. it is IMPORTANT that the longest names
-- come first: "choice" stops at the first alternative that matches, so if "C"
-- came before "Ca", the input "Ca" would be read as "C" with a stray "a" left
-- over. ordering the list this way keeps the parser simple to write. it's
-- common when designing parsers to consider things like that.
atomstrings :: [Parser String]
atomstrings = map string (words "He Li Be Ne Na Mg Al Ca H B C N O F")

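I tried to write this code in a way that is at least suitable for beginners, but it may not be entirely clear, so I am happy to answer any questions about it.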
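The comments about "try" and about putting the longest names first can be checked with a small experiment. The sketch below is not part of the parser above; `element`, `run`, and the three example values are hypothetical names introduced only to compare the variants. It relies on the fact that Parsec's `string` consumes input once it has matched a first character, even if it fails later.

```haskell
import Text.Parsec (parse, choice, try, string)
import Text.Parsec.String (Parser)

-- Build an element-name parser from a list of names, in the order given.
-- The flag toggles whether each alternative is wrapped in "try".
element :: Bool -> [String] -> Parser String
element withTry names = choice (map wrap names)
  where
    wrap name
      | withTry   = try (string name)
      | otherwise = string name

-- Run a parser, turning the error into a printable message.
run :: Parser String -> String -> Either String String
run p input = either (Left . show) Right (parse p "" input)

-- Longest name first, with try: "H2" falls through "He" and matches "H".
withTryH :: Either String String
withTryH = run (element True ["He", "H"]) "H2"

-- Without try: string "He" consumes the 'H' before failing on '2',
-- so choice never gets to the "H" alternative and the parse fails.
withoutTryH :: Either String String
withoutTryH = run (element False ["He", "H"]) "H2"

-- Short name first: "C" matches straight away and the 'a' is left unparsed.
shortFirst :: Either String String
shortFirst = run (element True ["C", "Ca"]) "Ca"
```

In GHCi, `withTryH` is `Right "H"`, `withoutTryH` is a `Left` carrying a parse error, and `shortFirst` is `Right "C"` rather than `Right "Ca"`.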

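The parser above is what you asked for. However, it is not what I would write if I had free rein. Left to my own devices, I would take advantage of the fact that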

Ca(OH)2

can be represented as

(Ca)1((O)1(H)1)2

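This is a more uniform representation, which in turn leads to a simpler data structure and a parser with less boilerplate. The code I would prefer to write looks like this: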

import Text.Read (readMaybe)
import Data.Maybe (fromMaybe)
import Control.Applicative ((<$>), (<*>), pure)
import Text.Parsec (parse, many, many1, digit, char, string, (<|>), choice, try, between)
import Text.Parsec.String (Parser)


data Substance
    = Part [Substance] Int
    | Atom String
    deriving Show


main :: IO ()
main = print (parse substance "" "Ca(OH)2")
-- Right (Part [Part [Atom "Ca"] 1,Part [Part [Atom "O"] 1,Part [Atom "H"] 1] 2] 1)

substance :: Parser Substance
substance = Part <$> many1 part <*> pure 1

part :: Parser Substance
part = do
    inner <- polyatomic <|> monoatomic
    amount <- fromMaybe 1 . readMaybe <$> many digit
    return (Part inner amount)

polyatomic :: Parser [Substance]
polyatomic = between (char '(') (char ')') (many1 part)

monoatomic :: Parser [Substance]
monoatomic = (:[]) . Atom <$> choice (map (try . string) atomstrings)

atomstrings :: [String]
atomstrings = words "He Li Be Ne Na Mg Al Ca H B C N O F"

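This uses some "advanced" Haskell tricks (such as the <$> and <*> operators), so it may not interest you, OP, but I am including it for others who may be more advanced Haskell users and are learning Parsec.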
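For readers who have not met the operator style yet, here is a small sketch of how <$> and <*> relate to do-notation. The `Tagged` type, the two parsers, and `run` are toy names made up for this comparison and are not part of the answer's code.

```haskell
import Text.Parsec (parse, many1, digit, letter)
import Text.Parsec.String (Parser)

-- A toy value: a name followed by a count, e.g. "abc12".
data Tagged = Tagged String Int
  deriving (Show, Eq)

-- do-notation: name each intermediate result, then build the value.
taggedDo :: Parser Tagged
taggedDo = do
  name  <- many1 letter
  count <- many1 digit
  return (Tagged name (read count))

-- Applicative style: feed the results straight into the constructor.
-- "f <$> p" applies f to the result of p; "<*>" supplies the next argument.
taggedAp :: Parser Tagged
taggedAp = Tagged <$> many1 letter <*> (read <$> many1 digit)

-- Run a parser and discard the error details.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

Both `run taggedDo "abc12"` and `run taggedAp "abc12"` give `Just (Tagged "abc" 12)`. The applicative form fits when later steps do not depend on earlier results, which is the case for `Part <$> many1 part <*> pure 1`.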

As you can see, this parser takes only about half a page. That is the strength of libraries like Parsec: they make writing parsers both easy and fun!
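The uniform notation (Ca)1((O)1(H)1)2 mentioned earlier can be recovered from a Substance value with a short printer. This is only a sketch: `render`, `calciumHydroxide`, and `uniform` are hypothetical names, and the value is the parse result from the answer written out by hand so that the block does not depend on Parsec.

```haskell
-- Same shape as the Substance type in the answer, plus Eq for comparing.
data Substance
  = Part [Substance] Int
  | Atom String
  deriving (Show, Eq)

-- Hypothetical helper: print a Substance in the fully parenthesised form,
-- where every part carries an explicit count.
render :: Substance -> String
render (Atom name)        = name
render (Part inner count) = "(" ++ concatMap render inner ++ ")" ++ show count

-- The parse result for "Ca(OH)2" shown in the answer, written out by hand.
calciumHydroxide :: Substance
calciumHydroxide =
  Part
    [ Part [Atom "Ca"] 1
    , Part [Part [Atom "O"] 1, Part [Atom "H"] 1] 2
    ]
    1

-- The top-level Part only groups the pieces, so render its children directly.
uniform :: String
uniform = case calciumHydroxide of
  Part inner _ -> concatMap render inner
  atom         -> render atom
```

Here `uniform` evaluates to "(Ca)1((O)1(H)1)2", while `render calciumHydroxide` wraps the same text in one more pair of parentheses with a trailing 1.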

A similar question about parsing chemical compounds in Haskell can be found on Stack Overflow: https://stackoverflow.com/questions/29778508/