
xml - Getting all names from xml-conduit

Reposted. Author: 数据小太阳. Updated: 2023-10-29 01:49:58

I am parsing a modified version of the XML from http://hackage.haskell.org/package/xml-conduit-1.1.0.9/docs/Text-XML-Stream-Parse.html

Here is what it looks like:

<?xml version="1.0" encoding="utf-8"?>
<population xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://example.com">
  <success>true</success>
  <row_count>2</row_count>
  <summary>
    <bananas>0</bananas>
  </summary>
  <people>
    <person>
      <firstname>Michael</firstname>
      <age>25</age>
    </person>
    <person>
      <firstname>Eliezer</firstname>
      <age>2</age>
    </person>
  </people>
</population>

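How do I get a list of each person's name and age?

My goal is to download this XML with http-conduit and then parse it, but I am looking for a solution that shows how to parse elements that have no attributes (using tagNoAttrs?).

Here is what I tried; I added my questions as comments in the Haskell code: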

{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource
import Data.Conduit (($$))
import Data.Text (Text, unpack)
import Text.XML.Stream.Parse
import Control.Applicative ((<*))

data Person = Person Int Text
    deriving Show

-- Do I need to change the lambda function \age to something else to get both name and age?
parsePerson = tagNoAttr "person" $ \age -> do
    name <- content -- How do I get age from the content? "unpack" is for attributes
    return $ Person age name

parsePeople = tagNoAttr "people" $ many parsePerson

-- This doesn't ignore the xmlns attributes
parsePopulation = tagName "population" (optionalAttr "xmlns" <* ignoreAttrs) $ parsePeople

main = do
    people <- runResourceT $
        parseFile def "people2.xml" $$ parsePopulation
    print people

Best answer

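First: the parsing combinators in xml-conduit have not been updated in quite a while, and they show their age. I recommend that most people use the DOM or cursor interface instead. That said, let's look at your example. There are two problems with your code:

  • It does not handle XML namespaces correctly. All of the element names are in the http://example.com namespace, and your code needs to reflect that.
  • The parsing combinators require you to account for all elements. They will not automatically skip elements for you.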
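
To illustrate the first point, here is a minimal sketch. It assumes the `Name` type and its `IsString` instance from the xml-types package, which xml-conduit builds on. With OverloadedStrings, a literal written as `{namespace}local` becomes a namespace-qualified `Name`, while a bare literal carries no namespace, so the two do not compare equal:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.XML.Types (Name (..))

main :: IO ()
main = do
    -- Clark notation: the namespace URI goes in braces before the local name.
    let qualified = "{http://example.com}person" :: Name
        -- A bare literal yields a Name with no namespace at all.
        bare = "person" :: Name
    print (nameLocalName qualified, nameNamespace qualified)
    print (nameLocalName bare, nameNamespace bare)
    -- Name equality takes the namespace into account, so this prints False.
    print (qualified == bare)
```

This is why `tagNoAttr "person"` in the question never matches an element that lives in the http://example.com namespace.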

So here is an implementation using the streaming API that gets the desired result:

{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource (MonadThrow, runResourceT)
import Data.Conduit (Consumer, ($$))
import Data.Text (Text)
import Data.Text.Read (decimal)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

data Person = Person Int Text
    deriving Show

-- Parse one <person>: read the <firstname> and <age> children in document order.
parsePerson :: MonadThrow m => Consumer Event m (Maybe Person)
parsePerson = tagNoAttr "{http://example.com}person" $ do
    name <- force "firstname tag missing" $ tagNoAttr "{http://example.com}firstname" content
    ageText <- force "age tag missing" $ tagNoAttr "{http://example.com}age" content
    case decimal ageText of
        -- Accept the age only if the whole text was consumed as digits.
        Right (age, "") -> return $ Person age name
        _ -> force "invalid age value" $ return Nothing

-- Consume the elements that precede <people>, then parse every <person>.
parsePeople :: MonadThrow m => Consumer Event m [Person]
parsePeople = force "no people tag" $ do
    _ <- tagNoAttr "{http://example.com}success" content
    _ <- tagNoAttr "{http://example.com}row_count" content
    _ <- tagNoAttr "{http://example.com}summary" $
        tagNoAttr "{http://example.com}bananas" content
    tagNoAttr "{http://example.com}people" $ many parsePerson

-- Match the root element, ignoring any attributes it carries.
parsePopulation :: MonadThrow m => Consumer Event m [Person]
parsePopulation = force "population tag missing" $
    tagName "{http://example.com}population" ignoreAttrs $ \() -> parsePeople

main :: IO ()
main = do
    people <- runResourceT $
        parseFile def "people2.xml" $$ parsePopulation
    print people

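Here is an example using the cursor API. Note that it has different error-handling characteristics, but it should produce the same result for well-formed input.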

{-# LANGUAGE OverloadedStrings #-}
import Text.XML
import Text.XML.Cursor
import Data.Text (Text)
import Data.Text.Read (decimal)
import Data.Monoid (mconcat)

main :: IO ()
main = do
    doc <- Text.XML.readFile def "people2.xml"
    let cursor = fromDocument doc
    print $ cursor $// element "{http://example.com}person" >=> parsePerson

data Person = Person Int Text
    deriving Show

parsePerson :: Cursor -> [Person]
parsePerson c = do
    let name = c $/ element "{http://example.com}firstname" &/ content
        ageText = c $/ element "{http://example.com}age" &/ content
    case decimal $ mconcat ageText of
        Right (age, "") -> [Person age $ mconcat name]
        _ -> []
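
Both versions accept an age only when `decimal` consumes the entire text, which is what the `Right (age, "")` pattern checks. Here is a minimal sketch of that check in isolation. The helper `parseAge` is hypothetical and not part of the original answer:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Data.Text.Read (decimal)

-- Hypothetical helper (an assumption for illustration): parse an age,
-- accepting the text only if it consists entirely of decimal digits.
parseAge :: Text -> Maybe Int
parseAge t = case decimal t of
    Right (age, "") -> Just age  -- digits parsed and nothing left over
    _ -> Nothing                 -- no leading digits, or trailing characters

main :: IO ()
main = mapM_ (print . parseAge) ["25", "2", "25 years", "", "abc"]
```

The difference between the two answers lies in what happens on failure: the streaming version throws through `force`, while the cursor version silently drops the person by returning an empty list.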

Regarding "xml - Getting all names from xml-conduit", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22748303/
