
haskell - Web scraping with Haskell


What is the current state of libraries for scraping websites with Haskell?

I'm trying to make myself do more of my quick one-off tasks in Haskell, to help increase my comfort level with the language.

In Python, I tend to use the excellent PyQuery library for this. Is there something similarly simple and easy to use in Haskell? I've looked into TagSoup, and while the parser itself seems nice, actually traversing pages doesn't seem as pleasant as it is in other languages.
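(For reference, a typical TagSoup traversal ends up as plain list processing over the parsed tags. A rough sketch of what I mean, with a placeholder helper name and an HTML string as input:)

    import Text.HTML.TagSoup

    -- Sketch: collect the text of every <a> element from an HTML string.
    -- Each "section" starts at an opening <a> tag; we cut it off at the
    -- closing </a> and take the inner text.
    linkTexts :: String -> [String]
    linkTexts html =
        [ innerText (takeWhile (~/= ("</a>" :: String)) section)
        | section <- sections (~== ("<a>" :: String)) (parseTags html)
        ]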

Is there a better option out there?

Accepted answer

http://hackage.haskell.org/package/shpider

Shpider is a web automation library for Haskell. It allows you to quickly write crawlers, and for simple cases (like following links) even without reading the page source.

It has useful features such as turning relative links from a page into absolute links, options to authorize transactions only on a given domain, and the option to only download html documents.

It also provides a nice syntax for filling out forms.

An example:

    runShpider $ do
        download "http://apage.com"
        theForm : _ <- getFormsByAction "http://anotherpage.com"
        sendForm $ fillOutForm theForm $ pairs $ do
            "occupation" =: "unemployed Haskell programmer"
            "location" =: "mother's house"

(2018 edit - shpider is deprecated; these days https://hackage.haskell.org/package/scalpel is probably a good replacement.)
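For a sense of what scalpel looks like, here is a minimal sketch that fetches a page and collects the text of every link; the URL reuses the placeholder from the shpider example above, and allLinkTexts is a made-up name:

    {-# LANGUAGE OverloadedStrings #-}
    import Text.HTML.Scalpel

    -- Sketch: scrape the text of every <a> element on a page.
    -- scrapeURL returns Nothing if the request or the scraper fails.
    allLinkTexts :: IO (Maybe [String])
    allLinkTexts = scrapeURL "http://apage.com" (texts "a")

    main :: IO ()
    main = allLinkTexts >>= print

Unlike shpider, scalpel focuses on declarative extraction with selectors rather than browser-style automation, so form filling and link following are outside its scope.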

Regarding haskell - Web scraping with Haskell: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4838138/
