
ruby - Changing the text of links, then clicking them with Mechanize in Ruby

Reposted. Author: 行者123. Updated: 2023-12-04 16:20:50

I managed to fill out a form with Mechanize and got back a list of links.
Part of the result looks like this:

[
#<Mechanize::Page::Link "View" "/cgi-bin/dcdev/forms/C00508200/800329/">,
#<Mechanize::Page::Link "View" "/cgi-bin/dcdev/forms/C00487363/800634/">,
#<Mechanize::Page::Link "View" "/cgi-bin/dcdev/forms/C00498097/800463/">
]

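I haven't been able to figure out what to do next.
  • The pages I need to scrape are not those links but the ones with sa/ALL at the end, for example: /cgi-bin/dcdev/forms/C00508200/800329/sa/ALL. How do I append sa/ALL to the end of each link?
  • Then, how do I click each corrected link and save the resulting page? A loop?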
  • Best answer

    Here's how you fish...

    require 'nokogiri'

    doc = Nokogiri::HTML(<<EOT)
    <html>
    <body>
    <a href="/cgi-bin/dcdev/forms/C00508200/800329/">View</a>
    <a href="/cgi-bin/dcdev/forms/C00487363/800634/">View</a>
    <a href="/cgi-bin/dcdev/forms/C00498097/800463/">View</a>
    </body>
    </html>
    EOT

    # The hrefs already end in "/", so append "sa/ALL" without a leading slash.
    hrefs = doc.search('a').map { |a| a['href'] + 'sa/ALL' }

    Mechanize uses Nokogiri internally as its HTML parser. You can get at the document Mechanize is using with something like:
    require 'mechanize'

    agent = Mechanize.new
    page = agent.get('http://www.example.net')

    Proof that we're dealing with a Nokogiri document:
    page.parser.class # => Nokogiri::HTML::Document < Nokogiri::XML::Document

    Grab the links in the page to work with:
    page.parser.search('a').map(&:to_html)

    Which returns:
    [
    [ 0] "<a href=\"/\"><img src=\"/_img/iana-logo-pageheader.png\" alt=\"Homepage\"></a>",
    [ 1] "<a href=\"/domains/\">Domains</a>",
    [ 2] "<a href=\"/numbers/\">Numbers</a>",
    [ 3] "<a href=\"/protocols/\">Protocols</a>",
    [ 4] "<a href=\"/about/\">About IANA</a>",
    [ 5] "<a href=\"/go/rfc2606\">RFC 2606</a>",
    [ 6] "<a href=\"/about/\">About</a>",
    [ 7] "<a href=\"/about/presentations/\">Presentations</a>",
    [ 8] "<a href=\"/about/performance/\">Performance</a>",
    [ 9] "<a href=\"/reports/\">Reports</a>",
    [10] "<a href=\"/domains/\">Domains</a>",
    [11] "<a href=\"/domains/root/\">Root Zone</a>",
    [12] "<a href=\"/domains/int/\">.INT</a>",
    [13] "<a href=\"/domains/arpa/\">.ARPA</a>",
    [14] "<a href=\"/domains/idn-tables/\">IDN Repository</a>",
    [15] "<a href=\"/protocols/\">Protocols</a>",
    [16] "<a href=\"/numbers/\">Number Resources</a>",
    [17] "<a href=\"/abuse/\">Abuse Information</a>",
    [18] "<a href=\"http://www.icann.org/\">Internet Corporation for Assigned Names and Numbers</a>",
    [19] "<a href=\"mailto:iana@iana.org?subject=General%20website%20feedback\">iana@iana.org</a>"
    ]

    Capture and munge them:
    links = page.parser.search('a').map{ |a| a['href'] + 'sa/ALL' }
    [
    [ 0] "/sa/ALL",
    [ 1] "/domains/sa/ALL",
    [ 2] "/numbers/sa/ALL",
    [ 3] "/protocols/sa/ALL",
    [ 4] "/about/sa/ALL",
    [ 5] "/go/rfc2606sa/ALL",
    [ 6] "/about/sa/ALL",
    [ 7] "/about/presentations/sa/ALL",
    [ 8] "/about/performance/sa/ALL",
    [ 9] "/reports/sa/ALL",
    [10] "/domains/sa/ALL",
    [11] "/domains/root/sa/ALL",
    [12] "/domains/int/sa/ALL",
    [13] "/domains/arpa/sa/ALL",
    [14] "/domains/idn-tables/sa/ALL",
    [15] "/protocols/sa/ALL",
    [16] "/numbers/sa/ALL",
    [17] "/abuse/sa/ALL",
    [18] "http://www.icann.org/sa/ALL",
    [19] "mailto:iana@iana.org?subject=General%20website%20feedbacksa/ALL"
    ]
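The output above shows two things to watch for: an href without a trailing slash gets glued to the suffix (`/go/rfc2606sa/ALL`), and off-site or `mailto:` links get the suffix too. Below is a minimal sketch of one way to handle both, using only Ruby's standard `uri` library. The helper name `sa_all_urls`, the placeholder host, and the choice to filter on the `/cgi-bin/dcdev/forms/` prefix (taken from the sample links in the question) are assumptions for illustration.

```ruby
require 'uri'

# Hypothetical helper: keep only hrefs under `prefix`, append "sa/ALL",
# and resolve the result against `base` to get absolute URLs.
def sa_all_urls(hrefs, base:, prefix: '/cgi-bin/dcdev/forms/')
  hrefs
    .compact                                    # <a> tags without an href yield nil
    .select { |href| href.start_with?(prefix) } # drops mailto:, off-site and nav links
    .map do |href|
      path = href.end_with?('/') ? href : "#{href}/" # exactly one slash before the suffix
      URI.join(base, "#{path}sa/ALL").to_s
    end
    .uniq
end

base = 'http://www.example.net' # placeholder host, not the real site

hrefs = [
  '/cgi-bin/dcdev/forms/C00508200/800329/',
  '/cgi-bin/dcdev/forms/C00487363/800634', # no trailing slash
  'mailto:iana@iana.org',
  nil
]

urls = sa_all_urls(hrefs, base: base)
puts urls
```

With Mechanize, the `hrefs` array could come from `page.parser.search('a').map { |a| a['href'] }` as in the answer's code.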

    Which links you apply the tweak to is up to you, and how to fetch the modified links is left as an exercise.
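For the "click each link and save the page" part of the question, a plain loop is probably enough. The sketch below is one possible shape, not the answer author's code. `save_pages` is a hypothetical helper that accepts any object with a `get(url)` method whose result responds to `body`, which matches what `Mechanize#get` returns for a page. The file-naming scheme and the one-second default delay are assumptions.

```ruby
require 'fileutils'

# Hypothetical helper: fetch each URL with `agent` and write the response
# body to a numbered file in `dir`. Returns the list of paths written.
def save_pages(agent, urls, dir, delay: 1)
  FileUtils.mkdir_p(dir)
  urls.each_with_index.map do |url, i|
    page = agent.get(url)                  # with Mechanize, this fetches the page
    path = File.join(dir, format('page_%03d.html', i))
    File.binwrite(path, page.body)         # body holds the raw response content
    sleep(delay) if delay.positive?        # pause between requests
    path
  end
end

# With a real agent (needs the mechanize gem and network access):
#   require 'mechanize'
#   agent = Mechanize.new
#   save_pages(agent, urls, 'pages')
```

Passing the agent in, rather than creating it inside the helper, lets the same agent that submitted the form (and holds its cookies) fetch the follow-up pages.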

    Regarding "ruby - Changing the text of links, then clicking them with Mechanize in Ruby", the original question is on Stack Overflow: https://stackoverflow.com/questions/12097612/
