
ruby-on-rails - How do I handle "404 errors" in my scrape?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:53:33

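I'm not a programmer and know very little Ruby. I have a scraper that pulls product information from a website, and I'm trying to add a rescue clause to handle HTTP 404 errors, so that the scrape doesn't stop but moves on to the next product.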

I need to add the rescue to the code below:

    def initialize(id, log = nil, timeout_threshold = nil)

      @log_buffer = nil
      # prepare internal logging stream
      @log = (!log.nil? and log.is_a?(Logger)) ? log : Logger.new(@log_buffer=StringIO.new)

      begin
        # store instance url address
        @id = id.to_s
        @url = Link::base_url + 'en-US/item_' + @id + '.htm'

        # set remote timeout threshold
        @timeout_threshold = (timeout_threshold.to_i > 0) ? timeout_threshold.to_i : 15
        @timeout = false

        @expired = false

        if url_verify
          Timeout::timeout(@timeout_threshold) {
            Mechanize.html_parser = Nokogiri::HTML
            @@agent = Agent.instance

            ###TODO: [optional?] login
            ###TODO: [optional?] or login iff pricing not present?
            ###TODO: Agent.get(user login page)
            ###TODO: Agent.fill in user/pswd
            ###TODO: Agent.submit

            @html = @@agent.get(url)
            @log.info("Alamode Product #{@id.to_s}: Load #{url.to_s}")

            @specification = parse_specifications
            @quantity, @mapped_quantity = parse_quantities
            @price = parse_price
            @valid = true

            # check parsed page
            if @specification.size.zero? and @quantity.size.zero?
              @valid = false
              @expired = true
              @log.warn("Alamode Product #{@id.to_s}: #{url.to_s} unscrappable (product no longer available?)")
            else
              @log.info("stAlamode Product #{@id.to_s}: #{url.to_s} successfully parsed")
              @log.info(" QTY #{@mapped_quantity.to_s}")
            end
          }

        else
          # return error message
          @valid = false
          @log.error("Alamode Product #{@id.to_s}: #{url.to_s} is not a properly formatted URI address")
        end

      rescue Timeout::Error
        @valid = false
        @timeout = true
        @log.error("Alamode Product #{@id.to_s}: #{url.to_s} did not respond within allocated time")
      end

    end

Best answer

Ruby lets you stack rescue clauses, so a single begin block can handle several error types.

    begin
      ...
    rescue YourErrorName
      ...
    rescue Timeout::Error
      ...
    end

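In the new rescue clause you can either exit quietly (do nothing, though it is better to log the result) or start scraping the next ID. I'm not familiar with Nokogiri, so you'll have to find the error name yourself ;) Good luck!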
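To make the stacked clauses concrete, here is a minimal sketch of a loop that skips a missing product and carries on. The error class `FakeResponseCodeError` and the helpers `fetch_product` and `scrape_all` are hypothetical stand-ins invented for this example so that it runs without any gems. In the question's code the page is fetched through Mechanize, which as far as I know raises `Mechanize::ResponseCodeError` (with a string `response_code` such as "404") for this kind of response, so that is probably the name to put in the rescue, but check it against your installed version.

```ruby
require 'timeout'

# Hypothetical stand-in for the HTTP client's error class. With Mechanize,
# the class to rescue is assumed to be Mechanize::ResponseCodeError.
class FakeResponseCodeError < StandardError
  attr_reader :response_code

  def initialize(response_code)
    @response_code = response_code
    super("#{response_code} response")
  end
end

# Hypothetical fetcher: pretends that product 2 no longer exists.
def fetch_product(id)
  raise FakeResponseCodeError.new('404') if id == 2
  "<html>product #{id}</html>"
end

# Scrape every ID; a 404 or a timeout is recorded and the loop moves on.
def scrape_all(ids, timeout_threshold = 15)
  results = {}
  ids.each do |id|
    begin
      Timeout.timeout(timeout_threshold) { results[id] = fetch_product(id) }
    rescue FakeResponseCodeError => e
      # Only swallow 404s; any other response code is re-raised.
      raise unless e.response_code == '404'
      results[id] = :not_found
    rescue Timeout::Error
      results[id] = :timeout
    end
  end
  results
end

results = scrape_all([1, 2, 3])
results.each { |id, outcome| puts "#{id}: #{outcome}" }
```

Because the begin/rescue sits inside the loop, one failed product only affects its own entry. In the original `initialize` the equivalent change would be one extra rescue clause next to `rescue Timeout::Error` that sets `@valid = false` and logs the miss.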

Regarding "ruby-on-rails - How do I handle "404 errors" in my scrape?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15577885/
