
hadoop - NUTCH: How to make the take.screenshot and screenshot.location properties work?

Reposted. Author: 行者123. Updated: 2023-12-02 20:43:39

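I have been learning Nutch (version Nutch-1.14) for a week, and it works fine both in local mode and on Hadoop-2.7.2 (pseudo-distributed mode). Today I came across the "take.screenshot" and "screenshot.location" properties in nutch-site.xml. After setting them, Nutch still crawls the seed URLs, but no screenshots are produced in either local mode or on Hadoop.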

nutch-site.xml settings for local mode

<property>
<name>take.screenshot</name>
<value>true</value>
<description>
Boolean property determining whether the protocol-htmlunit
WebDriver should capture a screenshot of the URL. If set to
true remember to define the 'screenshot.location'
property as this determines the location screenshots should be
persisted to on HDFS. If that property is not set, screenshots
are simply discarded.
</description>
</property>

<property>
<name>screenshot.location</name>
<value>/home/user/nutch-1.14/screenshot</value>
<description>
The location on disk where a URL screenshot should be saved
to if the 'take.screenshot' property is set to true.
By default this is null, in this case screenshots held in memory
are simply discarded.
</description>
</property>

nutch-site.xml settings for Hadoop
<property>
<name>take.screenshot</name>
<value>true</value>
</property>

<property>
<name>screenshot.location</name>
<value>/screenshot</value>
</property>

Note: the "screenshot" directory exists in HDFS.

Best answer

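Have you enabled protocol-selenium? These settings only take effect with that protocol plugin. By default Nutch uses the protocol-http plugin, which does not support this option, so the screenshot properties are ignored even if you set them in your configuration.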
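
As a rough sketch of what enabling the plugin could look like, the protocol plugin is selected through the plugin.includes property in nutch-site.xml. The value below is an assumption modeled on a typical Nutch 1.x plugin list, not the asker's actual configuration: keep whatever list you already have and only swap protocol-http for protocol-selenium.

<property>
<name>plugin.includes</name>
<!-- Assumption: every entry except protocol-selenium is illustrative. Keep your existing list and replace only protocol-http. -->
<value>protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
<description>
Regular expression naming the plugin directories to include.
Here protocol-selenium replaces protocol-http so that pages are
fetched through a WebDriver-controlled browser.
</description>
</property>

protocol-selenium fetches pages through a real browser, so a browser and a matching WebDriver most likely need to be installed on every node that runs fetch tasks. Check the plugin's own documentation for the driver-related properties your Nutch version expects.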

Regarding "hadoop - NUTCH: How to make the take.screenshot and screenshot.location properties work?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48915154/
