
java - If I update the URL filter text, which Nutch command do I need to invoke from the command line?

Reposted. Author: 行者123. Updated: 2023-12-01 12:36:30

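Nutch gurus,

If I change files such as robots.txt, regex-urlfilter.txt, or any similar resource, which command do I need to invoke?

I can't tell from the Nutch usage instructions. My guess is that this is the parse job, but I'm not sure.

Karthik

According to the usage instructions: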

# echo " crawl one-step crawler for intranets"
echo " inject inject new urls into the database"
echo " hostinject creates or updates an existing host table from a text file"
echo " generate generate new batches to fetch from crawl db"
echo " fetch fetch URLs marked during generate"
echo " parse parse URLs marked during fetch"
echo " updatedb update web table after parsing"
echo " updatehostdb update host table after parsing"
echo " readdb read/dump records from page database"
echo " readhostdb display entries from the hostDB"
echo " elasticindex run the elasticsearch indexer"
echo " solrindex run the solr indexer on parsed batches"
echo " solrdedup remove duplicates from solr"
echo " parsechecker check the parser for a given url"
echo " indexchecker check the indexing filters for a given url"
echo " plugin load a plugin and run one of its classes main()"
echo " nutchserver run a (local) Nutch server on a user defined port"
echo " junit runs the given JUnit test"
echo " or"
echo " CLASSNAME run the class named CLASSNAME"
echo "Most commands print help when invoked w/o parameters."

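Best answer

If you change the regex-urlfilter.txt file, you need to update the Nutch job file so that it contains the new version. This can be done as follows: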

jar -uvf /usr/local/nutch-1.2/nutch-1.2.job <path to regex-urlfilter.txt>
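
The command above can be wrapped in a small helper. This is only a sketch under stated assumptions: `update_job_file` is a hypothetical name, not part of Nutch, and it assumes the config file is stored at the top level of the job archive (worth checking first with `jar -tf` on your own job file). It uses the `-C` option of `jar` so that the entry is stored under its bare file name instead of under the full path typed on the command line.

```shell
#!/usr/bin/env bash
# Sketch: refresh one config file inside a Nutch .job archive.
# update_job_file is a hypothetical helper name, not part of Nutch.
update_job_file() {
  local job_file="$1"    # e.g. /usr/local/nutch-1.2/nutch-1.2.job
  local conf_file="$2"   # e.g. the edited regex-urlfilter.txt

  # Refuse to run if either path is missing, so a typo cannot
  # silently leave the old filter rules in place.
  if [ ! -f "$job_file" ] || [ ! -f "$conf_file" ]; then
    echo "usage: update_job_file <job file> <config file>" >&2
    return 1
  fi

  # -C switches to the config file's directory before adding it, so the
  # archive entry is named by the bare file name only.
  # Assumption: the archive keeps this file at its top level.
  jar -uvf "$job_file" -C "$(dirname "$conf_file")" "$(basename "$conf_file")"
}
```

A call such as `update_job_file /usr/local/nutch-1.2/nutch-1.2.job <path to regex-urlfilter.txt>` would then mirror the command in the answer. Listing the archive afterwards with `jar -tf` is one way to confirm that the entry sits where you expect.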

Regarding "java - If I update the URL filter text, which Nutch command do I need to invoke from the command line", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25538609/
