Please help. I am trying to crawl a website with Nutch, but it fails with the error "java.io.IOException: Job failed!"
I am running this command: "bin/nutch solrindex http://<host name>:8080/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*"
I am using Nutch 1.5.1 and Solr 3.6.1 with JDK java-7-openjdk-i386 on Ubuntu 12.04.
The hadoop.log file in the NUTCH/log folder shows the following:
2012-09-13 12:56:10,524 INFO solr.SolrIndexer - SolrIndexer: starting at 2012-09-13 12:56:10
2012-09-13 12:56:10,604 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2012-09-13 12:56:10,604 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2012-09-13 12:56:10,604 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20120910160403
2012-09-13 12:56:10,711 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20120910160448
2012-09-13 12:56:10,715 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20120910160631
2012-09-13 12:56:10,760 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-09-13 12:56:11,212 INFO plugin.PluginRepository - Plugins: looking in: /home/zapbuild/Nutch/plugins
2012-09-13 12:56:11,310 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2012-09-13 12:56:11,310 INFO plugin.PluginRepository - Registered Plugins:
2012-09-13 12:56:11,310 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2012-09-13 12:56:11,310 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
2012-09-13 12:56:11,310 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2012-09-13 12:56:11,310 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2012-09-13 12:56:11,310 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
2012-09-13 12:56:11,310 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - HTTP Framework (lib-http)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Registered Extension-Points:
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2012-09-13 12:56:11,311 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2012-09-13 12:56:11,313 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:11,314 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:11,314 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:14,104 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:14,104 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:14,104 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:17,135 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:17,136 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:17,136 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:20,204 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:20,205 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:20,205 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:23,297 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:23,297 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:23,297 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:26,232 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:26,232 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:26,233 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:29,252 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:29,252 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:29,252 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:32,284 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:32,284 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:32,284 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:35,258 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:35,258 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:35,258 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:38,283 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:38,284 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:38,284 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:41,278 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:41,278 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:41,278 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:44,334 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:44,334 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:44,334 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:47,338 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:47,338 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:47,338 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:50,360 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:50,360 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:50,360 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:53,309 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-09-13 12:56:53,310 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-09-13 12:56:53,310 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: content dest: content
2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: title dest: title
2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: host dest: host
2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: segment dest: segment
2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: boost dest: boost
2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: digest dest: digest
2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: url dest: id
2012-09-13 12:56:53,357 INFO solr.SolrMappingReader - source: url dest: url
2012-09-13 12:56:53,409 INFO solr.SolrWriter - Indexing 18 documents
2012-09-13 12:56:53,604 WARN mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Missing solr core name in path
Missing solr core name in path
request: http://<host name>:8983/solr/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:142)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:466)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:530)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
2012-09-13 12:56:53,981 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
I could not find any log files in Solr.
Please help me solve this problem.
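Best answer
Your log states the problem: Missing solr core name in path
The request must have the Solr core name between /solr/ and /update?wt=...
Like this: http://<host name>:8983/solr/<core_name>/update?wt=javabin&version=2
You probably need to add the core name to the Solr URL in your nutch command.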
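As a minimal sketch of that suggestion, the snippet below builds the Solr URL with a core name in it and prints the resulting nutch command rather than running it. The host `localhost` and the core name `core0` are hypothetical placeholders, since the question does not give either; the port 8983 is the one shown in the log, while the original command used 8080, so use whichever your Solr instance actually listens on.

```shell
#!/bin/sh
# Hypothetical values: the question does not state the host or the core name.
SOLR_HOST="localhost"   # assumption, stands in for <host name>
SOLR_PORT="8983"        # port seen in the log; the original command used 8080
CORE_NAME="core0"       # assumption, stands in for <core_name>

# The core name goes between /solr/ and the request handler path.
SOLR_URL="http://${SOLR_HOST}:${SOLR_PORT}/solr/${CORE_NAME}/"

# Same arguments as the command in the question, only the Solr URL changes.
NUTCH_CMD="bin/nutch solrindex ${SOLR_URL} crawl/crawldb -linkdb crawl/linkdb crawl/segments/*"

# Print the command for review instead of executing it.
echo "$NUTCH_CMD"
```

Once the printed command looks right for your setup, run it from the Nutch directory as in the question.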
Regarding java - org.apache.solr.common.SolrException: Bad Request Bad Request request: http://localhost:8080/solr/update?wt=javabin&version=2, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/12369644/