hadoop - Failed to read data from "hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576"


Reposted by: 行者123 | Updated: 2023-12-02 22:03:04

test = LOAD 'hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576' USING TextLoader AS (line:chararray);
log = FOREACH test GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)"')) AS (address_ip: chararray, logname: chararray, user: chararray, timestamp: chararray, req_line: chararray, status: int, bytes: int, referer: chararray, userAgent: chararray);
STORE log INTO 'hbase://Access_Logs' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:address_ip, cf:logname, cf:user, cf:timestamp, cf:req_line, cf:status, cf:bytes, cf:referer, cf:userAgent');





2018-03-10 10:52:03,636 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,840 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,840 [main] INFO  org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for Access_Logs
2018-03-10 10:52:03,843 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-03-10 10:52:03,859 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,860 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2018-03-10 10:52:03,860 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-03-10 10:52:03,890 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2018-03-10 10:52:03,890 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,891 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2018-03-10 10:52:03,891 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2018-03-10 10:52:03,897 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:03,898 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:52:03,899 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2018-03-10 10:52:03,899 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2018-03-10 10:52:03,900 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2018-03-10 10:52:03,981 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.15.0-core-h2.jar to DistributedCache through /tmp/temp1710369540/tmp1282565307/pig-0.15.0-core-h2.jar
2018-03-10 10:52:04,013 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/htrace-core-2.04.jar to DistributedCache through /tmp/temp1710369540/tmp520067094/htrace-core-2.04.jar
2018-03-10 10:52:04,067 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp1710369540/tmp946538428/guava-11.0.2.jar
2018-03-10 10:52:04,123 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-common-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp468949353/hbase-common-0.98.8-hadoop2.jar
2018-03-10 10:52:04,144 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-hadoop-compat-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp113887319/hbase-hadoop-compat-0.98.8-hadoop2.jar
2018-03-10 10:52:04,200 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-server-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp682998180/hbase-server-0.98.8-hadoop2.jar
2018-03-10 10:52:04,256 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-client-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp-1958170360/hbase-client-0.98.8-hadoop2.jar
2018-03-10 10:52:04,317 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/hbase-protocol-0.98.8-hadoop2.jar to DistributedCache through /tmp/temp1710369540/tmp-892814021/hbase-protocol-0.98.8-hadoop2.jar
2018-03-10 10:52:04,363 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.4.5.jar to DistributedCache through /tmp/temp1710369540/tmp-830858682/zookeeper-3.4.5.jar
2018-03-10 10:52:04,396 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar to DistributedCache through /tmp/temp1710369540/tmp-420530468/protobuf-java-2.5.0.jar
2018-03-10 10:52:04,432 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hbase/lib/high-scale-lib-1.1.1.jar to DistributedCache through /tmp/temp1710369540/tmp-1046507224/high-scale-lib-1.1.1.jar
2018-03-10 10:52:04,474 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar to DistributedCache through /tmp/temp1710369540/tmp309001480/netty-3.6.2.Final.jar
2018-03-10 10:52:04,489 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1710369540/tmp964502237/automaton-1.11-8.jar
2018-03-10 10:52:04,507 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1710369540/tmp-1680308848/antlr-runtime-3.4.jar
2018-03-10 10:52:04,528 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/lib/joda-time-2.5.jar to DistributedCache through /tmp/temp1710369540/tmp813805284/joda-time-2.5.jar
2018-03-10 10:52:04,539 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2018-03-10 10:52:04,544 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2018-03-10 10:52:04,544 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2018-03-10 10:52:04,544 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2018-03-10 10:52:04,555 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2018-03-10 10:52:04,557 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:52:04,596 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2018-03-10 10:52:04,597 [JobControl] INFO  org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for Access_Logs
2018-03-10 10:52:04,625 [JobControl] WARN  org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2018-03-10 10:52:04,742 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2018-03-10 10:52:04,742 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2018-03-10 10:52:04,748 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-03-10 10:52:04,786 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2018-03-10 10:52:04,828 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1519576410629_0017
2018-03-10 10:52:04,832 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2018-03-10 10:52:04,908 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1519576410629_0017
2018-03-10 10:52:04,910 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://server2.linux.com:8088/proxy/application_1519576410629_0017/
2018-03-10 10:52:05,056 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1519576410629_0017
2018-03-10 10:52:05,056 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases log,test
2018-03-10 10:52:05,056 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: test[16,7],log[-1,-1] C:  R: 
2018-03-10 10:52:05,064 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2018-03-10 10:52:05,064 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1519576410629_0017]
2018-03-10 10:53:02,248 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2018-03-10 10:53:02,249 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1519576410629_0017]
2018-03-10 10:53:05,259 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2018-03-10 10:53:05,259 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1519576410629_0017 has failed! Stop running all dependent jobs
2018-03-10 10:53:05,259 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2018-03-10 10:53:05,260 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /192.168.1.195:8050
2018-03-10 10:53:05,274 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2018-03-10 10:53:05,789 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
2018-03-10 10:53:05,789 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2018-03-10 10:53:05,789 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
2.4.1   0.15.0  hadoop  2018-03-10 10:52:03 2018-03-10 10:53:05 UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_1519576410629_0017  log,test    MAP_ONLY    Message: Job failed!    hbase://Access_Logs,

Input(s):
Failed to read data from "hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576"

Output(s):
Failed to produce result in "hbase://Access_Logs"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1519576410629_0017


2018-03-10 10:53:05,789 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!

But it works fine when I run DUMP log.

Best answer

The Pig script is probably unable to load the data for the status: int and bytes: int columns.

The error says:

java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

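This means the regex parser produces String (chararray) data where Pig expects an Integer. The AS clause only declares status and bytes as int; it does not convert the extracted strings, so the cast fails when the values are written to HBase.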
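A minimal sketch of a fix, assuming the same input file, regex and HBase table as in the question: declare every extracted field as chararray (REGEX_EXTRACT_ALL returns a tuple of strings) and convert status and bytes with explicit casts, so the values really are Integers when HBaseStorage writes them. HBaseStorage also uses the first field of each tuple as the HBase row key, so the column list names only the remaining fields. Using address_ip as the row key is purely an illustrative assumption; it is not unique per request, so rows from the same IP would overwrite each other.

```pig
-- Sketch: same input file and regex as in the question.
test = LOAD 'hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576' USING TextLoader AS (line:chararray);

-- REGEX_EXTRACT_ALL yields strings, so declare every field as chararray.
parsed = FOREACH test GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)"')) AS (address_ip:chararray, logname:chararray, user:chararray, timestamp:chararray, req_line:chararray, status:chararray, bytes:chararray, referer:chararray, userAgent:chararray);

-- Explicit casts actually convert the values; a non-numeric value such as "-" becomes null (with a warning).
log = FOREACH parsed GENERATE address_ip, logname, user, timestamp, req_line, (int)status AS status, (int)bytes AS bytes, referer, userAgent;

-- The first field (address_ip) is used as the row key, so only the other 8 fields get columns.
STORE log INTO 'hbase://Access_Logs' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:logname cf:user cf:timestamp cf:req_line cf:status cf:bytes cf:referer cf:userAgent');
```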

To debug, try changing the data types in the Pig statement and just print the output first. Once everything works, you can try saving to HBase.
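A hedged sketch of that debugging step, assuming the test and log aliases from the question are already defined: run DESCRIBE and a small DUMP in place of the STORE. DESCRIBE prints the declared schema, where status and bytes still appear as int even though the values are strings. DUMP presumably succeeds because it only prints the values and never casts them to the declared types, which would explain why DUMP log works while the HBase store fails.

```pig
-- Run these in place of the STORE statement while debugging.
DESCRIBE log;

-- Print only a handful of parsed rows instead of writing to HBase.
sample_rows = LIMIT log 10;
DUMP sample_rows;
```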

For the question "hadoop - Failed to read data from "hdfs://192.168.1.195:9000/vivek/flume_data/flume.1520589885576"", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49205897/
