
hadoop - Unable to load file from Hadoop HDFS in Pig Latin


I'm running into a problem trying to load a CSV file. I keep getting the following error:

Input(s):
Failed to read data from "hdfs://localhost:9000/user/der/1987.csv"

Output(s):
Failed to produce result in "hdfs://localhost:9000/user/der/totalmiles3"

Looking at the Hadoop HDFS installed on my local machine, I can see the file. In fact, the file exists in several locations, such as /, /user/, etc.

hdfs dfs -ls /user/der

Found 1 items
-rw-r--r--   1 der supergroup  127162942 2015-05-28 12:42 /user/der/1987.csv

My Pig script is as follows:

records = LOAD '1987.csv' USING PigStorage(',') AS
(Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime,
CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime,
CRSElapsedTime, AirTime, ArrDelay, DepDelay, Origin, Dest,
Distance:int, TaxIn, TaxiOut, Cancelled, CancellationCode,
Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay,
lateAircraftDelay);
milage_recs = GROUP records ALL;
tot_miles = FOREACH milage_recs GENERATE SUM(records.Distance);
STORE tot_miles INTO 'totalmiles3';

I ran Pig with the -x local option and was able to read the file from my local hard drive. I got the correct answer, and tail -f on the Hadoop namenode log was not scrolling, which proved everything was running against my local hard drive:

pig  -x local totalmiles.pig

Now I get the error. The Hadoop name server seems to be receiving the request, because when I run tail -f I can see the log scrolling:

pig totalmiles.pig

This time the script's LOAD statement uses the absolute HDFS path:

records = LOAD '/user/der/1987.csv' USING PigStorage(',') AS

I get the following error:

Failed Jobs:
JobId  Alias  Feature  Message  Outputs
job_local602774674_0001  milage_recs,records,tot_miles  GROUP_BY,COMBINER
Message: ENOENT: No such file or directory
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:724)
        at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:502)
        at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:600)
        at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:94)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)

...blah...

Input(s):
Failed to read data from "/user/der/1987.csv"

Output(s):
Failed to produce result in "hdfs://localhost:9000/user/der/totalmiles3"

I checked permissions via hdfs by using mkdir, and they seem fine:

hdfs dfs -mkdir /user/der/temp2
hdfs dfs -ls /user/der

Found 3 items
-rw-r--r--   1 der supergroup  127162942 2015-05-28 12:42 /user/der/1987.csv
drwxr-xr-x   - der supergroup          0 2015-05-28 16:21 /user/der/temp2
drwxr-xr-x   - der supergroup          0 2015-05-28 15:57 /user/der/test

I tried Pig with the mapreduce option, but still got the same kind of error:

 pig -x mapreduce totalmiles.pig

15-05-28 20:58:44,608 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:totalmiles.pig while submitting

ENOENT: No such file or directory
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:724)
        at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:502)
        at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:600)
        at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:94)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)

The temp dir in my core-site.xml is as follows:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop</value>
  <description>A base for other temporary directories.</description>
</property>

And my hdfs-site.xml, with the namenode and datanode directories, is as follows:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop/dfs/namenode</value>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop/dfs/datanode</value>
</property>

I've made some progress debugging the problem. It seems my namenode is misconfigured, because I can't reformat it:

[ hadoop hdfs formatting gets error failed for Block pool ]
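
For reference, a "failed for Block pool" error when formatting usually means the datanode still holds a clusterID from an earlier format. A minimal recovery sketch, assuming the dfs.namenode.name.dir and dfs.datanode.data.dir values from the hdfs-site.xml above (warning: this erases all data in HDFS):

stop-dfs.sh
rm -rf /usr/local/hadoop/dfs/namenode/*   # old namenode metadata
rm -rf /usr/local/hadoop/dfs/datanode/*   # old blocks, so the clusterIDs match again
hdfs namenode -format
start-dfs.sh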

Best Answer

We have to specify the Hadoop file path as /user/der/1987.csv:

records = LOAD '/user/der/1987.csv' USING PigStorage(',') AS
(Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime,
CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime,
CRSElapsedTime, AirTime, ArrDelay, DepDelay, Origin, Dest,
Distance:int, TaxIn, TaxiOut, Cancelled, CancellationCode,
Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay,
lateAircraftDelay);
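
Once the job succeeds, the summed distance is written to the totalmiles3 directory on HDFS; assuming the default part-file naming for a reduce-side job, the result can be read back with:

hdfs dfs -cat /user/der/totalmiles3/part-r-00000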

If this is just for testing, you can instead keep the file 1987.csv in the path from which you execute the Pig script, i.e., put 1987.csv and the .pig file in the same location.
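
For example, a minimal local-mode setup (hypothetical directory listing) keeps both files side by side, so LOAD '1987.csv' resolves relative to the working directory:

ls
1987.csv  totalmiles.pig

pig -x local totalmiles.pig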

Regarding hadoop - Unable to load file from Hadoop HDFS in Pig Latin, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30516226/
