gpt4 book ai didi

hadoop - 如何转储数据?

转载 作者:行者123 更新时间:2023-12-02 21:46:23 24 4
gpt4 key购买 nike

我已经安装了PIG,并通过执行以下操作来加载csv:

grunt> boys = LOAD '/user/pig_input/student-boys.txt' USING PigStorage ('\t') AS (name:chararray,state:chararray,attendance:float);

2014-07-27 11:54:04,414 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-27 11:54:04,414 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-07-27 11:54:04,525 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-27 11:54:04,525 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

但是,当我尝试转储数据集时,出现以下错误:
grunt> DUMP boys;
2014-07-27 11:54:12,710 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias boys
Details at logfile: /home/hduser/tmp/pig_1406476412229.log

我试图调查为什么会这样,但是我无法理解确切的原因。
grunt> hduser@hadoop:~/tmp$ cat /home/hduser/tmp/pig_1406476412229.log
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias boys

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias boys
at org.apache.pig.PigServer.openIterator(PigServer.java:880)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:541)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1464)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2175)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2127)
at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:988)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:349)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1482)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1478)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1476)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1258)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:504)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1505)
at org.apache.pig.backend.hadoop.datastorage.HDirectory.create(HDirectory.java:63)
at org.apache.pig.backend.hadoop.datastorage.HPath.create(HPath.java:159)
at org.apache.pig.impl.io.FileLocalizer.getTemporaryPath(FileLocalizer.java:481)
at org.apache.pig.impl.io.FileLocalizer.getTemporaryPath(FileLocalizer.java:474)
at org.apache.pig.PigServer.openIterator(PigServer.java:855)
... 12 more
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1464)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2175)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2127)
at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:988)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:349)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1482)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1478)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1476)

at org.apache.hadoop.ipc.Client.call(Client.java:1028)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at com.sun.proxy.$Proxy0.mkdirs(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:84)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at com.sun.proxy.$Proxy0.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1256)
... 19 more

我需要帮助来了解为什么会出现这些错误?我需要更改什么才能正确转储加载的数据集?

最佳答案

/user/pig_input/student-boys.txt。

您可以尝试使用以下几行创建文件吗?

boys = LOAD'/user/pig_input/student-boys.txt'使用PigStorage('\ t')AS(名称:字符数组,状态:字符数组,出勤率: float );

DUMP男孩;

然后尝试使用Pig filename.pig运行执行它。

通常,咕unt声模式在本地模式下运行。

谢谢,

问候,
Dheeraj Rampally。

关于hadoop - 如何转储数据?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/24982689/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com