hdfs - Cascading a text file to Parquet


I am trying to convert a text file to Parquet using Cascading, but I get the following error.

Error

Exception in thread "main" cascading.flow.planner.PlannerException: tap named: 'Copy', cannot be used as a sink: Hfs["ParquetTupleScheme[['A', 'B']->[ALL]]"]["/user/cloudera/htcountp"]
at cascading.flow.planner.FlowPlanner.verifyTaps(FlowPlanner.java:240)
at cascading.flow.planner.FlowPlanner.verifyAllTaps(FlowPlanner.java:174)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:242)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:80)
at cascading.flow.FlowConnector.connect(FlowConnector.java:459)
at first.Copy.main(Copy.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Code

Scheme sourceScheme = new TextDelimited(new Fields("A","B"), ", ");
Scheme sinkScheme = new ParquetTupleScheme(new Fields("A", "B"));

// create the source tap
Tap inTap = new Hfs(sourceScheme, inPath );

// create the sink tap
Tap outTap = new Hfs( sinkScheme, outPath );

// specify a pipe to connect the taps
Pipe copyPipe = new Pipe("Copy");

// connect the taps, pipes, etc., into a flow
FlowDef flowDef = FlowDef.flowDef()
.addSource( copyPipe, inTap )
.addTailSink( copyPipe, outTap );

// run the flow (flowConnector is assumed to be e.g. a HadoopFlowConnector)
FlowConnector flowConnector = new HadoopFlowConnector();
flowConnector.connect( flowDef ).complete();

Best Answer

I ran into the same problem. Looking at the source code, you have to pass a Parquet schema to the ParquetTupleScheme constructor so that it can serialize the data to HDFS. The class has an isSink() method that checks that the schema is present; otherwise the scheme is not a sink and the code throws the error you are seeing.
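
As a minimal sketch of the fix, assuming the three-argument ParquetTupleScheme(Fields sourceFields, Fields sinkFields, String parquetSchema) constructor from parquet-cascading; the message type string and UTF8 column types below are illustrative:

// Supply a Parquet message type so that isSink() returns true.
// The schema string is an assumption for two string columns A and B.
String parquetSchema =
    "message copy {\n" +
    "  optional binary A (UTF8);\n" +
    "  optional binary B (UTF8);\n" +
    "}";

// Source fields, sink fields, and the schema string; without the schema
// the scheme cannot act as a sink and the planner throws the error above.
Scheme sinkScheme = new ParquetTupleScheme(
    new Fields("A", "B"),
    new Fields("A", "B"),
    parquetSchema);

Tap outTap = new Hfs(sinkScheme, outPath);

With the schema supplied to the sink scheme, the rest of the flow definition from the question can stay unchanged.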

Regarding hdfs - Cascading a text file to Parquet, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24914693/
