Hadoop reduce: current values concatenated with previous values

I have this reduce function:

protected void reduce(Text key, Iterable<SortedMapWritable> values, Context context) throws IOException, InterruptedException {
    StringBuilder strOutput = new StringBuilder();
    double sum = 0, i = 0;
    DoubleWritable val = null;

    SortedMapWritable tmp = values.iterator().next();
    strOutput.append("[");
    Set<WritableComparable> keys = tmp.keySet();
    for (WritableComparable mapKey : keys) {
        val = (DoubleWritable) tmp.get(mapKey);
        sum += val.get();
        if (i > 0)
            strOutput.append(",");
        strOutput.append(val.get());
        i++;
    }
    strOutput.append("]");

    context.write(new Text(key.toString()), new Text(strOutput.toString()));
    context.write(new Text(key.toString() + "Med"), new Text(Double.toString(sum / i)));
}

As the SortedMapWritable I used <LongWritable, DoubleWritable>, as can be seen in this code:

protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    final Context ctx = context;
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(conf);
    Path srcPath = new Path(hdfs.getWorkingDirectory() + "/" + value);
    Path dstPath = new Path("/tmp/");

    hdfs.copyToLocalFile(srcPath, dstPath);

    final StringBuilder errbuf = new StringBuilder();
    final Pcap pcap = Pcap.openOffline(dstPath.toString() + "/" + value, errbuf);
    if (pcap == null) {
        throw new InterruptedException("Impossible to create PCAP file");
    }

    final HashMap<Integer, JxtaSocketFlow> dataFlows = new HashMap<Integer, JxtaSocketFlow>();
    final HashMap<Integer, JxtaSocketFlow> ackFlows = new HashMap<Integer, JxtaSocketFlow>();

    generateHalfSocketFlows(errbuf, pcap, dataFlows, ackFlows);
    final Text jxtaPayloadKey = new Text("JXTA_Payload");
    final Text jxtaRelyRtt = new Text("JXTA_Reliability_RTT");

    SortedMapWritable payOutput = new SortedMapWritable();
    SortedMapWritable rttOutput = new SortedMapWritable();

    for (Integer dataFlowKey : dataFlows.keySet()) {
        JxtaSocketFlow dataFlow = dataFlows.get(dataFlowKey);
        JxtaSocketStatistics stats = dataFlow.getJxtaSocketStatistics();

        payOutput.put(new LongWritable(stats.getEndTime()), new DoubleWritable((stats.getPayload()) / 1024));
        HashMap<Integer, Long> rtts = stats.getRtts();
        for (Integer num : rtts.keySet()) {
            // renamed from "key", which would shadow the method parameter and not compile
            LongWritable rttKey = new LongWritable(stats.getEndTime() + num);
            rttOutput.put(rttKey, new DoubleWritable(rtts.get(num)));
        }
    }

    try {
        ctx.write(jxtaPayloadKey, payOutput);
        ctx.write(jxtaRelyRtt, rttOutput);
    } catch (IOException e) {
        e.printStackTrace();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

In the reduce function, for each key, the values arrive concatenated with the previous values.

For example, the correct keys and values should be:

key1 -> {a, b, c}
key2 -> {d, e, f}

but instead the values come out as:

key1 -> {a, b, c}
key2 -> {a, b, c, d, e, f}

Does anyone know why this happens, and how I can avoid it?

Best Answer

Hadoop has an unresolved bug, https://issues.apache.org/jira/browse/HADOOP-5454, which probably explains the problem you are seeing: the reduce-side iterator reuses a single SortedMapWritable instance, and its readFields() does not clear the map before deserializing the next value, so entries accumulate from one iteration (and one key) to the next.

In the code below, row.clear() is needed to prevent the values from one iteration being carried over into the next.

import java.io.IOException;

import org.apache.hadoop.io.SortedMapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import com.google.common.base.Joiner;
import lombok.extern.log4j.Log4j;

@Log4j
public class StackOverFlowReducer extends Reducer<Text, SortedMapWritable, Text, Text> { // output types assumed
    public void reduce(Text key, Iterable<SortedMapWritable> values, Context context)
            throws IOException, InterruptedException {
        for (SortedMapWritable row : values) {
            log.info(String.format("New Map : %s", Joiner.on(",").join(row.entrySet())));
            row.clear(); // https://issues.apache.org/jira/browse/HADOOP-5454
        }
    }
}

I only tested the workaround within a single key. I hope it helps.
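
If the map contents have to outlive the loop (as in the question's reducer, which builds its output from the values), clearing alone is not enough: each value must be deep-copied before the iterator reuses the instance. Below is a minimal sketch of that variation using Hadoop's WritableUtils.clone; the class name CloningReducer and the snapshots list are illustrative assumptions, not part of the original answer.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.SortedMapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer mirroring the question's <Text, SortedMapWritable> input types.
public class CloningReducer extends Reducer<Text, SortedMapWritable, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<SortedMapWritable> values, Context context)
            throws IOException, InterruptedException {
        List<SortedMapWritable> snapshots = new ArrayList<SortedMapWritable>();
        for (SortedMapWritable row : values) {
            // Deep-copy the reused instance so its contents survive the next iteration.
            snapshots.add(WritableUtils.clone(row, context.getConfiguration()));
            // Empty the reused instance so the buggy readFields() starts from a clean map.
            row.clear(); // https://issues.apache.org/jira/browse/HADOOP-5454
        }
        // snapshots can now be processed safely, e.g. summed and written out as before.
    }
}

Cloning serializes and deserializes each value, so it adds some overhead, but it keeps every per-key map intact after the iterator has moved on.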

Regarding this Hadoop reduce value-concatenation issue, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9916549/
