Hadoop reduce: current values are concatenated with previous values

Reposted by: 可可西里. Updated: 2023-11-01 16:27:18

I have this reduce function:

protected void reduce(Text key, Iterable<SortedMapWritable> values, Context context) throws IOException, InterruptedException {
    StringBuilder strOutput = new StringBuilder();
    double sum = 0, i = 0;
    DoubleWritable val = null;

    SortedMapWritable tmp = values.iterator().next();
    strOutput.append("[");
    Set<WritableComparable> keys = tmp.keySet();
    for (WritableComparable mapKey : keys) {                    
        val = (DoubleWritable)tmp.get(mapKey);
        sum += val.get();
        if(i > 0)
            strOutput.append(",");
        strOutput.append(val.get());
        i++;
    }
    strOutput.append("]");

    context.write(new Text(key.toString()), new Text(strOutput.toString()));
    context.write(new Text(key.toString() + "Med"), new Text(Double.toString(sum/i)));
}

The SortedMapWritable holds <LongWritable, DoubleWritable> entries, as the map function below shows:

protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    final Context ctx = context;
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(conf); 
    Path srcPath = new Path(hdfs.getWorkingDirectory() + "/" + value);  
    Path dstPath = new Path("/tmp/");       

    hdfs.copyToLocalFile(srcPath, dstPath);

    final StringBuilder errbuf = new StringBuilder();
    final Pcap pcap = Pcap.openOffline(dstPath.toString() + "/" +value, errbuf);
    if (pcap == null) {
        throw new InterruptedException("Impossible create PCAP file");
    }

    final HashMap<Integer,JxtaSocketFlow> dataFlows = new HashMap<Integer,JxtaSocketFlow>();
    final HashMap<Integer,JxtaSocketFlow> ackFlows = new HashMap<Integer,JxtaSocketFlow>();

    generateHalfSocketFlows(errbuf, pcap, dataFlows, ackFlows);
    final Text jxtaPayloadKey = new Text("JXTA_Payload");
    final Text jxtaRelyRtt = new Text("JXTA_Reliability_RTT");

    SortedMapWritable payOutput = new SortedMapWritable();
    SortedMapWritable rttOutput = new SortedMapWritable();

    for (Integer dataFlowKey : dataFlows.keySet()) {
        JxtaSocketFlow dataFlow = dataFlows.get(dataFlowKey);
        JxtaSocketStatistics stats = dataFlow.getJxtaSocketStatistics();

        payOutput.put(new LongWritable(stats.getEndTime()), new DoubleWritable((stats.getPayload())/1024));         
        HashMap<Integer,Long> rtts = stats.getRtts();
        for (Integer num : rtts.keySet()) {
            // Renamed from "key": a local variable cannot redeclare the method parameter "key".
            LongWritable rttKey = new LongWritable(stats.getEndTime() + num);
            rttOutput.put(rttKey, new DoubleWritable(rtts.get(num)));
        }
    }

    try{
        ctx.write(jxtaPayloadKey, payOutput);
        ctx.write(jxtaRelyRtt, rttOutput);
    }catch(IOException e){
        e.printStackTrace();
    }catch(InterruptedException e){
        e.printStackTrace();
    }
}

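In the reduce function, the values for each key arrive concatenated with the values of the previous keys.

For example, the keys and values should be:

key1 -> {a, b, c}
key2 -> {d, e, f}

But what I actually get is:

key1 -> {a, b, c}
key2 -> {a, b, c, d, e, f}

Does anyone know why this happens and how I can avoid it?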

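Best answer

Hadoop has an unresolved bug, https://issues.apache.org/jira/browse/HADOOP-5454, which may explain the problem you are seeing.

In the code below, row.clear() is needed to prevent the values from one iteration being appended to the next.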

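// Imports assume Lombok (@Log4j) and Guava (Joiner), as the annotation and the call suggest.
import java.io.IOException;
import org.apache.hadoop.io.SortedMapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import com.google.common.base.Joiner;
import lombok.extern.log4j.Log4j;

// The output types (Text, Text) are assumed from the reducer in the question.
@Log4j
public class StackOverFlowReducer extends Reducer<Text, SortedMapWritable, Text, Text>
{
    @Override
    public void reduce(Text key, Iterable<SortedMapWritable> values, Context context) throws IOException, InterruptedException
    {
        for (SortedMapWritable row : values)
        {
            log.info(String.format("New Map : %s", Joiner.on(",").join(row.entrySet())));
            // Workaround for https://issues.apache.org/jira/browse/HADOOP-5454:
            // empty the map so the next value does not inherit these entries.
            row.clear();
        }
    }
}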

I only tested the workaround within a single key. I hope it helps.
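
One way to see why row.clear() helps is a small Hadoop-free simulation. It is only a sketch and rests on two assumptions: that the framework deserializes every value into a single recycled object, and that the map's deserialization step adds entries without clearing the old ones first, which is how the linked issue appears to describe the behavior. The names ReusedMapDemo, LeakyMap, consume and sampleRecords are hypothetical and exist only for this illustration; readFields here merely imitates the Writable method of the same name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReusedMapDemo {

    /** Hypothetical stand-in for a map whose deserialization keeps old entries. */
    static class LeakyMap {
        final TreeMap<Long, String> entries = new TreeMap<>();

        // Imitates reading into an existing instance: new entries are added,
        // but nothing is removed beforehand (the assumed buggy behavior).
        void readFields(Map<Long, String> serialized) {
            entries.putAll(serialized);
        }

        void clear() {
            entries.clear();
        }
    }

    /** Pushes every record through ONE reused instance and records what the consumer sees. */
    static List<String> consume(List<Map<Long, String>> records, boolean clearAfterUse) {
        LeakyMap reused = new LeakyMap();
        List<String> seen = new ArrayList<>();
        for (Map<Long, String> record : records) {
            reused.readFields(record);
            seen.add(String.join(",", reused.entries.values()));
            if (clearAfterUse) {
                reused.clear(); // the workaround from the answer
            }
        }
        return seen;
    }

    /** Two records matching the question's example: {a, b, c} and {d, e, f}. */
    static List<Map<Long, String>> sampleRecords() {
        Map<Long, String> first = new TreeMap<>();
        first.put(1L, "a");
        first.put(2L, "b");
        first.put(3L, "c");
        Map<Long, String> second = new TreeMap<>();
        second.put(4L, "d");
        second.put(5L, "e");
        second.put(6L, "f");
        List<Map<Long, String>> records = new ArrayList<>();
        records.add(first);
        records.add(second);
        return records;
    }

    public static void main(String[] args) {
        // Without clear(): the second record still carries the first one's entries.
        System.out.println(String.join(" | ", consume(sampleRecords(), false))); // a,b,c | a,b,c,d,e,f
        // With clear(): each record is seen on its own.
        System.out.println(String.join(" | ", consume(sampleRecords(), true)));  // a,b,c | d,e,f
    }
}
```

If those assumptions hold for the job in the question, the same reasoning would also apply across keys, because the recycled object is not tied to one key. Copying the entries into a fresh map before clearing is an alternative when the values must outlive the loop iteration.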

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/9916549/
