
hadoop - SQL modeling in MapReduce

Reposted. Author: 行者123. Updated: 2023-12-02 21:05:36

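I am trying to model a SQL query such as select distinct(col1) from table where col2 = value2 in MapReduce. My logic is that each mapper checks the where clause and, if it finds a match, emits the where-clause value as the key and col1 as the value. With the default hash partitioner, every output record then goes to the same reducer, because they all share the where-clause value as their key. In the reducer I can drop the duplicates and emit the distinct values.

Is this the correct way to achieve this?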

Note: the data for this query is in a CSV file.

Best answer

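// MAPPER
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class DistinctMapper extends Mapper<Object, Text, Text, NullWritable> {
    // Placeholder for value2 in "where col2 = value2"; replace with the real filter value
    private static final String WHERE_CLAUSE_VALUE = "WhereClauseValue";
    private final Text col1 = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Column extraction. Assumption: col1 and col2 are the first two CSV fields
        // and no field contains a quoted comma; adjust the indexes to the real layout.
        String[] fields = value.toString().split(",", -1);
        if (fields.length < 2) {
            return; // skip malformed rows
        }
        String c1 = fields[0].trim();
        String c2 = fields[1].trim();

        // Filter: compare strings with equals(), not with != or ==
        if (!WHERE_CLAUSE_VALUE.equals(c2)) {
            return;
        }
        // Mapper output key is the distinct column value; the value is NULL
        col1.set(c1);
        context.write(col1, NullWritable.get());
    }
}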

// REDUCER (also needs: import org.apache.hadoop.mapreduce.Reducer;)
public static class DistinctReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    @Override
    public void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        // The framework calls reduce() once per distinct key, so writing the key
        // once yields the distinct col1 values. The list of values is ignored.
        context.write(key, NullWritable.get());
    }
}
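
To see why keying the mapper output on col1 is enough to produce the distinct values, the flow can be imitated in plain Java with no Hadoop classes. This is only a sketch under stated assumptions: the class name DistinctSimulation, its method names, the sample rows, and the column layout (col1 and col2 as the first two CSV fields) are all hypothetical and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the map -> shuffle -> reduce flow; no Hadoop classes are used.
class DistinctSimulation {

    // "Map" phase: emit col1 for every row whose col2 matches the filter.
    // Assumes col1 and col2 are the first two comma-separated fields (hypothetical layout).
    static List<String> mapPhase(List<String> csvLines, String whereValue) {
        List<String> emittedKeys = new ArrayList<>();
        for (String line : csvLines) {
            String[] fields = line.split(",", -1);
            if (fields.length >= 2 && whereValue.equals(fields[1].trim())) {
                emittedKeys.add(fields[0].trim());
            }
        }
        return emittedKeys;
    }

    // "Shuffle" phase: group the emitted records by key and count them,
    // standing in for the grouping the framework performs before reduce.
    static Map<String, Integer> shufflePhase(List<String> emittedKeys) {
        Map<String, Integer> grouped = new TreeMap<>();
        for (String key : emittedKeys) {
            grouped.merge(key, 1, Integer::sum);
        }
        return grouped;
    }

    // "Reduce" phase: write each key once and ignore how many values it had.
    static List<String> reducePhase(Map<String, Integer> grouped) {
        return new ArrayList<>(grouped.keySet());
    }

    static List<String> distinctCol1(List<String> csvLines, String whereValue) {
        return reducePhase(shufflePhase(mapPhase(csvLines, whereValue)));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a,x", "b,y", "a,x", "c,x", "b,x");
        System.out.println(distinctCol1(rows, "x")); // prints [a, b, c]
    }
}
```

Because the grouping happens on col1, duplicates collapse during the shuffle and the reducer never has to keep a set in memory. Keying on the where-clause value instead, as the question proposes, would give every matching record the same key, so a single reducer would receive all of them and have to deduplicate them itself.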

For hadoop - SQL modeling in MapReduce, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42081906/
