
Java Hadoop MapReduce multiple key-values


I am implementing a movie recommendation system in Java and have been following this site: Link Here

Input: userId movieRatingCount,ratingSum,(movieId,movieRating)

17    1,3,(70,3)
35 1,1,(21,1)
49 3,7,(19,2 21,1 70,4)
87 2,3,(19,1 21,2)
98 1,2,(19,2)

Code:

def pairwise_items(self, user_id, values):
    item_count, item_sum, ratings = values
    #print item_count, item_sum, [r for r in combinations(ratings, 2)]
    #bottleneck at combinations
    for item1, item2 in combinations(ratings, 2):
        yield (item1[0], item2[0]), \
              (item1[1], item2[1])

Output: firstMovieId,secondMovieId    firstRating,secondRating

19,21  2,1
19,70 2,4
21,70 1,4
19,21 1,2

For example, userId 49 watched 3 movies, so the output will be:

firstMovie, secondMovie firstMovieRatings, secondMovieRatings
firstMovie, thirdMovie firstMovieRatings, thirdMovieRatings
secondMovie, thirdMovie secondMovieRatings, thirdMovieRatings

For a user who watched only 1 movie, that output is skipped.

Is it possible to convert this Python code to Java? I don't know what the map output key and value should be, or how to approach this problem. Thanks in advance!

Best Answer

Mapper logic:

  1. Assume the input has tab-separated key/value pairs, e.g. "49 3,7,(19,2 21,1 70,4)".
  2. Search the value for "(" and parse out the string between "(" and ")".
  3. Emit a (key, value) pair of the form (userId, (movieId,movieRating) list). E.g. for the record "49 3,7,(19,2 21,1 70,4)" it emits key: 49, value: 19,2 21,1 70,4 (a standalone sketch of this parsing follows the list).
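
For illustration only, here is a minimal standalone sketch of the mapper's parsing step outside Hadoop. The class name, the hardcoded sample record, and the main method are all just assumptions for demonstration and are not part of the actual job:

public class MapperParseSketch {
    public static void main(String[] args) {
        String line = "49\t3,7,(19,2 21,1 70,4)";      // one sample input record
        String[] tokens = line.trim().split("\t");      // 1. split key and value on the tab
        if (tokens.length == 2) {
            int index = tokens[1].indexOf('(');          // 2. locate "(" in the value
            if (index != -1) {
                // 3. keep only the text between '(' and ')'
                String movieRatings = tokens[1].substring(index + 1, tokens[1].length() - 1);
                System.out.println(tokens[0] + " -> " + movieRatings); // prints "49 -> 19,2 21,1 70,4"
            }
        }
    }
}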

Reducer logic:

  1. It splits each value on the space character (" "). E.g. it splits "19,2 21,1 70,4" into 3 strings: "19,2", "21,1" and "70,4".

  2. It computes all 2-way combinations of these values (the standalone sketch after this list shows just this step).

  3. Finally, it writes these combinations to the output.
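
Before the full program, here is a minimal sketch of just the 2-way combination step, the Java counterpart of Python's combinations(ratings, 2). The class name and the hardcoded sample value are illustrative assumptions, not part of the actual reducer:

public class PairwiseSketch {
    public static void main(String[] args) {
        String[] ratings = "19,2 21,1 70,4".split(" ");   // one user's "movieId,rating" tokens
        for (int i = 0; i < ratings.length; i++) {          // every unordered pair (i, j) with i < j
            for (int j = i + 1; j < ratings.length; j++) {
                String[] first = ratings[i].split(",");      // e.g. ["19", "2"]
                String[] second = ratings[j].split(",");     // e.g. ["21", "1"]
                // key: movieId pair, value: rating pair
                System.out.println(first[0] + "," + second[0] + "\t" + first[1] + "," + second[1]);
            }
        }
    }
}

Running this prints the three pairs expected for userId 49: "19,21 2,1", "19,70 2,4" and "21,70 1,4".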

The code is as follows:

package com.myorg.hadooptests;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class MovieGroupings {

    public static class MovieGroupingsMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

            String valueStr = value.toString().trim();
            String[] tokens = valueStr.split("\t"); // Assume key/value to be tab separated. For e.g. "17 1,3,(70,3)"

            if (tokens.length == 2) {
                int index = tokens[1].indexOf('('); // Search for the "(" character
                if (index != -1) {
                    // Emit (userId, "movieId,rating movieId,rating ..."), excluding '(' and ')'
                    context.write(new Text(tokens[0]),
                            new Text(tokens[1].substring(index + 1, tokens[1].length() - 1)));
                }
            }
        }
    }

    public static class MovieGroupingsReducer
            extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {

            for (Text value : values) {
                String[] tokens = value.toString().split(" "); // Split the value on the space character

                if (tokens.length >= 2) { // Ignore users with only one movie
                    for (int i = 0; i < tokens.length; i++) {
                        for (int j = i + 1; j < tokens.length; j++) {
                            String groupings = tokens[i] + "," + tokens[j]; // Join 2 movies with ",". For e.g. "19,2,21,1"
                            String[] moviesAndRatings = groupings.split(",");
                            if (moviesAndRatings.length == 4) {
                                // key: "firstMovieId,secondMovieId", value: "firstRating,secondRating"
                                context.write(new Text(moviesAndRatings[0] + "," + moviesAndRatings[2]),
                                        new Text(moviesAndRatings[1] + "," + moviesAndRatings[3]));
                            }
                        }
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "MovieGroupings");
        job.setJarByClass(MovieGroupings.class);
        job.setMapperClass(MovieGroupingsMapper.class);
        job.setReducerClass(MovieGroupingsReducer.class);
        job.setNumReduceTasks(5);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/in/in5.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/out/"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
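
To try the job, one typical way (the jar name below is just an assumption) is to package the class into a jar and submit it with:

hadoop jar movie-groupings.jar com.myorg.hadooptests.MovieGroupings

The input file /in/in5.txt must already exist on HDFS, and the /out/ directory must not exist beforehand, otherwise the job fails at startup.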

For the following input:

17      1,3,(70,3)
35 1,1,(21,1)
49 3,7,(19,2 21,1 70,4)
87 2,3,(19,1 21,2)
98 1,2,(19,2)

the generated output is:

19,21   2,1
19,70 2,4
21,70 1,4
19,21 1,2

A similar question about Java Hadoop MapReduce multiple key-values can be found on Stack Overflow: https://stackoverflow.com/questions/34296775/
