
java - Using a static variable from another class in a Hadoop cluster

Reposted. Author: 行者123. Updated: 2023-12-01 13:10:19

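My Hadoop program is shown below; I have included the relevant code snippets. I pass an argument that main reads into BIG_DATA as true, and the driver prints "Working on BIG DATA." But inside the map method of the RowMPreMap class, BIG_DATA still has its initial value, false. I don't know why this happens. Am I missing something? This works when I run it on a standalone machine, but not when I run it on a Hadoop cluster. The jobs are handled by JobControl. Is it something to do with threads?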

public class UVDriver extends Configured implements Tool {

    public static class RowMPreMap extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, Text> {

        private Text keyText = new Text();
        private Text valText = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {

            // Input: (lineNo, lineContent)

            // Split each line using a separator based on the dataset.
            String line[] = null;
            if (Settings.BIG_DATA)
                line = value.toString().split("::");
            else
                line = value.toString().split("\\s");

            keyText.set(line[0]);
            valText.set(line[1] + "," + line[2]);

            // Output: (userid, "movieid,rating")
            output.collect(keyText, valText);

        }
    }

    public static class Settings {

        public static boolean BIG_DATA = false;

        public static int noOfUsers = 0;
        public static int noOfMovies = 0;

        public static final int noOfCommonFeatures = 10;
        public static final int noOfIterationsRequired = 3;
        public static final float INITIAL_VALUE = 0.1f;

        public static final String NORMALIZE_DATA_PATH_TEMP = "normalize_temp";
        public static final String NORMALIZE_DATA_PATH = "normalize";
        public static String INPUT_PATH = "input";
        public static String OUTPUT_PATH = "output";
        public static String TEMP_PATH = "temp";

    }

    public static class Constants {

        public static final int BIG_DATA_USERS = 71567;
        public static final int BIG_DATA_MOVIES = 10681;
        public static final int SMALL_DATA_USERS = 943;
        public static final int SMALL_DATA_MOVIES = 1682;

        public static final int M_Matrix = 1;
        public static final int U_Matrix = 2;
        public static final int V_Matrix = 3;
    }

    public int run(String[] args) throws Exception {

        // 1. Pre-process the data.
        //    a) Normalize
        // 2. Initialize the U, V Matrices
        //    a) Initialize U Matrix
        //    b) Initialize V Matrix
        // 3. Iterate to update U and V.

        // Write Job details for each of the above steps.

        Settings.INPUT_PATH = args[0];
        Settings.OUTPUT_PATH = args[1];
        Settings.TEMP_PATH = args[2];
        Settings.BIG_DATA = Boolean.parseBoolean(args[3]);

        if (Settings.BIG_DATA) {
            System.out.println("Working on BIG DATA.");
            Settings.noOfUsers = Constants.BIG_DATA_USERS;
            Settings.noOfMovies = Constants.BIG_DATA_MOVIES;
        } else {
            System.out.println("Working on Small DATA.");
            Settings.noOfUsers = Constants.SMALL_DATA_USERS;
            Settings.noOfMovies = Constants.SMALL_DATA_MOVIES;
        }

        // some code here

        handleRun(control);

        return 0;
    }

    public static void main(String args[]) throws Exception {

        System.out.println("Program started");
        // Four arguments are required; the fourth is parsed into Settings.BIG_DATA.
        if (args.length != 4) {
            System.err
                    .println("Usage: UVDriver <input path> <output path> <fs path> <big data: true|false>");
            System.exit(-1);
        }

        Configuration configuration = new Configuration();
        String[] otherArgs = new GenericOptionsParser(configuration, args)
                .getRemainingArgs();
        ToolRunner.run(new UVDriver(), otherArgs);
        System.out.println("Program complete.");
        System.exit(0);
    }

}

Job control:

    public static class JobRunner implements Runnable {
        private JobControl control;

        public JobRunner(JobControl _control) {
            this.control = _control;
        }

        public void run() {
            this.control.run();
        }
    }

    public static void handleRun(JobControl control)
            throws InterruptedException {
        JobRunner runner = new JobRunner(control);
        Thread t = new Thread(runner);
        t.start();

        int i = 0;
        while (!control.allFinished()) {
            if (i % 20 == 0) {
                System.out
                        .println(new Date().toString() + ": Still running...");
                System.out.println("Running jobs: "
                        + control.getRunningJobs().toString());
                System.out.println("Waiting jobs: "
                        + control.getWaitingJobs().toString());
                System.out.println("Successful jobs: "
                        + control.getSuccessfulJobs().toString());
            }
            Thread.sleep(1000);
            i++;
        }

        if (control.getFailedJobs() != null) {
            System.out.println("Failed jobs: "
                    + control.getFailedJobs().toString());
        }
    }

Best answer

This does not work because the scope of a static field does not extend across multiple JVM instances (let alone across a network).

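On a cluster, a map task always runs in a separate JVM from the driver, even when it happens to run on the same machine as the tool runner. The mapper class is instantiated from its class name alone, so it starts with the static initial values and has no access to anything you assigned in the tool runner. (In standalone mode the local job runner executes tasks inside the driver's JVM, which is why the static value was visible there.)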

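This is one of the reasons the Configuration framework exists: values set on the job's configuration in the driver are shipped with the job and can be read back inside each task.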
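To make the mechanism concrete, here is a minimal sketch that uses only the JDK and does not call Hadoop. `java.util.Properties` stands in for the job configuration, a byte array stands in for shipping it to a task, and the class names and the key `uvdriver.big.data` are made up for illustration. With the old `mapred` API used in the question, the corresponding calls would be `JobConf.setBoolean(...)` in the driver and `getBoolean(...)` inside an overridden `configure(JobConf)` in the mapper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical stand-in for the question's driver and mapper; not Hadoop code.
class ConfigPassingSketch {

    // Hypothetical property key chosen for this sketch.
    static final String BIG_DATA_KEY = "uvdriver.big.data";

    // Driver side: store the flag in the configuration and serialize it,
    // modelling how a job configuration travels to each task.
    static byte[] driverSide(boolean bigData) {
        Properties conf = new Properties();
        conf.setProperty(BIG_DATA_KEY, Boolean.toString(bigData));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            conf.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Task side: models a mapper that reads the flag once from the shipped
    // configuration instead of from a static field set in another JVM.
    static class RowSplitter {
        private final boolean bigData;

        RowSplitter(byte[] shippedConf) {
            Properties conf = new Properties();
            try {
                conf.load(new ByteArrayInputStream(shippedConf));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Same default as the question's static initializer: false.
            bigData = Boolean.parseBoolean(conf.getProperty(BIG_DATA_KEY, "false"));
        }

        // Same separator choice as the map method in the question.
        String[] split(String line) {
            return bigData ? line.split("::") : line.split("\\s");
        }
    }

    public static void main(String[] args) {
        byte[] shipped = driverSide(true);
        String[] fields = new RowSplitter(shipped).split("1::122::5::838985046");
        System.out.println(fields[0] + " -> " + fields[1] + "," + fields[2]);
    }
}
```

The point of the design is that the task never reads `Settings.BIG_DATA`; whatever it needs arrives through the configuration, so the same code behaves identically in standalone mode and on a cluster.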

Regarding "java - Using a static variable from another class in a Hadoop cluster", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22924613/
