
java - Storing Hadoop output in a local MongoDB

Reposted. Author: 可可西里. Updated: 2023-11-01 16:35:19

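I have seen questions about storing Hadoop HDFS output in MongoDB, but I have not seen how to store local Hadoop output in a local database. I know I need to use the MongoDB connector from here: https://github.com/mongodb/mongo-hadoop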

I have added the dependency to my pom.xml file:

<dependency>
    <groupId>org.mongodb.mongo-hadoop</groupId>
    <artifactId>mongo-hadoop-core</artifactId>
    <version>1.5.1</version>
</dependency>

This is the class that runs my job:

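// Package and class name are inferred from hadoop.TwitterJob below; the post cut off the declaration.
package hadoop;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwitterJob {

    // InvalidDataException is not defined in the post; presumably a project class.
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException, InvalidDataException {

        // Job.getInstance() replaces the deprecated new Job() constructor.
        Job job = Job.getInstance();

        job.setJarByClass(hadoop.TwitterJob.class);
        job.setJobName("Inverted Index for Twitter Data");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path("src/output/")); // change this to output to mongodb

        job.setMapperClass(InvertedIndexMapper.class);
        job.setReducerClass(hadoop.InvertedIndexReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.waitForCompletion(true);
    }
}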

Does anyone know how to write the output directly to MongoDB locally? Thanks.

Best answer

From https://github.com/mongodb/mongo-hadoop/blob/master/README.md:

Write data out in .bson format, which can then be imported to any MongoDB database with mongorestore

From https://github.com/mongodb/mongo-hadoop/wiki/Using-.bson-Files:

To write the output of a job to .bson files, set mongo.job.output.format to com.mongodb.hadoop.BSONFileOutputFormat or use MongoConfigUtil.setOutputFormat(com.mongodb.hadoop.BSONFileOutputFormat.class)

This looks fairly simple, and you can test it against this example:

https://github.com/mongodb/mongo-hadoop/blob/master/examples/sensors/src/main/java/com/mongodb/hadoop/examples/sensors/Devices.java
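As a rough illustration of the settings discussed in this answer, the sketch below collects them as plain key/value pairs using only the JDK, so it runs without Hadoop on the classpath. The property name `mongo.job.output.format` and the class `com.mongodb.hadoop.BSONFileOutputFormat` come from the wiki excerpt quoted above. Everything else is an assumption for illustration: the `mongo.output.uri` property, the `com.mongodb.hadoop.MongoOutputFormat` class for writing straight to a running MongoDB, the `mongodb://host:port/db.collection` URI form, and the `twitter.inverted_index` target name. The helper class `MongoOutputSettings` is hypothetical and not part of mongo-hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: gathers mongo-hadoop output settings as key/value pairs. */
final class MongoOutputSettings {

    // Property name quoted in the mongo-hadoop wiki excerpt.
    static final String OUTPUT_FORMAT_KEY = "mongo.job.output.format";
    // Assumption: the connector reads the target collection URI from this property.
    static final String OUTPUT_URI_KEY = "mongo.output.uri";

    /** Settings for writing .bson files that mongorestore can import afterwards. */
    static Map<String, String> bsonFileOutput() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(OUTPUT_FORMAT_KEY, "com.mongodb.hadoop.BSONFileOutputFormat");
        return settings;
    }

    /** Settings for writing directly into a collection (assumed class and URI form). */
    static Map<String, String> directOutput(String host, int port, String db, String collection) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(OUTPUT_FORMAT_KEY, "com.mongodb.hadoop.MongoOutputFormat");
        settings.put(OUTPUT_URI_KEY, "mongodb://" + host + ":" + port + "/" + db + "." + collection);
        return settings;
    }

    public static void main(String[] args) {
        // Print the pairs for a local MongoDB; database and collection names are placeholders.
        directOutput("localhost", 27017, "twitter", "inverted_index")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a hand-written driver like the one in the question, setting the format property alone is probably not enough, because a plain `Job` takes its output format from `job.setOutputFormatClass(...)`. The likely equivalent there is to call `job.setOutputFormatClass(...)` with the chosen format class and set the URI on `job.getConfiguration()`, in place of the `FileOutputFormat.setOutputPath(...)` line. Check this against the linked example before relying on it.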

Regarding "java - Storing Hadoop output in a local MongoDB", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55093123/
