
hadoop - Accessing data in Hive MapReduce

Reposted. Author: 行者123. Updated: 2023-12-02 21:42:14

I am trying to load data from a Hive table and put it into another table.
Loading data from this table:

CREATE  TABLE `dmg_bindings`(
`viuserid` string,
`puid` string,
`ts` bigint)
PARTITIONED BY (
`dt` string,
`pid` string)

and putting the data into this one:
CREATE  TABLE `newdmgbnd`(
`ts` int,
`puid1` string,
`puid2` string)
PARTITIONED BY (
`dt` string,
`platid1` string,
`platid2` string)

But I have a problem and cannot find where I went wrong.
I get the following error:
15/01/15 10:22:07 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
15/01/15 10:22:07 INFO hive.metastore: Trying to connect to metastore with URI thrift://srv112.test.local:9083
15/01/15 10:22:07 INFO hive.metastore: Connected to metastore.
15/01/15 10:22:08 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@6d88b065] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@6e205d5c] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@5b031819] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@223e0fa1] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@1d73aa82] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@1b10b8a3] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@506422f2] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@3f0eca9f] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@da24f04] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@6ad66647] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@2469fb45] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@2b2b5f52] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@4ba6fc80] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@2a5c3214] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@666e18bb] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@6a974e] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@2c09f7be] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@362239c7] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@7ac85bb5] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@4d9e25f] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@1a74fc3d] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@17c02eb9] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@847ac3e] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@656a0389] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@f775a5b] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@53ef7ba0] nullstring=\N
15/01/15 10:22:08 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
15/01/15 10:22:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/01/15 10:22:10 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
15/01/15 10:22:10 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 2
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 2
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 16
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:12 INFO mapred.JobClient: Running job: job_201412021320_0142
15/01/15 10:22:13 INFO mapred.JobClient: map 0% reduce 0%
15/01/15 10:22:24 INFO mapred.JobClient: Task Id : attempt_201412021320_0142_m_000002_0, Status : FAILED
java.lang.NullPointerException
at org.apache.hive.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:167)
at org.apache.hive.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
at MapNewDmg.map(MapNewDmg.java:32)
at MapNewDmg.map(MapNewDmg.java:15)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(Use
attempt_201412021320_0142_m_000002_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201412021320_0142_m_000002_0: SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201412021320_0142_m_000002_0: SLF4J: Found binding in [jar:file:/mnt1/mapred/local/taskTracker/mvolosnikova/jobcache/job_201412021320_0142/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201412021320_0142_m_000002_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
attempt_201412021320_0142_m_000002_0: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

My Driver class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hive.hcatalog.data.schema.HCatSchema;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.InputJobInfo;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;
import java.io.FileInputStream;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.*;

public class Driver extends Configured implements Tool {
    @Override
    public int run(String[] strings) throws Exception {
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "newDmg");
        HCatInputFormat.setInput(job, "default", "dmg_bindings", "dt=\"2014-09-01\"");
        job.setJarByClass(Driver.class);
        job.setMapperClass(MapNewDmg.class);
        job.setNumReduceTasks(0);
        job.setInputFormatClass(HCatInputFormat.class);
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);
        job.setOutputFormatClass(HCatOutputFormat.class);
        Map staticPartitions = new HashMap<String, String>(1);
        staticPartitions.put("dt", "2014-09-01");
        List dynamicPartitions = new ArrayList<String>();
        dynamicPartitions.add("platid1");
        dynamicPartitions.add("platid2");
        OutputJobInfo jobInfo = OutputJobInfo.create("default", "newdmgbnd", staticPartitions);
        jobInfo.setDynamicPartitioningKeys(dynamicPartitions);
        HCatOutputFormat.setOutput(job, jobInfo);
        HCatSchema schema = HCatOutputFormat.getTableSchema(job);
        schema.append(new HCatFieldSchema("platid1", HCatFieldSchema.Type.STRING, ""));
        schema.append(new HCatFieldSchema("platid2", HCatFieldSchema.Type.STRING, ""));
        HCatOutputFormat.setSchema(job, schema);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitcode = ToolRunner.run(new Driver(), args);
        System.exit(exitcode);
    }
}

My Mapper class:
import org.apache.hadoop.io.WritableComparable;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hive.hcatalog.data.schema.HCatSchema;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Mapper;

public class MapNewDmg extends Mapper<WritableComparable, HCatRecord, WritableComparable, HCatRecord> {
    @Override
    protected void map(WritableComparable key, HCatRecord value, Context context)
            throws IOException, InterruptedException {
        String viuserid = (String) value.get(0);
        String puid = (String) value.get(1);
        Long ts = (Long) value.get(2);
        String pid = (String) value.get(4);
        int newts = (int) (ts / 1000);
        HCatRecord record = new DefaultHCatRecord(6);
        record.set(0, newts);
        record.set(1, viuserid);
        record.set(2, puid);
        record.set(4, "586");
        record.set(5, pid);
        context.write(null, record);
    }
}

What am I doing wrong in my program?
I cannot understand why this error occurs, because my data is not null! (Yes, I have checked.)
Please help me. Thanks.

Best Answer

In your mapper you are calling context.write(null, record);, which is wrong. If you do not want to specify a key, use NullWritable: change the mapper's declaration (and the driver, to reflect the new type used), and change context.write(null, record); to context.write(NullWritable.get(), record);.
Note that this is not the best solution when a reducer is involved (not your case, but FYI); see here for details: https://support.pivotal.io/hc/en-us/articles/202810986-Mapper-output-key-value-NullWritable-can-cause-reducer-phase-to-move-slowly
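As a sketch only (assuming the question's HCatalog setup is otherwise unchanged, and that this fragment is compiled against the Hadoop/HCatalog jars rather than run standalone), the corrected mapper might look like this — the input key type stays WritableComparable, the output key type becomes NullWritable, and a real NullWritable instance is emitted instead of null:

```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.HCatRecord;

// Output key type changed from WritableComparable to NullWritable.
public class MapNewDmg extends Mapper<WritableComparable, HCatRecord, NullWritable, HCatRecord> {
    @Override
    protected void map(WritableComparable key, HCatRecord value, Context context)
            throws IOException, InterruptedException {
        String viuserid = (String) value.get(0);
        String puid = (String) value.get(1);
        Long ts = (Long) value.get(2);
        String pid = (String) value.get(4);

        HCatRecord record = new DefaultHCatRecord(6);
        record.set(0, (int) (ts / 1000)); // milliseconds -> seconds, as in the question
        record.set(1, viuserid);
        record.set(2, puid);
        record.set(4, "586");
        record.set(5, pid);

        // The fix: emit a real (singleton, empty) NullWritable key instead of null.
        context.write(NullWritable.get(), record);
    }
}
```

Correspondingly, in the driver, job.setOutputKeyClass(WritableComparable.class) would become job.setOutputKeyClass(NullWritable.class).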

Regarding hadoop - Accessing data in Hive MapReduce, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27958160/
