
hadoop - Accessing data in Hive MapReduce

Reposted. Author: 行者123. Updated: 2023-12-02 21:42:14

I am trying to load data from a Hive table and put it into another table.
Loading data from this table:

CREATE  TABLE `dmg_bindings`(
`viuserid` string,
`puid` string,
`ts` bigint)
PARTITIONED BY (
`dt` string,
`pid` string)

and putting the data into this one:
CREATE  TABLE `newdmgbnd`(
`ts` int,
`puid1` string,
`puid2` string)
PARTITIONED BY (
`dt` string,
`platid1` string,
`platid2` string)

But I have a problem and cannot find where I went wrong.
I get the following error:
15/01/15 10:22:07 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
15/01/15 10:22:07 INFO hive.metastore: Trying to connect to metastore with URI thrift://srv112.test.local:9083
15/01/15 10:22:07 INFO hive.metastore: Connected to metastore.
15/01/15 10:22:08 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@6d88b065] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@6e205d5c] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@5b031819] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@223e0fa1] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@1d73aa82] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@1b10b8a3] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@506422f2] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@3f0eca9f] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@da24f04] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@6ad66647] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@2469fb45] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@2b2b5f52] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@4ba6fc80] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@2a5c3214] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@666e18bb] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@6a974e] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@2c09f7be] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@362239c7] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@7ac85bb5] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@4d9e25f] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@1a74fc3d] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@17c02eb9] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@847ac3e] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@656a0389] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@f775a5b] nullstring=\N
15/01/15 10:22:08 INFO columnar.ColumnarSerDe: ColumnarSerDe initialized with: columnNames=[viuserid, puid, ts] columnTypes=[string, string, bigint] separator=[[B@53ef7ba0] nullstring=\N
15/01/15 10:22:08 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
15/01/15 10:22:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/01/15 10:22:10 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
15/01/15 10:22:10 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 2
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 2
15/01/15 10:22:10 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 16
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 40
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:11 INFO mapred.FileInputFormat: Total input paths to process : 1
15/01/15 10:22:12 INFO mapred.JobClient: Running job: job_201412021320_0142
15/01/15 10:22:13 INFO mapred.JobClient: map 0% reduce 0%
15/01/15 10:22:24 INFO mapred.JobClient: Task Id : attempt_201412021320_0142_m_000002_0, Status : FAILED
java.lang.NullPointerException
at org.apache.hive.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:167)
at org.apache.hive.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
at MapNewDmg.map(MapNewDmg.java:32)
at MapNewDmg.map(MapNewDmg.java:15)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(Use
attempt_201412021320_0142_m_000002_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201412021320_0142_m_000002_0: SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201412021320_0142_m_000002_0: SLF4J: Found binding in [jar:file:/mnt1/mapred/local/taskTracker/mvolosnikova/jobcache/job_201412021320_0142/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201412021320_0142_m_000002_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
attempt_201412021320_0142_m_000002_0: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

My Driver class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hive.hcatalog.data.schema.HCatSchema;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.InputJobInfo;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;
import java.io.FileInputStream;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.*;

public class Driver extends Configured implements Tool {
    @Override
    public int run(String[] strings) throws Exception {
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "newDmg");
        HCatInputFormat.setInput(job, "default", "dmg_bindings", "dt=\"2014-09-01\"");
        job.setJarByClass(Driver.class);
        job.setMapperClass(MapNewDmg.class);
        job.setNumReduceTasks(0);
        job.setInputFormatClass(HCatInputFormat.class);
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);
        job.setOutputFormatClass(HCatOutputFormat.class);
        Map staticPartitions = new HashMap<String, String>(1);
        staticPartitions.put("dt", "2014-09-01");
        List dynamicPartitions = new ArrayList<String>();
        dynamicPartitions.add("platid1");
        dynamicPartitions.add("platid2");
        OutputJobInfo jobInfo = OutputJobInfo.create("default", "newdmgbnd", staticPartitions);
        jobInfo.setDynamicPartitioningKeys(dynamicPartitions);
        HCatOutputFormat.setOutput(job, jobInfo);
        HCatSchema schema = HCatOutputFormat.getTableSchema(job);
        schema.append(new HCatFieldSchema("platid1", HCatFieldSchema.Type.STRING, ""));
        schema.append(new HCatFieldSchema("platid2", HCatFieldSchema.Type.STRING, ""));
        HCatOutputFormat.setSchema(job, schema);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitcode = ToolRunner.run(new Driver(), args);
        System.exit(exitcode);
    }
}

My Mapper class:
import org.apache.hadoop.io.WritableComparable;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hive.hcatalog.data.schema.HCatSchema;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Mapper;

public class MapNewDmg extends Mapper<WritableComparable, HCatRecord, WritableComparable, HCatRecord> {
    @Override
    protected void map(WritableComparable key, HCatRecord value, Context context)
            throws IOException, InterruptedException {
        String viuserid = (String) value.get(0);
        String puid = (String) value.get(1);
        Long ts = (Long) value.get(2);
        String pid = (String) value.get(4);
        int newts = (int) (ts / 1000);
        HCatRecord record = new DefaultHCatRecord(6);
        record.set(0, newts);
        record.set(1, viuserid);
        record.set(2, puid);
        record.set(4, "586");
        record.set(5, pid);
        context.write(null, record);
    }
}

What am I doing wrong in my program?
I cannot understand why this error occurs, because my data is not null! (Yes, I have checked.)
Please help me. Thanks.

Best Answer

In your mapper you are calling context.write(null, record);, which is wrong. If you do not want to specify a key, use NullWritable: change the mapper's declaration (and the driver, to reflect the new type used), and change context.write(null, record); to context.write(NullWritable.get(), record);.
Note that this is not the best solution when a reducer is involved (not your case, but FYI); see here for details: https://support.pivotal.io/hc/en-us/articles/202810986-Mapper-output-key-value-NullWritable-can-cause-reducer-phase-to-move-slowly
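As a sketch only (assuming the question's HCatalog setup is otherwise unchanged, and that this fragment is compiled against the Hadoop/HCatalog jars rather than run standalone), the corrected mapper might look like this — the input key type stays WritableComparable, the output key type becomes NullWritable, and a real NullWritable instance is emitted instead of null:

```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.HCatRecord;

// Output key type changed from WritableComparable to NullWritable.
public class MapNewDmg extends Mapper<WritableComparable, HCatRecord, NullWritable, HCatRecord> {
    @Override
    protected void map(WritableComparable key, HCatRecord value, Context context)
            throws IOException, InterruptedException {
        String viuserid = (String) value.get(0);
        String puid = (String) value.get(1);
        Long ts = (Long) value.get(2);
        String pid = (String) value.get(4);

        HCatRecord record = new DefaultHCatRecord(6);
        record.set(0, (int) (ts / 1000)); // milliseconds -> seconds, as in the question
        record.set(1, viuserid);
        record.set(2, puid);
        record.set(4, "586");
        record.set(5, pid);

        // The fix: emit a real (singleton, empty) NullWritable key instead of null.
        context.write(NullWritable.get(), record);
    }
}
```

Correspondingly, in the driver, job.setOutputKeyClass(WritableComparable.class) would become job.setOutputKeyClass(NullWritable.class).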

Regarding hadoop - Accessing data in Hive MapReduce, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27958160/
