
hadoop - Sqoop import as Avro error


Stack: HDP-2.3.2.0-2950, installed using Ambari 2.1.

I am trying to import a SQL Server table into HDFS.

[sqoop@l1038lab root]$ sqoop import --connect 'jdbc:sqlserver://dbserver;database=dbname' --username someusername --password somepassword --as-avrodatafile  --table DimSampleDesc --warehouse-dir /dataload/tohdfs/reio/odpdw/may2016 --verbose

There is an error in the output:
Writing Avro schema file: /tmp/sqoop-sqoop/compile/bbbd98974f09b50a9335cedde30f73a5/DimSampleDesc.avsc
16/05/09 13:09:00 DEBUG mapreduce.DataDrivenImportJob: Could not move Avro schema file to code output directory.
java.io.FileNotFoundException: Destination directory '.' does not exist [createDestDir=true]
    at org.apache.commons.io.FileUtils.moveFileToDirectory(FileUtils.java:2865)
    at org.apache.sqoop.mapreduce.DataDrivenImportJob.writeAvroSchema(DataDrivenImportJob.java:146)
    at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:92)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
    at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:163)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
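The move fails because Sqoop tries to relocate the generated .avsc into its code output directory, which defaults to the current working directory ('.'). A common workaround (a sketch, assuming the sqoop user can write to the chosen path) is to point code generation at an explicit directory with the standard --outdir and --bindir options, or simply to cd into a writable directory before running the import:

[sqoop@l1038lab root]$ sqoop import --connect 'jdbc:sqlserver://dbserver;database=dbname' --username someusername --password somepassword --as-avrodatafile --table DimSampleDesc --warehouse-dir /dataload/tohdfs/reio/odpdw/may2016 --outdir /tmp/sqoop-codegen --bindir /tmp/sqoop-codegen --verbose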

Contents of /tmp/sqoop-sqoop/compile/bbbd98974f09b50a9335cedde30f73a5/:
[sqoop@l1038lab root]$ ls -lrt /tmp/sqoop-sqoop/compile/bbbd98974f09b50a9335cedde30f73a5/
total 104
-rw-r--r--. 1 sqoop hadoop 61005 May 9 13:08 DimSampleDesc.java
-rw-r--r--. 1 sqoop hadoop 28540 May 9 13:08 DimSampleDesc.class
-rw-r--r--. 1 sqoop hadoop 9568 May 9 13:08 DimSampleDesc.jar
-rw-r--r--. 1 sqoop hadoop 3659 May 9 13:09 DimSampleDesc.avsc

Contents of the warehouse dir:
[sqoop@l1038lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016
Found 1 items
drwxr-xr-x - sqoop hdfs 0 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc
[sqoop@l1038lab root]$
[sqoop@l1038lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc
Found 7 items
-rw-r--r-- 3 sqoop hdfs 0 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/_SUCCESS
-rw-r--r-- 3 sqoop hdfs 2660 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro
-rw-r--r-- 3 sqoop hdfs 5039870 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00001.avro
-rw-r--r-- 3 sqoop hdfs 1437143 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00002.avro
-rw-r--r-- 3 sqoop hdfs 1486327 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00003.avro
-rw-r--r-- 3 sqoop hdfs 595550 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00004.avro
-rw-r--r-- 3 sqoop hdfs 4792 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00005.avro
[sqoop@l1038lab root]$
[sqoop@l1038lab root]$

Then I manually copied the .avsc and the other files:
[sqoop@l1038lab root]$ hadoop fs -copyFromLocal /tmp/sqoop-sqoop/compile/d039c1b0b2a2b224d65943df1de34cdd/* /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/

Now all the files are in one place:
[sqoop@l1038lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/
Found 11 items
-rw-rw-rw- 3 sqoop hdfs 3659 2016-05-09 13:49 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
-rw-rw-rw- 3 sqoop hdfs 28540 2016-05-09 13:49 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.class
-rw-rw-rw- 3 sqoop hdfs 9568 2016-05-09 13:49 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.jar
-rw-rw-rw- 3 sqoop hdfs 61005 2016-05-09 13:49 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.java
-rw-rw-rw- 3 sqoop hdfs 0 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/_SUCCESS
-rw-rw-rw- 3 sqoop hdfs 2660 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro
-rw-rw-rw- 3 sqoop hdfs 5039870 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00001.avro
-rw-rw-rw- 3 sqoop hdfs 1437143 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00002.avro
-rw-rw-rw- 3 sqoop hdfs 1486327 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00003.avro
-rw-rw-rw- 3 sqoop hdfs 595550 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00004.avro
-rw-rw-rw- 3 sqoop hdfs 4792 2016-05-09 13:09 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00005.avro
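Note that only the .avsc file is actually used by the Hive table below (via avro.schema.url); the .java, .class and .jar artifacts are not needed in HDFS. A cleaner variant of the copy (a sketch, reusing the compile directory from the error log above) would move just the schema:

[sqoop@l1038lab root]$ hadoop fs -copyFromLocal /tmp/sqoop-sqoop/compile/bbbd98974f09b50a9335cedde30f73a5/DimSampleDesc.avsc /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/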

Now I created the Hive table and described it:
CREATE EXTERNAL TABLE DimSampleDesc
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES ('avro.schema.url'='hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc');
OK
Time taken: 0.166 seconds
hive>
hive>
> describe formatted DimSampleDesc;
OK
# col_name data_type comment


smapiname_ver string
smapicolname string
charttype int
x_indexet int
y_indexet int
x_tick string
y_tick string
x_tickrange string
x_tickrangefrom string
x_tickrangetom string
y_tickrange string
y_tickrangefrom string
y_tickrangetom string
indexcount int
x_indexcount int
y_indexcount int
x_symbol string
x_symbolname string
x_symboldescr string
y_symbol string
y_symbolname string
y_symboldescr string
smapiname string
incorrect_ver_fl boolean


# Detailed Table Information
Database: odp_dw_may2016
Owner: hive
CreateTime: Mon May 09 14:46:40 CEST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://l1031lab.sss.se.com:8020/apps/hive/warehouse/odp_dw_may2016.db/dimsampledesc
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
avro.schema.url hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
numFiles 0
numRows -1
rawDataSize -1
totalSize 0
transient_lastDdlTime 1462798000


# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.avro.AvroSerDe
InputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.416 seconds, Fetched: 56 row(s)
hive>
>

But no data is found:
hive>
>
> select * from DimSampleDesc;
OK
Time taken: 0.098 seconds
hive>
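Note that in the describe formatted output above, Location points to the default warehouse path (hdfs://l1031lab.sss.se.com:8020/apps/hive/warehouse/odp_dw_may2016.db/dimsampledesc, with numFiles 0 and totalSize 0), not to the directory that holds the .avro part files, so the empty result is consistent with Hive scanning an empty directory. This can be confirmed by listing that location:

[sqoop@l1038lab root]$ hadoop fs -ls /apps/hive/warehouse/odp_dw_may2016.db/dimsampledesc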

The schema file:
[sqoop@l1038lab root]$ hadoop fs -cat /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
{
"type" : "record",
"name" : "DimSampleDesc",
"doc" : "Sqoop import of DimSampleDesc",
"fields" : [ {
"name" : "SmapiName_ver",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiName_ver",
"sqlType" : "12"
}, {
"name" : "SmapiColName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiColName",
"sqlType" : "12"
}, {
"name" : "ChartType",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "ChartType",
"sqlType" : "4"
}, {
"name" : "X_Indexet",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "X_Indexet",
"sqlType" : "4"
}, {
"name" : "Y_Indexet",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "Y_Indexet",
"sqlType" : "4"
}, {
"name" : "X_Tick",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_Tick",
"sqlType" : "-9"
}, {
"name" : "Y_Tick",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_Tick",
"sqlType" : "-9"
}, {
"name" : "X_TickRange",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRange",
"sqlType" : "-9"
}, {
"name" : "X_TickRangeFrom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRangeFrom",
"sqlType" : "-9"
}, {
"name" : "X_TickRangeTom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRangeTom",
"sqlType" : "-9"
}, {
"name" : "Y_TickRange",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRange",
"sqlType" : "-9"
}, {
"name" : "Y_TickRangeFrom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRangeFrom",
"sqlType" : "-9"
}, {
"name" : "Y_TickRangeTom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRangeTom",
"sqlType" : "-9"
}, {
"name" : "IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "IndexCount",
"sqlType" : "4"
}, {
"name" : "X_IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "X_IndexCount",
"sqlType" : "4"
}, {
"name" : "Y_IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "Y_IndexCount",
"sqlType" : "4"
}, {
"name" : "X_Symbol",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_Symbol",
"sqlType" : "-9"
}, {
"name" : "X_SymbolName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_SymbolName",
"sqlType" : "-9"
}, {
"name" : "X_SymbolDescr",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_SymbolDescr",
"sqlType" : "-9"
}, {
"name" : "Y_Symbol",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_Symbol",
"sqlType" : "-9"
}, {
"name" : "Y_SymbolName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_SymbolName",
"sqlType" : "-9"
}, {
"name" : "Y_SymbolDescr",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_SymbolDescr",
"sqlType" : "-9"
}, {
"name" : "SmapiName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiName",
"sqlType" : "12"
}, {
"name" : "Incorrect_Ver_FL",
"type" : [ "null", "boolean" ],
"default" : null,
"columnName" : "Incorrect_Ver_FL",
"sqlType" : "-7"
} ],
"tableName" : "DimSampleDesc"
}
[sqoop@l1038lab root]$
[sqoop@l1038lab root]$

What is the root cause, and how should I proceed?

Best Answer

Create the table in Hive on top of the data using the same Avro schema file that was generated while Sqooping. You can use avro-tools.jar to achieve this.
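For example, the schema can be pulled out of one of the imported data files with the getschema command of avro-tools (a sketch; the local jar name and version are assumptions):

[sqoop@l1038lab root]$ hadoop fs -get /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro .
[sqoop@l1038lab root]$ java -jar avro-tools-1.7.7.jar getschema part-m-00000.avro > DimSampleDesc.avsc

Equally important: the CREATE statement in the question has no LOCATION clause, so the external table points at the default warehouse directory, which is empty. A sketch of the table recreated on top of the imported data (after dropping the existing definition):

CREATE EXTERNAL TABLE DimSampleDesc
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION 'hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc'
  TBLPROPERTIES ('avro.schema.url'='hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc');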

Also, check whether your table in SQL Server has the same data.
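A quick way to compare row counts on both sides (sqoop eval runs a SQL statement directly against the source database; a sketch reusing the connection details from the question):

[sqoop@l1038lab root]$ sqoop eval --connect 'jdbc:sqlserver://dbserver;database=dbname' --username someusername --password somepassword --query 'SELECT COUNT(*) FROM DimSampleDesc'

hive> select count(*) from DimSampleDesc;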

Regarding hadoop - Sqoop import as Avro error, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37114765/
