
hadoop - Sqoop export to RDBMS: .lzo / .gz files over 64 MB load duplicate rows


Using Sqoop 1.3.

Trying to export HDFS output to a MySQL table.

Everything works fine when loading an uncompressed file larger than 300 MB.

But when loading compressed files (.gz and .lzo) of 75 MB or 79 MB, I see the number of rows loaded into the table double. This does not happen when the compressed file is 60 MB or smaller (my guess is that it is something related to the 64 MB block size). Some of the commands I ran in this context:

bash-3.2$ ls -ltr
-rw-r--r-- 1 bhargavn bhargavn 354844413 Nov 16 02:27 large_file
-rw-rw-r-- 1 bhargavn bhargavn 15669507 Nov 21 03:41 small_file.lzo
-rw-rw-r-- 1 bhargavn bhargavn 75173037 Nov 21 03:46 large_file.lzo

bash-3.2$ wc -l large_file
247060 large_file

bash-3.2$ sqoop export --connect 'jdbc:mysql://db.com/test?zeroDateTimeBehavior=round&rewriteBatchedStatements=true' \
    --table table_with_large_data \
    --username sqoopuser \
    --password sqoop \
    --export-dir /user/bhargavn/workspace/data/sqoop-test/large_file.lzo \
    --fields-terminated-by '\001' -m 1
[21/11/2012:05:52:28 PST] main INFO org.apache.hadoop.mapred.JobClient: map 0% reduce 0%
[21/11/2012:05:57:03 PST] main INFO com.cloudera.sqoop.mapreduce.ExportJobBase: Transferred 143.3814 MB in 312.2832 seconds (470.1584 KB/sec)
[21/11/2012:05:57:03 PST] main INFO com.cloudera.sqoop.mapreduce.ExportJobBase: Exported 494120 records.

mysql> select count(1) from table_with_large_data;
+----------+
| count(1) |
+----------+
| 494120 |
+----------+

mysql> truncate table_with_large_data;
bash-3.2$ sqoop export --connect 'jdbc:mysql://db.com/test?zeroDateTimeBehavior=round&rewriteBatchedStatements=true' \
    --table table_with_large_data \
    --username sqoopuser \
    --password sqoop \
    --export-dir /user/bhargavn/workspace/data/sqoop-test/large_file \
    --fields-terminated-by '\001' \
    -m 1
[21/11/2012:06:05:35 PST] main INFO org.apache.hadoop.mapred.JobClient: map 0% reduce 0%
[21/11/2012:06:08:06 PST] main INFO org.apache.hadoop.mapred.JobClient: map 100% reduce 0%
[21/11/2012:06:08:06 PST] main INFO com.cloudera.sqoop.mapreduce.ExportJobBase: Transferred 338.4573 MB in 162.5891 seconds (2.0817 MB/sec)
[21/11/2012:06:08:06 PST] main INFO com.cloudera.sqoop.mapreduce.ExportJobBase: Exported 247060 records.
mysql> select count(1) from table_with_large_data;
+----------+
| count(1) |
+----------+
| 247060 |
+----------+
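
One way to confirm that the duplication happens during the export rather than in the compressed file itself is to decompress the .lzo copy on the fly and count its rows; it should still show 247,060. A quick check, assuming lzop is installed locally and the LZO codec is configured on the Hadoop client:

bash-3.2$ # count rows in the local compressed copy
bash-3.2$ lzop -dc large_file.lzo | wc -l
bash-3.2$ # count rows in the HDFS copy that sqoop actually reads
bash-3.2$ hadoop fs -text /user/bhargavn/workspace/data/sqoop-test/large_file.lzo | wc -l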

Best Answer

You might be hitting a known bug in Sqoop that needs to be fixed [1].

Would you mind joining the Sqoop user mailing list [2] and describing your problem there? I'm quite confident that the Sqoop developers will step in to get this particular issue resolved.

Jarcec

Links:

1:https://issues.apache.org/jira/browse/SQOOP-721

2:http://sqoop.apache.org/mail-lists.html
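
Until that fix lands, a possible interim workaround, given that the uncompressed export above produced the correct row count, is to write an uncompressed copy of the data to HDFS and export that instead. A minimal sketch (large_file_plain is just a placeholder path, and hadoop fs -text needs the LZO codec configured to decode .lzo input):

bash-3.2$ # decompress the .lzo file on HDFS into a plain-text copy
bash-3.2$ hadoop fs -text /user/bhargavn/workspace/data/sqoop-test/large_file.lzo | \
          hadoop fs -put - /user/bhargavn/workspace/data/sqoop-test/large_file_plain
bash-3.2$ # export the uncompressed copy instead of the .lzo file
bash-3.2$ sqoop export --connect 'jdbc:mysql://db.com/test?zeroDateTimeBehavior=round&rewriteBatchedStatements=true' \
          --table table_with_large_data --username sqoopuser --password sqoop \
          --export-dir /user/bhargavn/workspace/data/sqoop-test/large_file_plain \
          --fields-terminated-by '\001' -m 1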

Regarding "hadoop - Sqoop export to RDBMS: .lzo / .gz files over 64 MB load duplicate rows", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13511818/
