scala - Error when reading an S3 bucket


I am getting an exception when trying to read files from S3 with Spark. The error and code are below. The folder consists of many files named part-00000, part-00001, and so on, written by Hadoop; their sizes range from 0 KB to a few GB.

16/04/07 15:38:58 INFO NativeS3FileSystem: Opening key 'titlematching214/1.0/bypublicdemand/part-00000' for reading at position '0'
16/04/07 15:38:58 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/titlematching214%2F1.0%2Fbypublicdemand%2Fpart-00000' XML Error Message: InvalidRange: The requested range is not satisfiable (bytes=0-0, request/host ID: 1AED523DF401F17ECBYUH1h3WkC7/g8/EFE/YyHbzxoNTpRBiX6QMy2RXHur17lYTZXd7XxOWivmqIpu0F7Xx5zdWns=)
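A plausible reading of this trace (an editorial note, not from the original thread): S3 returns InvalidRange when a byte range such as bytes=0-0 is requested from a zero-byte object, which matches the 0 KB part files mentioned above. One hypothetical workaround is to glob the part files through the Hadoop FileSystem API and keep only the non-empty ones before reading; the snippet below is an illustrative sketch against the same bucket path, not code from the thread.

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Hypothetical sketch: list the part files and drop zero-byte objects
    // before handing the survivors to sc.textFile (sc as defined below).
    val fs = FileSystem.get(new URI("s3n://altvissparkoutput"), sc.hadoopConfiguration)
    val nonEmptyPaths = fs
      .globStatus(new Path("/titlematching214/1.0/*/*"))
      .filter(_.getLen > 0) // empty objects are what trigger InvalidRange
      .map(_.getPath.toString)
    val dataset = sc.textFile(nonEmptyPaths.mkString(","))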

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ReadMatches {
  def main(args: Array[String]): Unit = {
    val config = new SparkConf().setAppName("RunAll").setMaster("local")
    val sc = new SparkContext(config)
    val hadoopConf = sc.hadoopConfiguration
    hadoopConf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
    hadoopConf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")
    // Credentials for the s3n:// connector
    hadoopConf.set("fs.s3n.awsAccessKeyId", "myRealKeyId")
    hadoopConf.set("fs.s3n.awsSecretAccessKey", "realKey")
    val sqlContext = new SQLContext(sc)

    // Read every part file under the bucket, then parse the lines as JSON
    val dataset = sc.textFile("s3n://altvissparkoutput/titlematching214/1.0/*/*")
    val ebayRaw = sqlContext.read.json(dataset)
    val data = ebayRaw.first()
  }
}

Best answer

Perhaps you can read your dataset directly from S3.

    val dataset = "s3n://altvissparkoutput/titlematching214/1.0/*/*"
    val ebayRaw = sqlContext.read.json(dataset)
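
As a usage note, here is a minimal self-contained sketch of that suggestion, assuming Spark 1.x (the SQLContext API used above) and reusing the placeholder bucket path and credentials from the question. Passing the path string to read.json lets Spark resolve and read the S3 objects itself, instead of first materializing an RDD[String] with sc.textFile.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object ReadMatchesDirect {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RunAll").setMaster("local"))
        sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "myRealKeyId")
        sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "realKey")
        val sqlContext = new SQLContext(sc)

        // Hand the glob pattern straight to the JSON reader; Spark lists
        // the matching objects and parses each line as a JSON record.
        val dataset = "s3n://altvissparkoutput/titlematching214/1.0/*/*"
        val ebayRaw = sqlContext.read.json(dataset)
        println(ebayRaw.first())
      }
    }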

Regarding "scala - Error when reading an S3 bucket", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36479624/
