scala - How do I read a non-UTF-8 encoded table in AWS Glue?

Reposted. Author: 行者123. Updated: 2023-12-01 04:44:12

This is the Scala code I use to read a CSV file:

val input = glueContext
  .getCatalogSource(database = "my_database", tableName = "my_table")
  .getDynamicFrame()

It fails with an error that does not say what is actually wrong:
com.amazonaws.services.glue.util.FatalException: Unable to parse file: my_file_20170101.csv.gz
at com.amazonaws.services.glue.readers.JacksonReader.hasNextFailSafe(JacksonReader.scala:91)
at com.amazonaws.services.glue.readers.JacksonReader.hasNext(JacksonReader.scala:36)
at com.amazonaws.services.glue.hadoop.TapeHadoopRecordReader.nextKeyValue(TapeHadoopRecordReader.scala:63)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:199)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

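The code works for other CSV files, but this file is ANSI-encoded. Is there a way to tell Glue (or possibly the underlying Spark internals) to read the file with a different encoding?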
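For illustration only, here is a minimal standard-library sketch of the kind of failure that a non-UTF-8 file can cause. It assumes that "ANSI" means Windows-1252 (cp1252) and that the parse error comes from bytes that are not valid UTF-8; neither is confirmed by the stack trace. The sample row and the helper names are hypothetical.

```python
import csv
import gzip
import io

# Hypothetical sample data; "ANSI" is ASSUMED to mean Windows-1252 (cp1252).
SAMPLE_TEXT = "id|name\n1|Café\n"


def make_ansi_gzip(text: str, encoding: str = "cp1252") -> bytes:
    """Return gzip-compressed bytes of `text` encoded with `encoding`."""
    return gzip.compress(text.encode(encoding))


def read_rows(blob: bytes, encoding: str) -> list:
    """Decompress `blob`, decode it with `encoding`, and parse pipe-delimited rows."""
    text = gzip.decompress(blob).decode(encoding)
    return list(csv.reader(io.StringIO(text), delimiter="|"))


blob = make_ansi_gzip(SAMPLE_TEXT)

# Decoding cp1252 bytes as UTF-8 fails as soon as a non-ASCII byte appears.
try:
    read_rows(blob, "utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the encoding the file was written in succeeds.
rows = read_rows(blob, "cp1252")
```

A file that contains only ASCII characters decodes identically under both encodings, which may be why other CSV files load without problems.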

Best answer

You can use the underlying Spark functionality to load a non-UTF-8 file into a Spark DataFrame (I use Python, as follows):

# imports
from pyspark.context import SparkContext
from awsglue.context import GlueContext

...

# set contexts
glueContext = GlueContext(SparkContext.getOrCreate())

....

# import file
df = glueContext.read.load(my_file,
                           format="csv",
                           sep="|",
                           header="true",
                           encoding="my_encoding")
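The question starts from a DynamicFrame, while the snippet above produces a plain Spark DataFrame. The sketch below shows one way the two might be connected: read the file with Spark's CSV reader and wrap the result with `DynamicFrame.fromDF`. The helper names `csv_read_options` and `load_csv_as_dynamic_frame` are hypothetical, and the reader options simply mirror the ones used above. Note that this reads the file path directly rather than going through the Glue catalog table.

```python
def csv_read_options(encoding: str, sep: str = "|", header: bool = True) -> dict:
    """Build Spark CSV reader options; `encoding` must name the file's real charset."""
    return {"encoding": encoding, "sep": sep, "header": str(header).lower()}


def load_csv_as_dynamic_frame(glue_context, path: str, encoding: str, name: str = "input"):
    """Read `path` with Spark's CSV reader, then wrap the result as a DynamicFrame."""
    # Imported lazily because awsglue is only available inside a Glue job.
    from awsglue.dynamicframe import DynamicFrame

    df = (
        glue_context.spark_session.read
        .options(**csv_read_options(encoding))
        .csv(path)
    )
    return DynamicFrame.fromDF(df, glue_context, name)
```

Inside a Glue job this could be called as `load_csv_as_dynamic_frame(glueContext, my_file, "windows-1252")`, where `"windows-1252"` is only an assumption about what "ANSI" means for this particular file.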

Regarding "scala - How do I read a non-UTF-8 encoded table in AWS Glue?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48276878/
