
json - Writing a DataFrame with a custom filename in PySpark

Reposted. Author: 行者123. Updated: 2023-12-02 18:49:10

I want to write a DataFrame's records to a file. The records are in JSON format, so I need to write the output under a custom filename instead of the default part-0000-cfhbhgh.json.

Best Answer

I'm giving the answer in Scala, but the steps are essentially the same in Python.

  import org.apache.hadoop.fs.{FileSystem, Path}

  // Get the Hadoop FileSystem, find the generated part-* file, and rename it.
  val fs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration)
  val file = fs.globStatus(new Path("data/jsonexample/part*"))(0).getPath().getName()
  println("file name " + file)
  fs.rename(
    new Path("data/jsonexample/" + file),
    new Path("data/jsonexample/tsuresh97_json_toberenamed.json"))

Full example:

  import spark.implicits._
  import org.apache.spark.sql.SaveMode

  val df = Seq(
    (123, "ITA", 1475600500, 18.0),
    (123, "ITA", 1475600500, 18.0),
    (123, "ITA", 1475600516, 19.0)
  ).toDF("Value", "Country", "Timestamp", "Sum")

  // coalesce(1) produces a single part-* file in the output directory.
  df.coalesce(1)
    .write
    .mode(SaveMode.Overwrite)
    .json("data/jsonexample/")

  import org.apache.hadoop.fs.{FileSystem, Path}

  val fs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration)
  val file = fs.globStatus(new Path("data/jsonexample/part*"))(0).getPath().getName()
  println("file name " + file)
  fs.rename(
    new Path("data/jsonexample/" + file),
    new Path("data/jsonexample/tsuresh97_json_toberenamed.json"))
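Since the question asks about PySpark, here is a minimal Python sketch of the same find-and-rename step for a local output path. It assumes the DataFrame was already written with df.coalesce(1).write.json(out_dir); the directory and the simulated part file below are illustrative stand-ins for Spark's real output. (On HDFS you would instead call the Hadoop FileSystem API through the JVM gateway, mirroring the Scala code above.)

```python
import glob
import os
import tempfile

# Simulate Spark's output directory so the snippet runs without a cluster:
# after df.coalesce(1).write.json(out_dir), the directory holds one part-* file.
out_dir = tempfile.mkdtemp()
with open(os.path.join(out_dir, "part-00000-cfhbhgh.json"), "w") as f:
    f.write('{"Value":123,"Country":"ITA","Timestamp":1475600500,"Sum":18.0}\n')

# The rename step itself: locate the single part-* file and give it a custom name.
src = glob.glob(os.path.join(out_dir, "part*"))[0]
dst = os.path.join(out_dir, "tsuresh97_json_toberenamed.json")
print("file name", os.path.basename(src))
os.rename(src, dst)
```

Note that this only works for the local filesystem; os.rename cannot touch files on HDFS or S3, which is why the Scala answer goes through Hadoop's FileSystem.rename.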



Result: the part-* file in data/jsonexample/ is renamed to tsuresh97_json_toberenamed.json.

JSON contents:
{"Value":123,"Country":"ITA","Timestamp":1475600500,"Sum":18.0}
{"Value":123,"Country":"ITA","Timestamp":1475600500,"Sum":18.0}
{"Value":123,"Country":"ITA","Timestamp":1475600516,"Sum":19.0}

Regarding json - Writing a DataFrame with a custom filename in PySpark, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61413911/
