
scala - How to parse YAML with Spark/Scala


I have a YAML file with the following contents. File name: config.yml

- firstName: "James"
  lastName: "Bond"
  age: 30

- firstName: "Super"
  lastName: "Man"
  age: 25

From this, I need to produce the following Spark dataframe using Spark and Scala:

+---+---------+--------+
|age|firstName|lastName|
+---+---------+--------+
|30 |James    |Bond    |
|25 |Super    |Man     |
+---+---------+--------+

I tried converting it to JSON and then to a dataframe, but I was not able to specify it in the Dataset sequence.

Best Answer

There is a solution that can help you: convert the YAML to JSON first, then read the JSON as a DataFrame.

You need to add these two dependencies: jackson-databind and jackson-dataformat-yaml.
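As a sketch, with sbt that could look like the following (the coordinates are the standard Jackson ones; the version shown is an assumption, so pick one compatible with your Spark distribution):

libraryDependencies ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.13.4",
  "com.fasterxml.jackson.dataformat" % "jackson-dataformat-yaml" % "2.13.4"
)

With those on the classpath, the conversion and DataFrame creation look like this: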

import org.apache.spark.sql.SparkSession

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory

class ScalaYamltoDataFrame {

  val yamlExample = "- firstName: \"James\"\n  lastName: \"Bond\"\n  age: 30\n\n- firstName: \"Super\"\n  lastName: \"Man\"\n  age: 25"

  // Parse the YAML with Jackson's YAMLFactory, then re-serialize the result as JSON.
  def convertYamlToJson(yaml: String): String = {
    val yamlReader = new ObjectMapper(new YAMLFactory)
    val obj = yamlReader.readValue(yaml, classOf[Any])
    val jsonWriter = new ObjectMapper
    jsonWriter.writeValueAsString(obj)
  }

  println(convertYamlToJson(yamlExample))

  def yamlToDF(): Unit = {

    @transient
    lazy val sparkSession = SparkSession.builder
      .master("local")
      .appName("Convert Yaml to Dataframe")
      .getOrCreate()

    import sparkSession.implicits._

    // Wrap the JSON string in a Dataset[String] and let Spark infer the schema.
    val ds = sparkSession.read
      .option("multiline", true)
      .json(Seq(convertYamlToJson(yamlExample)).toDS)

    ds.show(false)

    ds.printSchema()
  }
}

//println(convertYamlToJson(yamlExample))
[{"firstName":"James","lastName":"Bond","age":30},{"firstName":"Super","lastName":"Man","age":25}]

//ds.show(false)
+---+---------+--------+
|age|firstName|lastName|
+---+---------+--------+
|30 |James    |Bond    |
|25 |Super    |Man     |
+---+---------+--------+


//ds.printSchema()
root
 |-- age: long (nullable = true)
 |-- firstName: string (nullable = true)
 |-- lastName: string (nullable = true)
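
If the YAML lives in a file such as config.yml from the question rather than in a hardcoded string, a minimal sketch (assuming the file sits on the driver's local filesystem and reusing convertYamlToJson from the class above, e.g. in spark-shell or inside a main method) would be:

import scala.io.Source
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder
  .master("local")
  .appName("Convert Yaml file to Dataframe")
  .getOrCreate()
import sparkSession.implicits._

// Read the whole YAML file into a single string (the local path is an assumption).
val source = Source.fromFile("config.yml")
val yamlContent = try source.mkString finally source.close()

// Reuse the converter, then let Spark infer the schema from the generated JSON.
val converter = new ScalaYamltoDataFrame
val df = sparkSession.read
  .json(Seq(converter.convertYamlToJson(yamlContent)).toDS)

df.show(false)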

Hope this helps!

Regarding "scala - How to parse YAML with Spark/Scala", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58806113/
