
scala - Mapping column values to a numeric type in Spark

Reposted. Author: 行者123. Updated: 2023-12-04 22:28:41

I have a DataFrame in Spark with the following structure:

amount gender status
1000 male married
1313 female single
1000 male married

Basically, I want to create a new column in which gender is encoded as a number:
amount gender status  gender_num
1000 male married 1
1313 female single 2
1000 male married 1

I tried the following:
  val gender = df.gender

val gender_num = gender match {
case male => 1
case female => 2
}

I get the following error:
<console>:125: error: value pa_gender_category is not a member of org.apache.spark.sql.DataFrame
val gender = data.pa_gender_category

I know there is a string-to-index function (StringIndexer), but I want to do this manually.

Best answer

Use withColumn:

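import org.apache.spark.sql.functions.when
import spark.implicits._ // enables the $"..." syntax; assumes a SparkSession named spark
val input = df // the input DataFrame from the question
val withGender = input.withColumn("gender_num", when($"gender" === "female", 2).otherwise(1))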

You can chain more conditions:
val withGender = input.withColumn("gender_num", when($"gender" === "female", 2).when($"gender" === "other", 3).otherwise(1))
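
When the list of values grows, the chained when calls above can be built from a lookup table instead of being written out by hand. The following is only a sketch: the object name GenderCodesSketch, the codes table, and the helpers genderNum and addGenderNum are hypothetical names introduced for illustration, and the default of 1 simply mirrors the otherwise(1) used above.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object GenderCodesSketch {
  // Hypothetical lookup table; "other" -> 3 mirrors the chained example.
  val codes: Seq[(String, Int)] = Seq("female" -> 2, "other" -> 3)

  // Folds the table into when(...).when(...).otherwise(default).
  def genderNum(default: Int = 1): Column = {
    val (firstKey, firstCode) = codes.head
    codes.tail
      .foldLeft(when(col("gender") === firstKey, firstCode)) {
        case (acc, (key, code)) => acc.when(col("gender") === key, code)
      }
      .otherwise(default)
  }

  // Adds the gender_num column to any DataFrame that has a gender column.
  def addGenderNum(df: DataFrame): DataFrame =
    df.withColumn("gender_num", genderNum())

  def main(args: Array[String]): Unit = {
    // A local session is assumed here purely so the sketch can run on its own.
    val spark = SparkSession.builder().master("local[*]").appName("gender-codes-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq((1000, "male", "married"), (1313, "female", "single"), (1000, "male", "married"))
      .toDF("amount", "gender", "status")

    addGenderNum(df).show()
    spark.stop()
  }
}
```

Because the result is still an ordinary when/otherwise expression, Spark treats it the same way as the hand-written chain.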

You can also use a UDF, as in Akash's answer. Note that UDFs sometimes cannot be optimized the way built-in functions can, but they can be more readable.
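
Akash's answer is not reproduced here, so the following is only a sketch of what a UDF-based version might look like, not that answer's code. It also shows the pattern match from the question done on string literals: a lowercase name such as male in a case clause is a variable pattern that matches anything, so the values have to be quoted. The names GenderUdfSketch and genderToNum are hypothetical, and returning 0 for unrecognized or null values is an assumption made for this example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object GenderUdfSketch {
  // Plain Scala mapping; the codes follow the question (male -> 1, female -> 2).
  // Returning 0 for anything else (including null) is an assumption of this sketch.
  def genderToNum(gender: String): Int = gender match {
    case "male"   => 1
    case "female" => 2
    case _        => 0
  }

  def main(args: Array[String]): Unit = {
    // A local session is assumed here purely so the sketch can run on its own.
    val spark = SparkSession.builder().master("local[*]").appName("gender-udf-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq((1000, "male", "married"), (1313, "female", "single"), (1000, "male", "married"))
      .toDF("amount", "gender", "status")

    // Wrap the plain function as a Spark UDF and apply it to the gender column.
    val genderUdf = udf((g: String) => genderToNum(g))
    df.withColumn("gender_num", genderUdf(col("gender"))).show()

    spark.stop()
  }
}
```

Keeping the mapping in an ordinary function means it can be unit tested without Spark, at the cost of the optimizer seeing the UDF as a black box.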

This answer on mapping column values to a numeric type in Spark comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/43587835/
