
r - Specifying col types in sparklyr (spark_read_csv)

Reposted. Author: 行者123. Updated: 2023-12-03 21:52:45

I am reading a csv into Spark with sparklyr:

schema <- structType(structField("TransTime", "array<timestamp>", TRUE),
                     structField("TransDay", "Date", TRUE))

spark_read_csv(sc, filename, "path", infer_schema = FALSE, schema = schema)

But I get:
Error: could not find function "structType"

How do I specify column types with spark_read_csv?

Thanks in advance.

Best Answer

The structType function comes from Spark's Scala API. In sparklyr, column types are specified by passing a named list to the columns argument. Suppose we have the following CSV (data.csv):

name,birthdate,age,height
jader,1994-10-31,22,1.79
maria,1900-03-12,117,1.32

The call that reads this data is:
mycsv <- spark_read_csv(sc, "mydate",
  path = "data.csv",
  memory = TRUE,
  infer_schema = FALSE,  # attention to this
  columns = list(
    name = "character",
    birthdate = "date",  # or "character", if you need date functions
    age = "integer",
    height = "double"))
# integer = "INTEGER"
# double = "REAL"
# character = "STRING"
# logical = "INTEGER"
# list = "BLOB"
# date = character = "STRING" # not sure
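Applied back to the question, a minimal sketch of the same approach with the asker's column names (TransTime and TransDay are taken from the question; `filename` and the table name are assumptions, and whether "timestamp" maps exactly to the intended Spark type should be verified against your sparklyr version):

```r
library(sparklyr)

# Sketch: the question's schema expressed as sparklyr's `columns` list.
# Note that `columns` takes R type names (e.g. "timestamp", "date"),
# not Spark SQL / Scala type strings like "array<timestamp>".
df <- spark_read_csv(sc, "mytable",
  path = filename,
  infer_schema = FALSE,  # required, otherwise `columns` types are ignored
  columns = list(
    TransTime = "timestamp",
    TransDay  = "date"))
```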

To manipulate date columns you must use Hive date functions, not R functions:
mycsv %>% mutate(birthyear = year(birthdate))
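As a hedged sketch of what this looks like in practice (assuming an active connection `sc` and the `mycsv` table from above), dplyr verbs on a Spark table are translated to SQL, so unrecognized function names such as `year()`, `month()`, and `datediff()` are passed through to Hive and evaluated in Spark, not in R:

```r
library(dplyr)

# These function calls are not R functions; dbplyr passes them
# through to Spark SQL, where Hive evaluates them.
mycsv %>%
  mutate(
    birthyear  = year(birthdate),    # Hive year()
    birthmonth = month(birthdate),   # Hive month()
    days_alive = datediff(current_date(), birthdate))  # Hive datediff()
```

This is why declaring birthdate as "date" (or keeping it as "character" and casting in Spark) matters: the Hive functions need a date-typed column to operate on.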

Reference: https://spark.rstudio.com/articles/guides-dplyr.html#hive-functions

On specifying col types in sparklyr (spark_read_csv), see the similar question on Stack Overflow: https://stackoverflow.com/questions/43003185/
