
r - sparklyr feature transformation functions cause errors

Reposted. Author: 行者123. Updated: 2023-12-01 07:36:12

I'm having trouble with the ft_.. functions in the sparklyr R package. ft_binarizer works, but ft_normalizer and ft_min_max_scaler do not. Here is an example:

library(sparklyr)
library(dplyr)
library(nycflights13)

sc <- spark_connect(master = "local", version = "2.1.0")
x = flights %>% select(dep_delay)
x_tbl <- sdf_copy_to(sc, x)

# works fine
ft_binarizer(x = x_tbl, input.col = "dep_delay", output.col = "delayed", threshold = 0)

# error
ft_normalizer(x = x_tbl, input.col = "dep_delay", output.col = "delayed_norm")

# error
ft_min_max_scaler(x = x_tbl, input.col = "dep_delay", output.col = "delayed_min_max")

ft_normalizer returns:

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed 1 times, most recent failure: Lost task 0.0 in stage 9.0 (TID 9, localhost, executor driver): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$createTransformFunc$1: (double) => vector)

ft_min_max_scaler returns:

"Error: java.lang.IllegalArgumentException: requirement failed: Column dep_delay must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually DoubleType."

I think it is a data type problem, but I don't know how to solve it. Does anyone know what to do?

Many thanks!

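Best answer

ft_normalizer operates on Vector columns (as does ft_min_max_scaler, whose error message asks for a VectorUDT column rather than DoubleType), so you first have to build one with ft_vector_assembler: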

ft_vector_assembler(x_tbl, input_cols = "dep_delay", output_col = "dep_delay_v") %>%
  ft_normalizer(input.col = "dep_delay_v", output.col = "delayed_v_norm")
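
The same pattern should apply to ft_min_max_scaler, since its error message also asks for a vector column. The following is only a sketch under a few assumptions: it uses the underscore-style argument names (input_col, output_col) found in newer sparklyr releases, the table name dep_delay_demo and the output column dep_delay_scaled are made up for illustration, and rows with a missing dep_delay are dropped first because the assembler may reject null values.

```r
library(sparklyr)
library(dplyr)
library(nycflights13)

# Assumes a local Spark installation, as in the question.
sc <- spark_connect(master = "local", version = "2.1.0")

# Drop missing delays before copying the data to Spark
# (assumption: the vector assembler may fail on null values).
x <- flights %>%
  select(dep_delay) %>%
  filter(!is.na(dep_delay))

# "dep_delay_demo" is a hypothetical table name chosen for this example.
x_tbl <- sdf_copy_to(sc, x, name = "dep_delay_demo", overwrite = TRUE)

scaled_tbl <- x_tbl %>%
  # Wrap the double column in a one-element vector column.
  ft_vector_assembler(input_cols = "dep_delay", output_col = "dep_delay_v") %>%
  # Rescale the vector column to the default [0, 1] range.
  ft_min_max_scaler(input_col = "dep_delay_v", output_col = "dep_delay_scaled")

head(scaled_tbl)
```

If your sparklyr version only accepts the dotted argument names used in the question (input.col, output.col), substitute those instead.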

Regarding "r - sparklyr feature transformation functions cause errors", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50262557/
