
r - sparklyr: creating a new column with mutate

Reposted. Author: 行者123. Updated: 2023-12-04 10:33:22

I would be very surprised if this kind of problem could not be solved with sparklyr:

iris_tbl <- copy_to(sc, aDataFrame)

# date_vector is a character vector whose elements
# are in this format: YYYY-MM-DD (year, month, day)
for (d in date_vector) {
  ...
  aDataFrame %>% mutate(newValue = gsub("-", "", d))
  ...
}

I get this error:
Error: org.apache.spark.sql.AnalysisException: Undefined function: 'GSUB'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 2 pos 86
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.failFunctionLookup(SessionCatalog.scala:787)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction0(HiveSessionCatalog.scala:200)
at org.apache.spark.sql.hive.HiveSessionCatalog.lookupFunction(HiveSessionCatalog.scala:172)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$6$$anonfun$applyOrElse$39.apply(Analyzer.scala:884)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$6$$anonfun$applyOrElse$39.apply(Analyzer.scala:884)
at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun

But with this line:
aDataFrame %>% mutate(newValue=toupper("hello"))

things work. Can anyone help?

Best answer

It is worth adding that the available documentation states:

Hive Functions

Many of Hive’s built-in functions (UDF) and built-in aggregate functions (UDAF) can be called inside dplyr’s mutate and summarize. The Language Reference UDF page provides the list of available functions.



Hive

As the documentation indicates, regexp_replace should provide a workable solution. The Hive reference describes it as follows:

Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace("foobar", "oo|ar", "") returns 'fb.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc.
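For comparison, the same replacements can be reproduced locally with base R's gsub, the function the question tried to call inside Spark. This is only a sketch of the matching behavior: it runs in plain R with no Spark connection, date_vector holds hypothetical sample values, and R's default regex engine is not the Java engine Hive uses, although the two should agree for simple patterns like these.

```r
# Local base R only; no Spark connection is involved here.

# The Hive documentation example: remove every match of "oo" or "ar".
hive_doc_example <- gsub("oo|ar", "", "foobar")

# The question's use case: strip hyphens from YYYY-MM-DD strings.
# date_vector is hypothetical sample data, not taken from the question.
date_vector <- c("2016-10-27", "2016-10-28")
compact_dates <- gsub("-", "", date_vector)

print(hive_doc_example)  # "fb"
print(compact_dates)     # "20161027" "20161028"
```

Note the argument order differs: gsub takes (pattern, replacement, x), while regexp_replace takes (string, pattern, replacement).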


sparklyr approach

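Given the above, it should be possible to combine a sparklyr pipeline with regexp_replace to get the same effect as applying gsub to the desired column. Tested code that removes the - character from the variable d within sparklyr can be written as follows: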
aDataFrame %>% 
mutate(clnD = regexp_replace(d, "-", "")) %>%
# ...

Here, class(aDataFrame) returns: "tbl_spark" ... .
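A more complete version of this approach might look like the sketch below. It assumes a local Spark installation is available (for example via sparklyr::spark_install()); the names dates_df and dates_tbl, the table name, and the sample dates are hypothetical and only serve the illustration.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally, e.g. via spark_install().
sc <- spark_connect(master = "local")

# Hypothetical sample data with dates in YYYY-MM-DD format.
dates_df <- data.frame(
  d = c("2016-10-27", "2016-10-28"),
  stringsAsFactors = FALSE
)

# Copy the local data frame to Spark; the result is a tbl_spark.
dates_tbl <- copy_to(sc, dates_df, "dates_df", overwrite = TRUE)

# regexp_replace is not an R function. dplyr passes the call through
# to Spark SQL, where it is resolved as a built-in function.
result <- dates_tbl %>%
  mutate(clnD = regexp_replace(d, "-", "")) %>%
  collect()

print(result)

spark_disconnect(sc)
```

The replacement is computed inside Spark, and collect() only brings the finished result back into the local R session.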

Regarding "r - sparklyr: creating a new column with mutate", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40285594/
