
apache-spark - TypeError : 'Column' object is not callable using WithColumn

Reposted. Author: 行者123. Updated: 2023-12-03 09:24:36

I want to add a new column to my dataframe "df" based on a get_distance function:

def get_distance(x, y):
    dfDistPerc = hiveContext.sql("select column3 \
                                  from tab \
                                  where column1 = '" + x + "' \
                                  and column2 = " + y + " \
                                  limit 1")

    result = dfDistPerc.select("column3").take(1)
    return result

df = df.withColumn(
    "distance",
    lit(get_distance(df["column1"], df["column2"]))
)

However, I get:

TypeError: 'Column' object is not callable

I think this is because x and y are Column objects and I need to convert them to String to use them in the query. Am I right? If so, how do I do that?

Best Answer

Spark needs to know that the function you are using is not an ordinary Python function but a UDF.

There are two ways to use a UDF with a dataframe.

Approach 1: using the @udf decorator

from pyspark.sql.functions import udf

@udf
def get_distance(x, y):
    dfDistPerc = hiveContext.sql("select column3 \
                                  from tab \
                                  where column1 = '" + x + "' \
                                  and column2 = " + y + " \
                                  limit 1")

    result = dfDistPerc.select("column3").take(1)
    return result

df = df.withColumn(
    "distance",
    get_distance(df["column1"], df["column2"])
)

Approach 2: registering the function with pyspark.sql.functions.udf
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

def get_distance(x, y):
    dfDistPerc = hiveContext.sql("select column3 \
                                  from tab \
                                  where column1 = '" + x + "' \
                                  and column2 = " + y + " \
                                  limit 1")

    result = dfDistPerc.select("column3").take(1)
    return result

calculate_distance_udf = udf(get_distance, IntegerType())

df = df.withColumn(
    "distance",
    calculate_distance_udf(df["column1"], df["column2"])
)

Regarding "apache-spark - TypeError: 'Column' object is not callable using WithColumn", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48305443/
