
apache-spark - Isn't a Pandas UDF faster than a Spark UDF?

Reposted. Author: 行者123. Updated: 2023-12-05 02:54:31

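I took the following UDFs from the PySpark website because I wanted to see whether there is a performance improvement. I used a large range of numbers, but both take almost the same time. What am I doing wrong?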

Thanks!

import pandas as pd
from pyspark.sql.functions import col, udf
from pyspark.sql.types import LongType
import time

start = time.time()
# Declare the function and create the UDF
def multiply_func(a, b):
    return a * b

multiply = udf(multiply_func, returnType=LongType())

# The function for a pandas_udf should be able to execute with local Pandas data
x = pd.Series(list(range(1, 1000000)))
print(multiply_func(x, x))
# 0 1
# 1 4
# 2 9
# dtype: int64
end = time.time()
print(end-start)

Here is the Pandas UDF:

import pandas as pd
from pyspark.sql.functions import col, pandas_udf
from pyspark.sql.types import LongType
import time

start = time.time()
# Declare the function and create the UDF
def multiply_func(a, b):
    return a * b

multiply = pandas_udf(multiply_func, returnType=LongType())

# The function for a pandas_udf should be able to execute with local Pandas data
x = pd.Series(list(range(1, 1000000)))
print(multiply_func(x, x))
# 0 1
# 1 4
# 2 9
# dtype: int64
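
In both snippets the timed region only calls multiply_func(x, x) directly on a local pandas Series. The wrapped `multiply` object is never applied to a Spark DataFrame, so neither timing goes through Spark at all, which may be why the two numbers look alike. Below is a minimal sketch of one way to time the two wrappers through Spark instead. The names `timed`, `compare_udfs`, `row_udf` and `vec_udf`, the local[*] master, and the sum aggregation used to force evaluation are assumptions made for illustration and are not part of the original question. It also assumes pyarrow is installed, which pandas_udf needs.

```python
import time


def multiply_func(a, b):
    # Works on plain numbers (row-at-a-time UDF) and on pandas.Series (pandas UDF).
    return a * b


def timed(action):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def compare_udfs(n=1_000_000):
    # Imports are local so the helpers above stay usable without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, pandas_udf, sum as sum_, udf
    from pyspark.sql.types import LongType

    # Assumption: a local session is good enough for a rough comparison.
    spark = (
        SparkSession.builder.master("local[*]").appName("udf-timing").getOrCreate()
    )
    df = spark.range(1, n)  # single LongType column named "id"

    row_udf = udf(multiply_func, returnType=LongType())
    vec_udf = pandas_udf(multiply_func, returnType=LongType())

    def run(f):
        # Aggregating the UDF output forces Spark to evaluate it for every row.
        projected = df.select(f(col("id"), col("id")).alias("p"))
        return projected.agg(sum_("p")).collect()[0][0]

    run(row_udf)  # warm-up, so start-up cost is not charged to the first timing
    row_total, row_secs = timed(lambda: run(row_udf))
    vec_total, vec_secs = timed(lambda: run(vec_udf))
    spark.stop()
    return {"udf": (row_total, row_secs), "pandas_udf": (vec_total, vec_secs)}
```

Calling compare_udfs() returns, for each wrapper, the sum of the products and the wall-clock seconds of one run. The two sums should match. The timings depend on the machine, the Spark version and the data size, so no particular speed-up is claimed here.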
