
python - What are the advantages of using column objects instead of strings in PySpark

Reposted. Author: 行者123. Updated: 2023-12-03 08:32:21

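In PySpark, a column can be selected with either a column object or a string, and both return the same result. Is there any difference? When should I use a column object instead of a string? For example, I can use a column object: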

import pyspark.sql.functions as F

df.select(F.lower(F.col('col_name')))
# or
df.select(F.lower(df['col_name']))
# or
df.select(F.lower(df.col_name))

Or I can use a string instead and get the same result:

df.select(F.lower('col_name'))

What are the advantages of using column objects instead of strings in PySpark?

Best answer

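Read Palantir's PySpark style guide here, which explains when to use F.col() and the related best practices. Git link here. In the passage quoted below, the first style is access through the dataframe (df.colA) and the second style is F.col('colA'):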

In many situations the first style can be simpler, shorter and visually less polluted. However, we have found that it faces a number of limitations, that lead us to prefer the second style:

- If the dataframe variable name is large, expressions involving it quickly become unwieldy;
- If the column name has a space or other unsupported character, the bracket operator must be used instead. This generates inconsistency, and df1['colA'] is just as difficult to write as F.col('colA');
- Column expressions involving the dataframe aren't reusable and can't be used for defining abstract functions;
- Renaming a dataframe variable can be error-prone, as all column references must be updated in tandem.

Additionally, the dot syntax encourages use of short and non-descriptive variable names for the dataframes, which we have found to be harmful for maintainability. Remember that dataframes are containers for data, and descriptive names is a helpful way to quickly set expectations about what's contained within.

By contrast, F.col('colA') will always reference a column designated colA in the dataframe being operated on, named df, in this case. It does not require keeping track of other dataframes' states at all, so the code becomes more local and less susceptible to "spooky interaction at a distance," which is often challenging to debug.

Regarding python - What are the advantages of using column objects instead of strings in PySpark, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64748551/
