
python - How to select and order multiple columns in a PySpark DataFrame after a join

Reposted. Author: 太空狗. Last updated: 2023-10-29 18:22:53

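I want to select multiple columns from an existing DataFrame (created by a join) and arrange those fields in the order of my target table structure. How can I do that? My approach is below. It selects the columns I need, but I can't get them into the right order.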

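Required (target table structure):
hist_columns = ("acct_nbr", "account_sk_id", "zip_code", "primary_state", "eff_start_date", "eff_end_date", "eff_flag")

from pyspark.sql.functions import broadcast

# Inner join on acct_nbr, broadcasting the small lookup table
account_sk_df = hist_process_df.join(broadcast(df_sk_lkp), 'acct_nbr', 'inner')
# Keep only the target columns (this keeps the DataFrame's existing column order)
account_sk_df_ld = account_sk_df.select([c for c in account_sk_df.columns if c in hist_columns])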

>>> account_sk_df
DataFrame[acct_nbr: string, primary_state: string, zip_code: string, eff_start_date: string, eff_end_date: string, eff_flag: string, hash_sk_id: string, account_sk_id: int]


>>> account_sk_df_ld
DataFrame[acct_nbr: string, primary_state: string, zip_code: string, eff_start_date: string, eff_end_date: string, eff_flag: string, account_sk_id: int]

account_sk_id needs to be in second position. What is the best way to do this?
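To see why the order is lost, the filter from the question can be reproduced on plain Python lists. This is a minimal sketch that assumes the column order shown in the printed schema of account_sk_df; no Spark session is needed, because DataFrame.columns is an ordinary list of column names.

```python
# Column order of account_sk_df, copied from the schema printed in the question
df_columns = ["acct_nbr", "primary_state", "zip_code", "eff_start_date",
              "eff_end_date", "eff_flag", "hash_sk_id", "account_sk_id"]

# Target table structure from the question
hist_columns = ("acct_nbr", "account_sk_id", "zip_code", "primary_state",
                "eff_start_date", "eff_end_date", "eff_flag")

# The question's filter walks df_columns, so the result keeps the
# DataFrame's order and only drops names that are not in hist_columns
filtered = [c for c in df_columns if c in hist_columns]

# Prints ['acct_nbr', 'primary_state', 'zip_code', 'eff_start_date',
#         'eff_end_date', 'eff_flag', 'account_sk_id']
print(filtered)
```

account_sk_id ends up last simply because it is the last matching name in df_columns, which is exactly the order seen in account_sk_df_ld.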

Best answer

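Select the columns by passing the list itself instead of iterating over the existing columns, and the order will be correct. The list comprehension iterates over account_sk_df.columns, so its result follows the DataFrame's column order, whereas select returns the columns in the order you pass them: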

account_sk_df_ld = account_sk_df.select(*hist_columns)
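If some target columns might be absent after the join, one variant is to iterate over hist_columns instead of the DataFrame's columns. The sketch below uses a hypothetical helper, ordered_columns, which is not part of PySpark; it works on plain lists of names, so its result could be passed to select as account_sk_df.select(*ordered_columns(account_sk_df.columns, hist_columns)).

```python
from typing import Iterable, List


def ordered_columns(existing: Iterable[str], target: Iterable[str],
                    strict: bool = True) -> List[str]:
    """Return the names in `target` that exist in `existing`, in target order.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    existing_set = set(existing)
    target = list(target)
    missing = [c for c in target if c not in existing_set]
    if strict and missing:
        # Fail early instead of silently producing a narrower table
        raise ValueError(f"missing target columns: {missing}")
    # Iterate over the target list, so the result follows its order
    return [c for c in target if c in existing_set]


# Column order of the joined DataFrame, as printed in the question
df_columns = ["acct_nbr", "primary_state", "zip_code", "eff_start_date",
              "eff_end_date", "eff_flag", "hash_sk_id", "account_sk_id"]
hist_columns = ("acct_nbr", "account_sk_id", "zip_code", "primary_state",
                "eff_start_date", "eff_end_date", "eff_flag")

# Prints ['acct_nbr', 'account_sk_id', 'zip_code', 'primary_state',
#         'eff_start_date', 'eff_end_date', 'eff_flag']
print(ordered_columns(df_columns, hist_columns))
```

With strict=True the helper fails early with a readable message, much as select itself raises an analysis error for an unknown column; strict=False drops missing names but still keeps the target order.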

A similar question on selecting and ordering multiple columns in a PySpark DataFrame after a join can be found on Stack Overflow: https://stackoverflow.com/questions/40467449/
