gpt4 book ai didi

python - 带有 case 语句的 for 循环

转载 作者:行者123 更新时间:2023-12-05 09:35:33 25 4
gpt4 key购买 nike

我正在尝试寻找一种方法来运行 for 循环以更好地针对我的案例语句优化我的脚本。

下面显示的脚本没有错误,但我觉得这太冗长了,可能会在下次维护时造成困惑。

df = df.withColumn('Product', when(df.where('input_file_name LIKE "%CAD%"'), 'Cash and DUE').
when(df.where('input_file_name LIKE "%TP%"'), 'Trade Product').
when(df.where('input_file_name LIKE "%LNS%"'), 'Corp Loans').
when(df.where('input_file_name LIKE "%DBT%"'), 'Debt').
when(df.where('input_file_name LIKE "%CRD%"'), 'Retail Cards').
when(df.where('input_file_name LIKE "%MTG%"'), 'Mortage').
when(df.where('input_file_name LIKE "%OD%"'), 'Overdraft').
when(df.where('input_file_name LIKE "%PLN%"'), 'Retail Personal Loan').
when(df.where('input_file_name LIKE "%CLN%"'), 'CLN').
when(df.where('input_file_name LIKE "%CAT%"'), 'Custody and Trust').
when(df.where('input_file_name LIKE "%DEP%"'), 'Deposits').
when(df.where('input_file_name LIKE "%STZ%"'), 'Securitization').
when(df.where('input_file_name LIKE "%SECZ%"'), 'Security Securitization').
when(df.where('input_file_name LIKE "%SEC%"'), 'Securities').
when(df.where('input_file_name LIKE "%MTSZ%"'), 'Retail Mortage Securitization').
when(df.where('input_file_name LIKE "%PLSZ%"'), 'Retail Personal Loan Securitization').
when(df.where('input_file_name LIKE "%CCSZ%"'), 'Retail Cards Securitization').
when(df.where('input_file_name LIKE "%CMN%"'), 'Cash Management').
when(df.where('input_file_name LIKE "%OTC%"'), 'Over-the-counter').
when(df.where('input_file_name LIKE "%SFT%"'), 'Securities Financing Transactions').
when(df.where('input_file_name LIKE "%ETD%"'), 'Excahnge Traded Deriative').
when(df.where('input_file_name LIKE "%DEF%"'), 'Default Products').
when(df.where('input_file_name LIKE "%FFS%"'), 'Not Required').
when(df.where('input_file_name LIKE "%hdfs%"'), 'Not Required').
otherwise('feed_name'));

我想到了运行一个循环,一个例子如下所示(脚本不正确,它是为了演示目的)

product_code = ['%CAD%','%TP%','%LNS%','%DBT%','%CRD%','%MTG%','%OD%','%PLN%','%CLN%','%CAT%','%DEP%','%STZ%','%SECZ%','%SEC%','%MTSZ%','%PLSZ%','%CCSZ%','%CMN%','%OTC%','%SFT%','%ETD%','%DEF%','%FFS%','%hdfs%']
product_name = ['Cash and Due','Trade Product','Corp Loans','Debt','Retail Cards','Mortage','Overdraft','Retail Personal Loan','CLN','Custody and Trust','Deposits','Securitization','Securities Securitization','Securities','Retail Mortage Securitization','Retail Personal Loan Securitization','Retail Cards Securitization','Cash Management','Over-the-counter','Securities Finanacing Transactions','Exchange Traded Derivative','Default Products','Not Required','Not Required']

##Both product_code & product name have the same number of index

lastIndex = len(product_code)
for x in product_code:
# Logic i thought df.withColumn('Product', when(df.where('input_file_name LIKE "%'product_code[x]'%"'), product_name[x])
if(product_code[lastIndex]):
#otherwise('feed_name')

如果在 spark 中可以为 when(df.where()).otherwise 的 case 语句运行循环,或者有另一种方法或用例,我需要一些建议

已更新

我已经按照建议的方法实现了,查询在条件集上返回正确,但我想知道为什么它没有返回正确的值,也就是下面脚本中的 lit(),而不是删除不正确的行满足条件

Sample DF:
product_code = ['%CMN%','%TP%','%LNS%']
product_name = ['Cash and Due','Trade Product']
feed_name = ['farid','arshad','jimmy']

df = spark.createDataFrame(
list(zip(inp_file,feed_name)),
['input_file_name','feed_name']
)

+---------------+---------+
|input_file_name|feed_name|
+---------------+---------+
|sdasdasdasd |bob |
|_CMN_BD |arshad |
|_CMN_BD_WS |jimmy |
+---------------+---------+

product_code = ['%CAD_%','%TP%','%LNS%','%DBT%','%CRD%','%MTG%','%_OD_%','%PLN%','%CLN%','%CAT%','%DEP%','%STZ%','%SECZ%','%SEC%','%MTSZ%','%PLSZ%','%CCSZ%','%CMN%','%OTC%','%SFT%','%ETD%','%DEF%','%FFS%','%hdfs%']
product_name = ['Cash and Due','Trade Product','Corp Loans','Debt','Retail Cards','Mortage','Overdraft','Retail Personal Loan','CLN','Custody and Trust','Deposits','Securitization','Securities Securitization','Securities','Retail Mortage Securitization','Retail Personal Loan Securitization','Retail Cards Securitization','Cash Management','Over-the-counter','Securities Finanacing Transactions','Exchange Traded Derivative','Default Products','Not Required','Not Required']

## -- Create spark dataframe and with list tuple
## -- Lit is used to add new column

product_ref_df = spark.createDataFrame(
list(zip(product_code, product_name)),
["product_code", "product_name"]
)


def tempDF(df,targetField,columnTitle,condition,targetResult,show=False):
product_ref_df = spark.createDataFrame(
list(zip(condition,targetResult)),
["condition", "target_result"]
)

df.join(broadcast(product_ref_df), expr(""+targetField+" like condition")) \
.withColumn(columnTitle, coalesce(col("target_result"), lit("feed_name"))) \
.drop('condition','target_result') \
.show()

return df

product_ref_df = tempDF(df,'input_file_name','Product',product_code,product_name)

脚本触发时,没有报错,返回结果如图,

+---------------+---------+------------+
|input_file_name|feed_name| Product|
+---------------+---------+------------+
| _CMN_BD| arshad|Cash and Due|
| _CMN_BD_WS| jimmy|Cash and Due|
+---------------+---------+------------+

结果不应该返回第一行,因为我们没有删除任何行,

+---------------+---------+------------+
|input_file_name|feed_name| Product|
+---------------+---------+------------+
| sdasdasdasd| bob|bob |
| _CMN_BD| jimmy|Cash and Due|
| _CMN_BD_WS| jimmy|Cash and Due|
+---------------+---------+------------+
+---------------+---------+------------+

最佳答案

您可以从这些产品名称引用创建一个新的 DataFrame 并与原始 df 连接以获取产品名称:

from pyspark.sql.functions import expr, col, broadcast, coalesce

product_ref_df = spark.createDataFrame(
list(zip(product_code, product_name)),
["product_code", "product_name"]
)

df.join(broadcast(product_ref_df), expr("input_file_name like product_code"), "left") \
.withColumn("Product", coalesce(col("product_name"), col("Feed_name"))) \
.drop("product_code", "product_name") \
.show()

或者使用 functools.reduce 来链接 case/when 条件,如下所示:

import functools

from pyspark.sql.functions import lit, col, when

case_conditions = list(zip(product_code, product_name))

product_col = functools.reduce(
lambda acc, x: acc.when(col(f"input_file_name").like(x[1]), lit(x[1])),
case_conditions[1:],
when(col("input_file_name").like(case_conditions[0][0]), lit(case_conditions[0][1]))
).otherwise(col("Feed_name"))

df.withColumn("Product", product_col).show()

关于python - 带有 case 语句的 for 循环,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/65786478/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com