
python - Pyspark: finding a pattern from one column in another column


I have a dataframe with two columns, an address and a street name.

from pyspark.sql.functions import *

df = spark.createDataFrame(
    [
        ['108 badajoz road north ryde 2113, nsw, australia', 'north ryde'],
        ['25 smart street fairfield 2165, nsw, australia', 'smart street'],
    ],
    ['address', 'street_name'],
)

df.show(2, False)

+------------------------------------------------+---------------+
|address                                         |street_name    |
+------------------------------------------------+---------------+
|108 badajoz road north ryde 2113, nsw, australia|north ryde     |
|25 smart street fairfield 2165, nsw, australia  |smart street   |
+------------------------------------------------+---------------+

I want to check whether street_name exists in address and return a boolean in a new column. I can search for the pattern manually, as below.

df.withColumn("new col", col("address").rlike('.*north ryde.*')).show(20, False)
+------------------------------------------------+------------+-------+
|address                                         |street_name |new col|
+------------------------------------------------+------------+-------+
|108 badajoz road north ryde 2113, nsw, australia|north ryde  |true   |
|25 smart street fairfield 2165, nsw, australia  |smart street|false  |
+------------------------------------------------+------------+-------+

But I want to replace the manual value with the street_name column, like below:

df.withColumn("new col",
              col("address").rlike(concat(lit('.*'), col('street_name'), lit('.*'))))\
  .show(20, False)
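
Note that in the Python Column API, rlike generally expects a string pattern rather than a Column, so building the pattern with concat inside rlike as above does not work directly. As a rough sketch of the regex-based route, assuming a Spark version whose SQL RLIKE operator accepts a column on the right-hand side, the comparison can be pushed into a SQL expression with expr:

from pyspark.sql.functions import expr

# RLIKE matches the pattern anywhere in the string, so the surrounding '.*' are not required;
# street_name is interpreted as a regular expression, so any regex metacharacters would need escaping.
df.withColumn("new col", expr("address rlike street_name")).show(20, False)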

Best Answer

You can do this simply by using the contains function:

from pyspark.sql.functions import col, when

# Flag each row where the address column contains the street_name value
df = df.withColumn('new_Col', when(col('address').contains(col('street_name')), True).otherwise(False))
df.show(truncate=False)

+------------------------------------------------+------------+-------+
|address                                         |street_name |new_Col|
+------------------------------------------------+------------+-------+
|108 badajoz road north ryde 2113, nsw, australia|north ryde  |true   |
|25 smart street fairfield 2165, nsw, australia  |smart street|true   |
+------------------------------------------------+------------+-------+
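
Since contains already yields a boolean column, the when/otherwise wrapper in the snippet above is optional; a minimal equivalent sketch:

from pyspark.sql.functions import col

# contains() returns a boolean column directly, so no when/otherwise is needed
df = df.withColumn('new_Col', col('address').contains(col('street_name')))
df.show(truncate=False)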

Regarding python - Pyspark: finding a pattern from one column in another column, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55449545/
