
PySpark: split a column

Reposted. Author: 行者123. Updated: 2023-12-03 16:14:56

from pyspark.sql import Row, SparkSession, functions as F

# Get (or create) a SparkSession and its SparkContext
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

row = Row("UK_1", "UK_2", "Date", "Cat", "Combined")
tdf = sc.parallelize([
    row(1, 1, '12/10/2016', 'A', 'Water^World'),
    row(1, 2, None, 'A', 'Sea^Born'),
    row(2, 1, '14/10/2016', 'B', 'Germ^Any'),
    row(3, 3, '!~2016/2/276', 'B', 'Fin^Land'),
    row(None, 1, '26/09/2016', 'A', 'South^Korea'),
    row(1, 1, '12/10/2016', 'A', 'North^America'),
    row(1, 2, None, 'A', 'South^America'),
    row(2, 1, '14/10/2016', 'B', 'New^Zealand'),
    row(None, None, '!~2016/2/276', 'B', 'South^Africa'),
    row(None, 1, '26/09/2016', 'A', 'Saudi^Arabia'),
]).toDF()

# Split "Combined" on ^ into two new columns (this call does not split as expected)
cols = F.split(tdf['Combined'], '^')
tdf = tdf.withColumn('column1', cols.getItem(0))
tdf = tdf.withColumn('column2', cols.getItem(1))
tdf.show(truncate=False)

Above is my sample code.

For some reason, it does not split the column on the ^ character.

Any suggestions?

Best answer

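The pattern argument is a regular expression (see the documentation for split), and in a regular expression ^ is an anchor that matches the start of the string, not a literal ^ character. To match it literally, you need to escape it: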
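A quick way to see the difference is to check where each pattern matches in one of the sample values. The sketch below uses Python's standard re module as a stand-in for the Java regular-expression engine behind Spark's split; this substitution is an assumption for illustration, relying on the fact that both engines, in their default mode, treat an unescaped ^ as a start-of-string anchor and \^ as a literal caret. The raw string r"\^" keeps the backslash, so the regex engine receives the two characters \ and ^.

```python
import re

# Python's re stands in for Spark's Java regex engine here (assumption for illustration)
value = "Water^World"

# Unescaped: ^ is a zero-width anchor, so it only matches at the start of the string
anchor_positions = [m.start() for m in re.finditer("^", value)]

# Escaped: \^ matches the literal caret character
caret_positions = [m.start() for m in re.finditer(r"\^", value)]

print(anchor_positions)  # [0]
print(caret_positions)   # [5]
```

Because the unescaped pattern only ever matches an empty position at the start, there is no caret for split to cut on, which is why column1 ends up holding the whole value in the question.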

# r'\^' passes a backslash followed by ^ to the regex engine, which matches a literal caret
cols = F.split(tdf['Combined'], r'\^')
tdf = tdf.withColumn('column1', cols.getItem(0))
tdf = tdf.withColumn('column2', cols.getItem(1))
tdf.show(truncate=False)

+----+----+------------+---+-------------+-------+-------+
|UK_1|UK_2|Date        |Cat|Combined     |column1|column2|
+----+----+------------+---+-------------+-------+-------+
|1   |1   |12/10/2016  |A  |Water^World  |Water  |World  |
|1   |2   |null        |A  |Sea^Born     |Sea    |Born   |
|2   |1   |14/10/2016  |B  |Germ^Any     |Germ   |Any    |
|3   |3   |!~2016/2/276|B  |Fin^Land     |Fin    |Land   |
|null|1   |26/09/2016  |A  |South^Korea  |South  |Korea  |
|1   |1   |12/10/2016  |A  |North^America|North  |America|
|1   |2   |null        |A  |South^America|South  |America|
|2   |1   |14/10/2016  |B  |New^Zealand  |New    |Zealand|
|null|null|!~2016/2/276|B  |South^Africa |South  |Africa |
|null|1   |26/09/2016  |A  |Saudi^Arabia |Saudi  |Arabia |
+----+----+------------+---+-------------+-------+-------+
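If the delimiter might change, or might be another regex metacharacter such as ., | or $, escaping it programmatically avoids this whole class of bug. Below is a minimal sketch built around a hypothetical helper, split_literal, that uses Python's re.escape and runs in plain Python without Spark. In PySpark the same idea would presumably be F.split(tdf['Combined'], re.escape('^')), on the assumption that the backslash-escaped pattern re.escape produces is also accepted by Java regex (Java treats a backslash before a non-alphabetic character as a literal); it is worth verifying on your Spark version.

```python
import re


def split_literal(value, delimiter):
    """Hypothetical helper: split `value` on `delimiter` taken literally, not as a regex."""
    if value is None:
        # Mirror F.split, which returns null for a null input
        return None
    # re.escape adds a backslash before regex metacharacters such as ^ . | $
    return re.split(re.escape(delimiter), value)


print(re.escape("^"))                       # \^
print(split_literal("Water^World", "^"))    # ['Water', 'World']
print(split_literal("New.Zealand", "."))    # ['New', 'Zealand']
print(re.split(".", "a.b"))                 # ['', '', '', ''] : unescaped . matches every character
```

The last line shows the same trap with a different delimiter: an unescaped . is a wildcard, so every character becomes a split point.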

A similar question about splitting a column in PySpark can be found on Stack Overflow: https://stackoverflow.com/questions/46835882/
