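I am able to merge and sort the values, but I can't figure out how to express the condition that the values should not be merged when they are equal.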
from pyspark.sql.functions import col, concat, lit, sort_array, split

df = sqlContext.createDataFrame([("foo", "bar", "too", "aaa"), ("bar", "bar", "aaa", "foo")], ("k", "K", "v", "V"))
columns = df.columns
k = 0
# Pair up columns whose names differ only in case and concatenate each pair.
for i in range(len(columns)):
    for j in range(i + 1, len(columns)):
        if columns[i].lower() == columns[j].lower():
            k = k + 1
            df = df.withColumn(columns[i] + str(k), concat(col(columns[i]), lit(","), col(columns[j])))
# The loop names the first merged column columns[i] + str(k), i.e. "k1", not "c1".
newdf = df.select(col("k"), split(col("k1"), r",\s*").alias("c1"))
sortDf = newdf.select(newdf.k, sort_array(newdf.c1).alias('sorted_c1'))
In the table below, columns k and K should be merged only for [foo,bar], not for [bar,bar].
Input:
+---+---+---+---+
|  k|  K|  v|  V|
+---+---+---+---+
|foo|bar|too|aaa|
|bar|bar|aaa|foo|
+---+---+---+---+
Output:
+---+---+---------+---------+
|  k|  K|Merged K |Merged V |
+---+---+---------+---------+
|foo|bar|[foo,bar]|[too,aaa]|
|bar|bar|bar      |[aaa,foo]|
+---+---+---------+---------+
Best answer
Try:
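from pyspark.sql.functions import udf

def merge(*c):
    # Drop duplicate values, then sort what is left.
    merged = sorted(set(c))
    if len(merged) == 1:
        # All inputs were equal: return the single value unmerged.
        return merged[0]
    else:
        return "[{0}]".format(",".join(merged))

# udf() defaults to a string return type, which matches what merge returns.
merge_udf = udf(merge)
df = sqlContext.createDataFrame([("foo", "bar", "too", "aaa"), ("bar", "bar", "aaa", "foo")], ("k1", "k2", "v1", "v2"))
df.select(merge_udf("k1", "k2"), merge_udf("v1", "v2"))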
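To see what the merge function produces without starting Spark, it can be called directly on the two sample rows. This is only a plain-Python sketch: the names rows and results are made up for illustration, and it exercises the function itself rather than the UDF.

```python
def merge(*c):
    # Drop duplicate values, then sort what is left.
    merged = sorted(set(c))
    if len(merged) == 1:
        # All inputs were equal: return the single value unmerged.
        return merged[0]
    return "[{0}]".format(",".join(merged))

# Hypothetical stand-in for the DataFrame rows (k1, k2, v1, v2).
rows = [("foo", "bar", "too", "aaa"), ("bar", "bar", "aaa", "foo")]

# Merge the two key columns and the two value columns of each row.
results = [(merge(k1, k2), merge(v1, v2)) for k1, k2, v1, v2 in rows]
for merged_k, merged_v in results:
    print(merged_k, merged_v)
# [bar,foo] [aaa,too]
# bar [aaa,foo]
```

Because the values are sorted, the first row comes out as [bar,foo] and [aaa,too] rather than in the original column order shown in the question's expected output.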
For "apache-spark - How to merge two columns with a condition in pyspark?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40643550/