
scala - Add a column containing the number of rows grouped by df

Reposted · Author: 行者123 · Updated: 2023-12-04 20:16:42

How can I add a column containing the per-group row count to a DataFrame, using a groupBy clause?

+------------+-------+
| Category | txn |
+------------+-------+
| Cat1 | A |
| Cat1 | A |
| Cat1 | B |
+------------+-------+

Desired output:

+------------+-------+-----+
| Category | txn | n |
+------------+-------+-----+
| Cat1 | A | 2 |
| Cat1 | A | 2 |
| Cat1 | B | 1 |
+------------+-------+-----+

I tried the following:

 df.withColumn("n", df.groupBy("Category", "txn").count())

It returned:

 type mismatch;
found : org.apache.spark.sql.DataFrame
(which expands to) org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
required: org.apache.spark.sql.Column

Then:

df.withColumn("n", df.groupBy("Category", "txn").agg(count()))

It returned:

 error: overloaded method value count with alternatives:
(columnName: String)org.apache.spark.sql.TypedColumn[Any,Long] <and>
(e: org.apache.spark.sql.Column)org.apache.spark.sql.Column
cannot be applied to ()
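Both attempts fail for the same underlying reason: `withColumn` expects a `Column` as its second argument, but `groupBy(...).count()` and `groupBy(...).agg(...)` each return a whole `DataFrame`; additionally, the `count` function itself requires an argument (a column name or a `Column`). A window aggregate, by contrast, evaluates to a `Column`, so it can be passed to `withColumn` directly. A minimal sketch, not from the original post, assuming a DataFrame `df` and a SparkSession are already in scope:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{count, lit}

// A window partitioned by the grouping keys. A count over this window
// is a Column, so withColumn accepts it, and every row keeps the
// count of its own (Category, txn) group.
val byGroup = Window.partitionBy("Category", "txn")
val withN = df.withColumn("n", count(lit(1)).over(byGroup))
```

This avoids a separate aggregation and join, at the cost of a shuffle for the window partitioning.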

Best Answer

Simply count and join:

import org.apache.spark.sql.functions.col
import spark.implicits._ // for toDF; `spark` is the SparkSession (in scope in spark-shell)

val df = Seq(("C1","A"),("C1","A"),("C1","B")).toDF("Category", "Txn")

val countDf = df.groupBy(col("Category"), col("Txn")).count
countDf.show
+--------+---+-----+
|Category|Txn|count|
+--------+---+-----+
| C1| A| 2|
| C1| B| 1|
+--------+---+-----+

df.join(countDf, Seq("Category", "Txn"))
.withColumnRenamed("count", "n")
.show
+--------+---+---+
|Category|Txn| n|
+--------+---+---+
| C1| A| 2|
| C1| A| 2|
| C1| B| 1|
+--------+---+---+

Hope this helps.
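For intuition, the count-and-join idea above can be mimicked with plain Scala collections, no Spark needed (the data here is the hypothetical in-memory equivalent of the example DataFrame): first build a map from each (Category, txn) pair to its row count, then attach that count to every row.

```scala
// In-memory sketch of the count-and-join approach.
val rows = Seq(("Cat1", "A"), ("Cat1", "A"), ("Cat1", "B"))

// Step 1: count rows per (Category, txn) key, like groupBy(...).count
val counts: Map[(String, String), Int] =
  rows.groupBy(identity).map { case (key, group) => key -> group.size }

// Step 2: join the count back onto each row, like df.join(countDf, ...)
val withN = rows.map { case row @ (cat, txn) => (cat, txn, counts(row)) }
// withN: Seq(("Cat1","A",2), ("Cat1","A",2), ("Cat1","B",1))
```

The same two-phase shape (aggregate, then re-attach per key) is what the DataFrame join above performs in a distributed setting.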

Regarding "scala - Add a column containing the number of rows grouped by df", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59030738/
