
scala - Concatenating sparse vectors in Spark?

Reposted. Author: 行者123. Updated: 2023-12-04 17:55:22

Suppose you have two sparse vectors. For example:

import org.apache.spark.ml.linalg.Vectors

val vec1 = Vectors.sparse(2, Array(0), Array(1.0)) // [1.0, 0.0]
val vec2 = Vectors.sparse(2, Array(1), Array(1.0)) // [0.0, 1.0]

I want to concatenate these two vectors so that the result is equivalent to:

val vec3 = Vectors.sparse(4, Array(0, 3), Array(1.0, 1.0)) // [1.0, 0.0, 0.0, 1.0]

Does Spark have a convenient way to do this?

Best answer

If you have the data in a DataFrame, then VectorAssembler is the right tool to use. For example:

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.linalg import Vectors

dataset = spark.createDataFrame(
    [(0,
      Vectors.sparse(10, {0: 0.6931, 5: 0.0, 7: 0.5754, 9: 0.2877}),
      Vectors.sparse(10, {3: 0.2877, 4: 0.6931, 5: 0.0, 6: 0.6931, 8: 0.6931}))],
    ["label", "userFeatures1", "userFeatures2"])

assembler = VectorAssembler(
    inputCols=["userFeatures1", "userFeatures2"],
    outputCol="features")

output = assembler.transform(dataset)
output.select("features", "label").show(truncate=False)

This produces the following output:
+----------------------------------------------------------------------------+-----+
|features                                                                    |label|
+----------------------------------------------------------------------------+-----+
|(20,[0,7,9,13,14,16,18],[0.6931,0.5754,0.2877,0.2877,0.6931,0.6931,0.6931])|0    |
+----------------------------------------------------------------------------+-----+
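Under the hood, concatenating two sparse vectors just means offsetting the second vector's indices by the first vector's size; note in the output above that the explicit zeros (index 5 in each input) are dropped from the assembled result. Here is a minimal sketch of that index-shift logic in plain Python, outside Spark (the helper name `concat_sparse` is illustrative, not a Spark API):

```python
def concat_sparse(size1, entries1, size2, entries2):
    """Concatenate two sparse vectors, each given as a size and an
    {index: value} dict, dropping explicit zeros as in the output above."""
    # Keep the first vector's nonzero entries at their original indices.
    out = {i: v for i, v in entries1.items() if v != 0.0}
    # Shift the second vector's indices by the first vector's size.
    for i, v in entries2.items():
        if v != 0.0:
            out[size1 + i] = v
    return size1 + size2, out

# The example data from the answer above:
size, entries = concat_sparse(
    10, {0: 0.6931, 5: 0.0, 7: 0.5754, 9: 0.2877},
    10, {3: 0.2877, 4: 0.6931, 5: 0.0, 6: 0.6931, 8: 0.6931})
# size == 20, sorted(entries) == [0, 7, 9, 13, 14, 16, 18]
```

Applied to the two-element vectors in the question, the same logic maps vec2's index 1 to index 3 of the result, giving the dense equivalent [1.0, 0.0, 0.0, 1.0].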

Regarding "scala - Concatenating sparse vectors in Spark?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34097926/
