java - How to decompose a feature vector in Java?

Reposted. Author: 行者123. Updated: 2023-12-01 18:50:04

I have a DataFrame like this:

+---------------+--------------------+
|IndexedArtistID| recommendations|
+---------------+--------------------+
| 1580|[[919, 0.00249262...|
| 4900|[[41749, 7.143963...|
| 5300|[[0, 2.0147272E-4...|
| 6620|[[208780, 9.81092...|
+---------------+--------------------+

I want to split the recommendations column so that I get a DataFrame like this:

+---------------+--------------------+
|IndexedArtistID| recommendations|
+---------------+--------------------+
| 1580|919 |
| 1580|0.00249262 |
| 4900|41749 |
| 4900|7.143963 |
| 5300|0 |
| 5300|2.0147272E-4 |
| 6620|208780 |
| 6620|9.81092 |
+---------------+--------------------+

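Basically, I want to split the feature vector into columns and then merge those columns into a single column. The merging part is described in: How to split single row into multiple rows in Spark DataFrame using Java. So how do I do the splitting part in Java? For Scala it is explained here: Spark Scala: How to convert Dataframe[vector] to DataFrame[f1:Double, ..., fn: Double)], but I could not find a way to do it in Java the way the link describes.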

The schema of the DataFrame is shown below; the values of IndexedUserID should go into the newly created recommendations column:

root
|-- IndexedArtistID: integer (nullable = false)
|-- recommendations: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- IndexedUserID: integer (nullable = true)
| | |-- rating: float (nullable = true)
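A minimal plain-Java sketch of the requested "split, then merge into one column" step, assuming the schema above is modelled with hypothetical records (`UserRating`, `ArtistRecs`, `ArtistValue`) instead of Spark `Row`s. Each struct is split into its two fields and both are emitted as separate rows (ID first, then rating), widened to `double` so that IDs and ratings can share one column. Nullable fields use boxed types and nulls are skipped. The sample values in `main` are the visible prefixes of the truncated output above.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitAndMerge {
    // Hypothetical model of the schema: nullable struct fields are boxed so nulls fit.
    record UserRating(Integer indexedUserId, Float rating) {}

    record ArtistRecs(int indexedArtistId, List<UserRating> recommendations) {}

    // One output row: the artist ID plus a single value that is either a user ID
    // or a rating, widened to double so both kinds fit in one column.
    record ArtistValue(int indexedArtistId, double value) {}

    static List<ArtistValue> splitAndMerge(List<ArtistRecs> rows) {
        List<ArtistValue> out = new ArrayList<>();
        for (ArtistRecs row : rows) {
            if (row.recommendations() == null) {
                continue; // the array column is nullable
            }
            for (UserRating rec : row.recommendations()) {
                if (rec == null) {
                    continue; // containsNull = true: skip null elements
                }
                // "Split" the struct into its two fields and "merge" them into
                // the single value column: ID row first, then rating row.
                if (rec.indexedUserId() != null) {
                    out.add(new ArtistValue(row.indexedArtistId(), rec.indexedUserId()));
                }
                if (rec.rating() != null) {
                    out.add(new ArtistValue(row.indexedArtistId(), rec.rating()));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample values: visible prefixes of the truncated output above.
        List<ArtistRecs> input = List.of(
                new ArtistRecs(1580, List.of(new UserRating(919, 0.00249262f))),
                new ArtistRecs(5300, List.of(new UserRating(0, 2.0147272E-4f))));
        for (ArtistValue v : splitAndMerge(input)) {
            System.out.println(v.indexedArtistId() + " | " + v.value());
        }
    }
}
```

Widening to `double` keeps the merged column numeric. If the exact textual form shown in the question's table matters, a `String` value column would be the alternative.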

Best answer

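I tried to find a solution to this problem, and I have to say that while there is plenty of material on Spark problems in Python and Scala, there is very little for Java. So here is the solution: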

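import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Step 1: bring each artist ID and its recommendations array back to the driver.
// collect() materialises everything on the driver, so this only suits small results.
List<ElementStruct> structElements = dataFrameWithFeatures.javaRDD().map(row -> {
    int artistId = row.getInt(0);                  // IndexedArtistID
    List<Object> recommendations = row.getList(1); // array<struct<IndexedUserID, rating>>
    return new ElementStruct(artistId, recommendations);
}).collect();

// Step 2: emit one Recommendation per struct, keeping only IndexedUserID (field 0).
List<Recommendation> recommendations = new ArrayList<>();
for (ElementStruct element : structElements) {
    List<Object> features = element.getFeatures();
    if (features == null) continue;            // the array column is nullable
    int artistId = element.getArtistId();
    for (Object feature : features) {
        Row struct = (Row) feature;            // a GenericRowWithSchema at runtime
        if (struct == null || struct.isNullAt(0)) continue; // null element or null ID
        recommendations.add(new Recommendation(artistId, struct.getInt(0)));
    }
}

// Step 3: build a DataFrame from the beans. SessionCreator was the author's own helper;
// the standard builder is used here instead.
SparkSession sparkSession = SparkSession.builder().getOrCreate();
Dataset<Row> decomposedDataframe = sparkSession.createDataFrame(recommendations, Recommendation.class);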

ElementStruct class

import java.io.Serializable;
import java.util.List;

public class ElementStruct implements Serializable {
    private int artistId;
    private List<Object> features;

    public ElementStruct(int artistId, List<Object> features) {
        this.artistId = artistId;
        this.features = features;
    }

    public int getArtistId() {
        return artistId;
    }

    public void setArtistId(int artistId) {
        this.artistId = artistId;
    }

    public List<Object> getFeatures() {
        return features;
    }

    public void setFeatures(List<Object> features) {
        this.features = features;
    }
}

Recommendation class

import java.io.Serializable;

public class Recommendation implements Serializable {
    private int artistId;
    private int userId;

    public Recommendation(int artistId, int userId) {
        this.artistId = artistId;
        this.userId = userId;
    }

    public int getArtistId() {
        return artistId;
    }

    public void setArtistId(int artistId) {
        this.artistId = artistId;
    }

    public int getUserId() {
        return userId;
    }

    public void setUserId(int userId) {
        this.userId = userId;
    }
}
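The Recommendation bean carries only artistId and userId, so the ratings are dropped. Below is a minimal sketch of a hypothetical `RatedRecommendation` bean that also keeps the rating. It follows the same getter/setter convention the answer relies on for `createDataFrame(list, beanClass)` and adds a public no-arg constructor, which the JavaBean convention (and `Encoders.bean`) expects. Only plain Java is shown; that `createDataFrame` would then produce an extra rating column is an assumption not verified here.

```java
import java.io.Serializable;

// Hypothetical bean: the answer's Recommendation plus a rating field.
// In a real project, declare it public in its own RatedRecommendation.java.
class RatedRecommendation implements Serializable {
    private int artistId;
    private int userId;
    private float rating;

    // Public no-arg constructor, as the JavaBean convention expects.
    public RatedRecommendation() {}

    public RatedRecommendation(int artistId, int userId, float rating) {
        this.artistId = artistId;
        this.userId = userId;
        this.rating = rating;
    }

    public int getArtistId() { return artistId; }
    public void setArtistId(int artistId) { this.artistId = artistId; }

    public int getUserId() { return userId; }
    public void setUserId(int userId) { this.userId = userId; }

    public float getRating() { return rating; }
    public void setRating(float rating) { this.rating = rating; }
}
```

In the answer's inner loop, the rating would come from `struct.getFloat(1)`, after checking `struct.isNullAt(1)`, since the rating field is nullable.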

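Explanation:

1. For each row of the DataFrame, get the artist ID and the features as a list to simplify further processing, and store them as a Java object (here, ElementStruct).

2. For each artist and each element of its feature list, create a new object (here, Recommendation) and add it to a list.

3. Finally, create a DataFrame from the list of objects obtained in step 2.

Result: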

    root
    |-- artistId: integer (nullable = false)
    |-- userId: integer (nullable = false)

    +--------+------+
    |artistId|userId|
    +--------+------+
    |    1580|   919|
    |    4900| 41749|
    |    5300|     0|
    |    6620|208780|
    +--------+------+

    Only the first struct of each (truncated) recommendations array is visible in the input above, so any further elements add more rows per artist. The rating values do not appear because the code reads only field 0 (IndexedUserID) of each struct.
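Steps 1 and 2 amount to a one-to-many mapping: each input row yields one output row per array element. Here is a minimal plain-Java sketch of that mapping with `Stream.flatMap`. It uses hypothetical records (`UserRating`, `ArtistRecs`, `FlatRec`) in place of Spark rows and, unlike the answer, keeps the rating. Inside Spark itself, the built-in `org.apache.spark.sql.functions.explode` performs this kind of flattening on the executors without collecting to the driver. The sample values in `main` are visible prefixes from the question's truncated output.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class ExplodeSketch {
    // Hypothetical stand-ins for the DataFrame's rows and struct elements.
    record UserRating(Integer indexedUserId, Float rating) {}

    record ArtistRecs(int indexedArtistId, List<UserRating> recommendations) {}

    // One flattened row: artistId, userId and rating side by side.
    record FlatRec(int artistId, int userId, Float rating) {}

    static List<FlatRec> explode(List<ArtistRecs> rows) {
        return rows.stream()
                .filter(row -> row.recommendations() != null)  // nullable array column
                .flatMap(row -> row.recommendations().stream()
                        .filter(Objects::nonNull)               // containsNull = true
                        .filter(rec -> rec.indexedUserId() != null)
                        .map(rec -> new FlatRec(row.indexedArtistId(),
                                rec.indexedUserId(), rec.rating())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ArtistRecs> input = List.of(
                new ArtistRecs(4900, List.of(new UserRating(41749, 7.143963f))),
                new ArtistRecs(6620, List.of(new UserRating(208780, 9.81092f))));
        explode(input).forEach(System.out::println);
    }
}
```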

For java - How to decompose a feature vector in Java?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59755022/
