
azure - Spark ALS implicit exception

Reposted · Author: 行者123 · Updated: 2023-12-01 15:08:41

We are using ALS on Azure Spark to build our recommendation system.

Because of limited compute, we cannot produce a separate recommendation list for every user. Instead, we group users into clusters and use ALS to produce a recommendation list for each cluster centroid.

Before clustering the users, we preprocess the data with Spark's StandardScaler and Normalizer to get better clustering results. However, this causes the following exception when calling ALS.trainImplicit:

15/11/16 15:43:11 INFO TaskSetManager: Lost task 30.0 in stage 15.0 (TID 197) on executor localhost: java.lang.AssertionError (assertion failed: lapack.dppsv returned 4.) [duplicate 9]
15/11/16 15:43:11 INFO TaskSetManager: Lost task 25.0 in stage 15.0 (TID 192) on executor localhost: java.lang.AssertionError (assertion failed: lapack.dppsv returned 4.) [duplicate 10]
15/11/16 15:43:11 INFO TaskSetManager: Lost task 16.0 in stage 15.0 (TID 183) on executor localhost: java.lang.AssertionError (assertion failed: lapack.dppsv returned 4.) [duplicate 11]
Traceback (most recent call last):
  File "/home/rogeesjir_huasqngfda/woradofkapkspace/jigsusLaudfadfecher/scripts/RecommendationBackend/AzureSpark/src/collaborativeFiltering/spark_als.py", line 92, in <module>
    main()
  File "/home/rogeesjir_huasqngfda/rogeesjir_huasqngfda/jigsusLaudfadfecher/scripts/RecommendationBackend/AzureSpark/src/collaborativeFiltering/spark_als.py", line 39, in main
    model = ALS.trainImplicit(ratings, rank, numIter, alpha=0.01)
  File "/home/jigsusLaudfadfecher/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/recommendation.py", line 147, in trainImplicit
    iterations, lambda_, blocks, alpha, nonnegative, seed)
  File "/home/jigsusLaudfadfecher/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py", line 120, in callMLlibFunc
    return callJavaFunc(sc, api, *args)
  File "/home/jigsusLaudfadfecher/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py", line 113, in callJavaFunc
    return _java2py(sc, func(*args))
  File "/home/jigsusLaudfadfecher/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/jigsusLaudfadfecher/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError

: An error occurred while calling o39.trainImplicitALSModel.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 15.0 failed 1 times, most recent failure: Lost task 8.0 in stage 15.0 (TID 175, localhost): java.lang.AssertionError: assertion failed: lapack.dppsv returned 4.
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.ml.recommendation.ALS$CholeskySolver.solve(ALS.scala:355)
    at org.apache.spark.ml.recommendation.ALS$$anonfun$org$apache$spark$ml$recommendation$ALS$$computeFactors$1.apply(ALS.scala:1131)
    at org.apache.spark.ml.recommendation.ALS$$anonfun$org$apache$spark$ml$recommendation$ALS$$computeFactors$1.apply(ALS.scala:1092)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$15.apply(PairRDDFunctions.scala:674)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$15.apply(PairRDDFunctions.scala:674)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:172)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:79)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

When we remove the "normalization" step (i.e., skip StandardScaler and Normalizer), everything works fine. Incidentally, ALS.train() with explicit ratings also works fine even when we normalize the data before training.

Has anyone run into this problem before? We are still new to this, so any help is appreciated. Thanks!

Best Answer

For future readers:

Several columns in the given dataset contain only zeros. In that case the data matrix is not full rank, so the Gramian matrix is singular and hence not invertible, and the Cholesky decomposition fails. The same thing happens if the standard deviation of one or more columns is zero (even if the values themselves are non-zero). I think we should either catch this error in the code and exit with a warning message, or drop the zero-variance columns and continue with the algorithm.

Taken from a comment.
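The failure mode quoted above can be reproduced outside Spark. A minimal numpy sketch (illustrative only; Spark's solver actually calls LAPACK's dppsv, for which numpy's Cholesky routine stands in here): a zero column makes the Gramian AᵀA singular, so the Cholesky factorization aborts, just as in the `lapack.dppsv returned 4` assertion.

```python
import numpy as np

# A feature matrix whose last column is all zeros, as can happen when
# a zero-variance column survives scaling/normalization.
A = np.array([
    [1.0, 2.0, 0.0],
    [3.0, 4.0, 0.0],
    [5.0, 6.0, 0.0],
])

gram = A.T @ A  # the Gramian matrix A^T A used in the ALS normal equations
print(np.linalg.matrix_rank(gram))  # 2 -- rank-deficient 3x3 matrix

# Cholesky requires a positive-definite matrix; a singular Gramian fails,
# analogous to dppsv reporting a non-zero info code.
try:
    np.linalg.cholesky(gram)
    failed = False
except np.linalg.LinAlgError:
    failed = True
print(failed)  # True
```

Dropping the zero (or zero-variance) columns before forming the Gramian restores full rank, which matches the fix suggested in the comment.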

As long as you make sure most of the ratings are non-zero, it will work.
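One way to enforce this is to filter out zero ratings before training. A minimal pure-Python sketch of the idea, assuming ratings are (user, product, rating) tuples as in MLlib's Rating; `drop_zero_ratings` is a hypothetical helper, not part of any library:

```python
def drop_zero_ratings(ratings):
    """Keep only entries with a non-zero rating.

    ratings: iterable of (user, product, rating) tuples.
    """
    return [(u, p, r) for (u, p, r) in ratings if r != 0]

raw = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 0.0), (1, 1, 2.0)]
print(drop_zero_ratings(raw))  # [(0, 0, 1.0), (1, 1, 2.0)]
```

On an actual ratings RDD the equivalent would be something like `ratings.filter(lambda x: x[2] != 0)` before the ALS.trainImplicit call.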

Regarding this azure - Spark ALS implicit exception, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33800385/
