apache-spark - ALS model - how to generate full_u * v^t * v?

Reposted by: 行者123. Last updated: 2023-12-04 04:57:43

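I'm trying to figure out how an ALS model can predict values for new users in between batch updates. In my search, I came across this Stack Overflow answer. For the reader's convenience, I've copied the answer below: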

You can get predictions for new users using the trained model (without updating it):

To get predictions for a user in the model, you use its latent representation (vector u of size f, the number of factors), which is multiplied by the product latent factor matrix (the matrix made of the latent representations of all products, a bunch of vectors of size f) and gives you a score for each product. For new users, the problem is that you don't have access to their latent representation (you only have the full representation of size M, the number of different products), but what you can do is use a similarity function to compute a similar latent representation for this new user by multiplying it by the transpose of the product matrix.

i.e. if your user latent matrix is u and your product latent matrix is v, for user i in the model, you get scores by doing: u_i * v. For a new user, you don't have a latent representation, so take the full representation full_u and do: full_u * v^t * v. This will approximate the latent factors for the new users and should give reasonable recommendations (if the model already gives reasonable recommendations for existing users).

To answer the question of training, this allows you to compute predictions for new users without having to do the heavy computation of the model, which you can now do only once in a while. So you have your batch processing at night and can still make predictions for new users during the day.

Note: MLlib gives you access to the matrices u and v.

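The text quoted above is a good answer, but I'm having trouble understanding how to implement this solution programmatically. For example, the matrices u and v can be obtained as follows: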

# pyspark example
from pyspark.mllib.recommendation import ALS

# omitted for brevity ... loading MovieLens 1M ratings into `ratings`
model = ALS.train(ratings, rank, numIterations, lambdaParam)
matrix_u = model.userFeatures()
print(matrix_u.take(2))  # take a look at the dataset

This returns:

[
  (2, array('d', [0.26341307163238525, 0.1650490164756775, 0.118405282497406, -0.5976635217666626, -0.3913084864616394, -0.1379186064004898, -0.3866392970085144, -0.1768060326576233, -0.38342711329460144, 0.48550787568092346, -0.18867433071136475, -0.02757863700389862, 0.1410026103258133, 0.11498363316059113, 0.03958914801478386, 0.034536730498075485, 0.08427099883556366, 0.46969038248062134, -0.8230801224708557, -0.15124185383319855, 0.2566414773464203, 0.04326820373535156, 0.19077207148075104, 0.025207923725247383, -0.02030213735997677, 0.1696728765964508, 0.5714617967605591, -0.03885050490498543, -0.09797532111406326, 0.29186877608299255, -0.12768596410751343, -0.1582849770784378, 0.01933656632900238, -0.09131495654582977, 0.26577943563461304, -0.4543033838272095, -0.11789630353450775, 0.05775507912039757, 0.2891307771205902, -0.2147761881351471, -0.011787488125264645, 0.49508437514305115, 0.5610293745994568, 0.228189617395401, 0.624510645866394, -0.009683617390692234, -0.050237834453582764, -0.07940001785755157, 0.4686132073402405, -0.02288617007434368])), 
  (4, array('d', [-0.001666820957325399, -0.12487432360649109, 0.1252429485321045, -0.794727087020874, -0.3804478347301483, -0.04577340930700302, -0.42346617579460144, -0.27448347210884094, -0.25846347212791443, 0.5107921957969666, 0.04229479655623436, -0.10212298482656479, -0.13407345116138458, -0.2059325873851776, 0.12777331471443176, -0.318756639957428, 0.129398375749588, 0.4351944327354431, -0.9031049013137817, -0.29211774468421936, -0.02933369390666485, 0.023618215695023537, 0.10542935132980347, -0.22032295167446136, -0.1861676126718521, 0.13154461979866028, 0.6130356192588806, -0.10089754313230515, 0.13624103367328644, 0.22037173807621002, -0.2966669499874115, -0.34058427810668945, 0.37738317251205444, -0.3755438029766083, -0.2408779263496399, -0.35355791449546814, 0.05752146989107132, -0.15478627383708954, 0.3418906629085541, -0.6939512491226196, 0.4279302656650543, 0.4875738322734833, 0.5659542083740234, 0.1479463279247284, 0.5280753970146179, -0.24357643723487854, 0.14329688251018524, -0.2137598991394043, 0.011986476369202137, -0.015219110995531082]))
]

I can do something similar to get the v matrix:

matrix_v = model.productFeatures()

print(matrix_v.take(2)) # take a look at the dataset

This results in:

[
  (2, array('d', [0.019985994324088097, 0.0673416256904602, -0.05697149783372879, -0.5434763431549072, -0.40705952048301697, -0.18632276356220245, -0.30776089429855347, -0.13178342580795288, -0.27466219663619995, 0.4183739423751831, -0.24422742426395416, -0.24130797386169434, 0.24116989970207214, 0.06833088397979736, -0.01750543899834156, 0.03404173627495766, 0.04333991929888725, 0.3577033281326294, -0.7044714689254761, 0.1438472419977188, 0.06652364134788513, -0.029888223856687546, -0.16717877984046936, 0.1027146726846695, -0.12836599349975586, 0.10197233408689499, 0.5053384900093079, 0.019304445013403893, -0.21254844963550568, 0.2705852687358856, -0.04169371724128723, -0.24098040163516998, -0.0683765709400177, -0.09532768279314041, 0.1006036177277565, -0.08682398498058319, -0.13584329187870026, -0.001340558985248208, 0.20587041974067688, -0.14007550477981567, -0.1831497997045517, 0.5021498203277588, 0.3049483597278595, 0.11236990243196487, 0.15783801674842834, -0.044139936566352844, -0.14372406899929047, 0.058535050600767136, 0.3777201473712921, -0.045475270599126816])), 
  (4, array('d', [0.10334215313196182, 0.1881643384695053, 0.09297363460063934, -0.457258403301239, -0.5272660255432129, -0.0989445373415947, -0.2053477019071579, -0.1644461452960968, -0.3771175146102905, 0.21405018866062164, -0.18553146719932556, 0.011830524541437626, 0.29562288522720337, 0.07959598302841187, -0.035378433763980865, -0.11786794662475586, -0.11603366583585739, 0.3776192367076874, -0.5124108791351318, 0.03971947357058525, -0.03365595266222954, 0.023278912529349327, 0.17436474561691284, -0.06317273527383804, 0.05118614062666893, 0.4375131130218506, 0.3281322419643402, 0.036590900272130966, -0.3759073317050934, 0.22429685294628143, -0.0728025734424591, -0.10945595055818558, 0.0728464275598526, 0.014129920862615108, -0.10701996833086014, -0.2496117204427719, -0.09409723430871964, -0.11898282915353775, 0.18940524756908417, -0.3211393356323242, -0.035668935626745224, 0.41765937209129333, 0.2636736035346985, -0.01290816068649292, 0.2824321389198303, 0.021533429622650146, -0.08053319901227951, 0.11117415875196457, 0.22975310683250427, 0.06993964314460754]))
]
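As a minimal sketch of turning output like the above into a matrix: the collected (id, factors) pairs can be sorted by ID and stacked into a NumPy array, so that each row lines up with a known product ID. The helper name `stack_factors` and the tiny three-factor `collected` list are hypothetical stand-ins for what `model.productFeatures().collect()` would return; they are not part of the original question.

```python
from array import array

import numpy as np


def stack_factors(collected):
    """Sort (id, factors) pairs by id; return the ids and a (num_ids x rank) array."""
    ordered = sorted(collected, key=lambda pair: pair[0])
    ids = [pair[0] for pair in ordered]
    matrix = np.array([list(pair[1]) for pair in ordered], dtype=float)
    return ids, matrix


# Hypothetical stand-in for model.productFeatures().collect(), with rank 3.
collected = [
    (4, array('d', [0.10, 0.19, 0.09])),
    (2, array('d', [0.02, 0.07, -0.06])),
]

product_ids, V_rows = stack_factors(collected)
print(product_ids)   # [2, 4]
print(V_rows.shape)  # (2, 3)
```

Keeping the ID list next to the matrix matters because product IDs are not necessarily contiguous (the sample above shows IDs 2 and 4), so a row index is not the same thing as a product ID.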

However, I'm not sure how to get from this to full_u * v^t * v.

Best answer

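The new user is not in matrix U, so you don't have its latent representation in 'k' factors; you only know its full representation, i.e. all of its ratings. full_u here means all of the new user's ratings in a dense format (not the sparse format that ratings are in), for example:
[0 2 0 0 0 1 0] if user u has rated item 2 with a 2 and item 6 with a 1.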
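The dense row described above could be built from sparse (product ID, rating) pairs roughly as follows. This is only a sketch: `to_dense_ratings` and `product_ids` are hypothetical names, and it assumes you keep a list of product IDs in the same order as the rows of the product factor matrix.

```python
import numpy as np


def to_dense_ratings(sparse_ratings, product_ids):
    """Build a dense 1 x M rating row from (product_id, rating) pairs.

    Unrated products stay at 0. Column order follows `product_ids`.
    """
    column_of = {pid: col for col, pid in enumerate(product_ids)}
    full_u = np.zeros((1, len(product_ids)))
    for pid, rating in sparse_ratings:
        full_u[0, column_of[pid]] = rating
    return full_u


# Seven products with IDs 1..7 (an assumption made for this example).
product_ids = [1, 2, 3, 4, 5, 6, 7]
full_u = to_dense_ratings([(2, 2.0), (6, 1.0)], product_ids)
print(full_u)  # [[0. 2. 0. 0. 0. 1. 0.]]
```

Going through an explicit ID-to-column mapping avoids assuming that product IDs are contiguous or start at a particular value.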

Then you can get v pretty much the way you did and turn it into a matrix in numpy, for example:

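import numpy as np
pf = model.productFeatures().sortByKey()  # sort so that row order follows ascending product ID
Vt = np.matrix(np.asarray(pf.values().collect()))  # shape: (number of products, rank)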

Then it's just a matter of multiplying: full_u*Vt*Vt.T. Vt and V are transposed compared to the other answer, but that's only a matter of convention.
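To make the shapes concrete, here is a small self-contained sketch of that product. It uses a random matrix as a stand-in for the real product factors and plain NumPy arrays with the `@` operator instead of `np.matrix` with `*`; both choices are assumptions made for illustration, and the resulting scores are meaningless beyond showing the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
num_products, rank = 7, 3

# Stand-in for the product factors: one row per product, one column per factor.
Vt = rng.normal(size=(num_products, rank))

# Dense ratings of the new user: item 2 rated 2, item 6 rated 1.
full_u = np.array([[0, 2, 0, 0, 0, 1, 0]], dtype=float)

approx_latent = full_u @ Vt    # 1 x rank: approximate latent vector of the new user
scores = approx_latent @ Vt.T  # 1 x num_products: one score per product

# Column indices ordered from highest to lowest score.
ranking = np.argsort(-scores[0])
print(approx_latent.shape, scores.shape)
```

The intermediate `approx_latent` plays the role of the missing row of U for the new user, which is why the result can be scored against the products the same way as an existing user.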

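Note that the Vt*Vt.T product is fixed, so if you are going to use it for multiple new users, it is computationally more efficient to precompute it. In fact, for more than one user, it is better to put all of their ratings in bigU (in the same format as my single new user example) and just do the matrix product bigU*Vt*Vt.T to get all the ratings for all the new users. It may still be worth checking that the product is carried out in the most efficient way in terms of the number of operations.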
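One way to check the operation count is to compare the two groupings of the product directly. The sketch below uses random stand-in data and a hypothetical helper, `multiply_adds`, that counts scalar multiply-adds for dense matrices with N new users, M products and k factors; it ignores memory cost and any sparsity in bigU.

```python
import numpy as np

rng = np.random.default_rng(1)
num_users, num_products, rank = 4, 7, 3

Vt = rng.normal(size=(num_products, rank))  # stand-in product factors
bigU = rng.integers(0, 6, size=(num_users, num_products)).astype(float)

# Grouping 1: precompute the M x M matrix once, then reuse it per batch.
VVt = Vt @ Vt.T
scores_precomputed = bigU @ VVt

# Grouping 2: project to k factors first, then back to M products.
scores_chained = (bigU @ Vt) @ Vt.T

assert np.allclose(scores_precomputed, scores_chained)


def multiply_adds(n, m, k):
    """Dense multiply-add counts for the two groupings."""
    return {
        "precompute_once": m * m * k,     # Vt @ Vt.T
        "per_batch_precomputed": n * m * m,  # bigU @ VVt
        "per_batch_chained": 2 * n * m * k,  # (bigU @ Vt) @ Vt.T
    }


print(multiply_adds(num_users, num_products, rank))
```

Under these dense counts, the chained grouping needs fewer operations per batch whenever 2k is smaller than M, and it avoids storing an M x M matrix, so with a large catalogue and a small rank it may be the cheaper choice even though Vt*Vt.T is fixed.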

Regarding "apache-spark - ALS model - how to generate full_u * v^t * v?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41537470/
