python - Product price comparison tool: Difficulty in matching identical items

Reposted by: 行者123 | Updated: 2023-11-30 09:53:25

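I'm building an e-commerce product price comparison tool in Python, somewhat like camelcamelcamel.com, partly for fun and partly for profit. I'm having trouble matching identical items across the listings I collect from various websites for a given search term. For product matching I'm using cosine similarity (and am considering Levenshtein distance), comparing the items' titles against each other to find identical items.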
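For reference, the Levenshtein distance mentioned above counts the minimum number of single-character insertions, deletions and substitutions that turn one title into the other. A minimal standard-library sketch could look like this; the function names and the 0..1 normalisation are illustrative choices, not taken from the question.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Normalise the distance to a 0..1 similarity (1.0 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Applied to raw titles, this has the same blind spot as cosine similarity: product_1 is product_0 plus " cover", which is only 6 edits on a title of about 100 characters.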

For example, I have the following items and their prices:

{
    "product_0": {
        "title": "Apple MacBook Air MMGF2HN/A 13.3-inch Laptop (Core i5/8GB/128GB/Mac OS X/Integrated Graphics)",
        "price": "xxxx",
    },
    "product_1": {
        "title": "Apple MacBook Air MMGF2HN/A 13.3-inch Laptop (Core i5/8GB/128GB/Mac OS X/Integrated Graphics) cover",
        "price": "xyzy",
    },
    "product_2": {
        "title": "Apple Macbook Air MMGF2HNA Notebook (Intel Core i5- 8GB RAM- 128GB SSD- 33.78 cm(13.3)- OS X El Capitan) (Silver)",
        "price": "xxyy",
    },
    "product_3": {
        "title": "....",
        "price": "....",
    },

    # ...

    "product_99": {
        # product title and price
    },

}

When I apply cosine similarity to the list of items (data) above, I get the following values:

cosine(product_0, product_1) = 0.973328526785
cosine(product_0, product_2) = 0.50251890763

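In reality, however, product_0 and product_1 are two different items (product_1 is a cover), yet their cosine similarity suggests they are the same; product_0 and product_2 are the same item, yet their cosine similarity suggests they are different.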
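To see why product_1 scores so high, here is a minimal bag-of-words sketch: each title becomes a vector of token counts, and the score is the cosine of the angle between the two vectors. The tokeniser (lower-cased runs of letters and digits) is an assumption; the question does not say how its vectors were built.

```python
import math
import re
from collections import Counter


def tokenize(title: str) -> list[str]:
    # Lower-case and keep runs of letters/digits; punctuation such as "/" splits tokens.
    return re.findall(r"[a-z0-9]+", title.lower())


def cosine_similarity(title_a: str, title_b: str) -> float:
    """Cosine of the angle between the two titles' term-count vectors."""
    va, vb = Counter(tokenize(title_a)), Counter(tokenize(title_b))
    # Counter returns 0 for tokens it has not seen, so missing terms add nothing.
    dot = sum(count * vb[token] for token, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With this tokeniser, product_0 has 18 distinct tokens and product_1 is the same 18 plus "cover", so the score is sqrt(18/19) ≈ 0.973; product_0 and product_2 share only 11 tokens, giving about 0.553. Plain term overlap rewards shared words: it cannot tell that one extra word such as "cover" changes what is being sold, and it treats "MMGF2HN/A" and "MMGF2HNA" as unrelated tokens.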

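I've been trying to solve this on my own and thought I'd ask Stack Overflow for suggestions. Am I on the right track using cosine similarity to match items? If not, please point me in the right direction.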

My basic idea is to compare prices for the same item, i.e. to perform semantic analysis across the various similar products.
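Since the goal is to compare prices of the exact same item, one option (not taken from the question or the answer) is to combine text similarity with cheap domain rules. The sketch below treats an upper-case token that mixes letters and digits as a model code, so "MMGF2HN/A" and "MMGF2HNA" normalise to the same key, and uses a small hypothetical list of accessory words to keep a cover from matching the laptop itself. Both the token rules and ACCESSORY_TERMS are illustrative assumptions that would need tuning on real listings.

```python
import re

# Hypothetical list of words that usually mark an accessory rather than the product itself.
ACCESSORY_TERMS = {"cover", "case", "sleeve", "skin", "charger", "adapter"}


def model_codes(title: str) -> set[str]:
    """Heuristic: an upper-case token mixing letters and digits, at least 6
    characters long once punctuation is removed, is treated as a model code."""
    codes = set()
    for raw in title.split():
        if any(ch.islower() for ch in raw):
            continue  # skip ordinary words such as "i5/8GB/128GB/Mac"
        code = re.sub(r"[^A-Za-z0-9]", "", raw)  # "MMGF2HN/A" -> "MMGF2HNA"
        has_letter = any(ch.isalpha() for ch in code)
        has_digit = any(ch.isdigit() for ch in code)
        if len(code) >= 6 and has_letter and has_digit:
            codes.add(code)
    return codes


def is_accessory(title: str) -> bool:
    words = set(re.findall(r"[a-z]+", title.lower()))
    return bool(words & ACCESSORY_TERMS)


def same_item(title_a: str, title_b: str) -> bool:
    """Same model code, and either both or neither title looks like an accessory."""
    shared_code = bool(model_codes(title_a) & model_codes(title_b))
    return shared_code and is_accessory(title_a) == is_accessory(title_b)
```

On the three titles above, same_item returns True for product_0/product_2 and False for product_0/product_1. Titles without a recognisable model code never match under this rule, so in practice it would be one signal alongside a similarity score rather than a replacement for it.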

Thanks for your time.

Best answer

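You could train word2vec on the product titles. With the Python word2vec wrapper, the resulting code would look something like the following; with Gensim's models.word2vec it is slightly different but similar: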

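import word2vec  # the "word2vec" package on PyPI (the Python word2vec wrapper)
model = word2vec.load("titles.bin")  # hypothetical file: a model trained on product titles
indexes, metrics = model.cosine(normalized_phrase)  # normalized_phrase: query title, preprocessed like the training text
model.generate_response(indexes, metrics)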

The generated response will be the titles ranked by cosine similarity, in descending order.
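The ranking step described above can be sketched without any library. Here each title is represented by the average of its word vectors, and candidates are sorted by cosine similarity to the query, highest first. Averaging word vectors is a swapped-in simplification, not necessarily what the word2vec wrapper does internally, and the two-dimensional toy table EMBEDDINGS is hypothetical; with a trained Gensim Word2Vec model the vectors would instead come from model.wv[token].

```python
import math

# Hypothetical 2-d "word vectors"; a model trained on product titles would supply real ones.
EMBEDDINGS = {
    "macbook": [1.0, 0.0],
    "laptop": [0.9, 0.1],
    "notebook": [0.9, 0.1],
    "cover": [0.0, 1.0],
}


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens the embedding table knows about."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None  # no usable tokens, so the title cannot be placed in vector space
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_titles(query, titles, embeddings):
    """Return (title, similarity) pairs sorted by similarity, highest first."""
    q = mean_vector(query.lower().split(), embeddings)
    scored = []
    for title in titles:
        v = mean_vector(title.lower().split(), embeddings)
        if q is not None and v is not None:
            scored.append((title, cosine(q, v)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, rank_titles("macbook laptop", ["macbook cover", "macbook notebook"], EMBEDDINGS) puts "macbook notebook" first with similarity 1.0 (its averaged vector equals the query's, because the toy table gives "laptop" and "notebook" the same vector) and "macbook cover" second at about 0.74. The point of trained embeddings is exactly that synonyms like these end up close together, which raw token overlap cannot capture.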

Regarding python - Product price comparison tool: Difficulty in matching identical items, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40460401/