python - 如何使用 PyTorch 对数组中的数字执行无监督聚类-6ren

python - 如何使用 PyTorch 对数组中的数字执行无监督聚类

转载作者：行者123 更新时间：2023-12-03 18:54:19

24

4

我得到了这个数组，我想将这些数字聚类/分组为相似的值。
输入数组的一个例子:

array([ 57, 58, 59, 60, 61, 78, 79, 80, 81, 82, 83, 101, 102, 103, 104, 105, 106]

预期结果 :

array([57,58,59,60,61]), ([78,79,80,81,82,83]), ([101,102,103,104,105,106])

我尝试使用聚类，但如果我不知道要拆分多少个，我认为它不会起作用。

true = np.where(array>=1)
-> (array([ 57,  58,  59,  60,  61,  78,  79,  80,  81,  82,  83, 101, 102,
    103, 104, 105, 106], dtype=int64),)

最佳答案

动态分箱需要明确的标准并且不是一个容易自动化的问题，因为每个阵列可能需要一组不同的阈值来有效地分箱。
我想 Gaussian mixtures与 silhouette score criteria是你最好的选择。这是您要实现的目标的代码。轮廓分数可帮助您确定应使用的聚类/高斯分布的数量，并且对于 1D 数据非常准确且可解释。

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
import scipy
import matplotlib.pyplot as plt
%matplotlib inline

#Sample data
x = [57, 58, 59, 60, 61, 78, 79, 80, 81, 82, 83, 101, 102, 103, 104, 105, 106]

#Fit a model onto the data
data = np.array(x).reshape(-1,1)

#change value of clusters to check best silhoutte score
print('Silhoutte scores')
scores = []
for n in range(2,11):
    model = GaussianMixture(n).fit(data)
    preds = model.predict(data)
    score = silhouette_score(data, preds)
    scores.append(score)
    print(n,'->',score)

n_best = np.argmax(scores)+2 #because clusters start from 2

model = GaussianMixture(n_best).fit(data) #best model fit
    
#Get list of means and variances
mu = np.abs(model.means_.flatten())
sd = np.sqrt(np.abs(model.covariances_.flatten()))

#Plotting
extend_window = 50  #this is for zooming into or out of the graph, higher it is , more zoom out
x_values = np.arange(data.min()-extend_window, data.max()+extend_window, 0.1) #For plotting smooth graphs
plt.plot(data, np.zeros(data.shape), linestyle='None', markersize = 10.0, marker='o') #plot the data on x axis

#plot the different distributions (in this case 2 of them)
for i in range(num_components):
    y_values = scipy.stats.norm(mu[i], sd[i])
    plt.plot(x_values, y_values.pdf(x_values))

#split data by clusters
pred = model.predict(data)
output = np.split(x, np.sort(np.unique(pred, return_index=True)[1])[1:])
print(output)

Silhoutte scores
2 -> 0.699444729378163
3 -> 0.8962176943475543  #<--- selected as nbest
4 -> 0.7602523591781903
5 -> 0.5835620702692205
6 -> 0.5313888070615105
7 -> 0.4457049486461251
8 -> 0.4355742296918767
9 -> 0.13725490196078433
10 -> 0.2159663865546218

这将创建 3 个具有以下分布的高斯分布以将数据拆分为集群。

数组输出最终按相似值拆分

#output - 
[array([57, 58, 59, 60, 61]),
 array([78, 79, 80, 81, 82, 83]),
 array([101, 102, 103, 104, 105, 106])]

关于python - 如何使用 PyTorch 对数组中的数字执行无监督聚类，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/66238110/

24

4

0

文章推荐： python - 更改标签大小 - 足球场

文章推荐： javascript - 删除()函数完成时如何执行类型()函数

文章推荐： c++ - 使用遗留头文件作为 c++20 模块

文章推荐： python - 使用一致的填充将数据帧行值连接成一个

java - JFrame 中的 JPanel 中的 JScrollPane 中的 JTextPane
我想做的是让 JTextPane 在 JPanel 中占用尽可能多的空间。对于我使用的 UpdateInfoPanel: public class UpdateInfoPanel extends JP
java - JFrame 中的 JPanel 中的 JTextArea 中的 JScrollPane 出现问题
我在 JPanel 中有一个 JTextArea，我想将其与 JScrollPane 一起使用。我正在使用 GridBagLayout。当我运行它时，框架似乎为 JScrollPane 腾出了空间，但
ios - iOs Xcode 中的 UIViewController 中的 UIView 中的 UITableView
我想在 xcode 中实现以下功能。我有一个 View Controller 。在这个 UIViewController 中，我有一个 UITabBar。它们下面是一个 UIView。将 UITab
sql - 与 SQL 中的 STUFF 等效的函数(MySQL 中的 GROUP_CONCAT/Oracle 中的 LISTAGG)
有谁知道Firebird 2.5有没有类似于SQL中“STUFF”函数的功能？我有一个包含父用户记录的表，另一个表包含与父相关的子用户记录。我希望能够提取用户拥有的“ROLES”的逗号分隔字符串，而
Mirth 中的 Json 解析或 Mirth 中的 Json 或 Mirth 中的 HL7 到 JSON
我想使用 JSON 作为 mirth channel 的输入和输出，例如详细信息保存在数据库中或创建 HL7 消息。简而言之，输入为 JSON 解析它并输出为任何格式。最佳答案 var objec
python - 如果文件 1 中的 A 列 = 文件 2 中的 A 列，则替换为文件 2 中的 B 列
通常我会使用 R 并执行 merge.by，但这个文件似乎太大了，部门中的任何一台计算机都无法处理它! (任何从事遗传学工作的人的附加信息)本质上，插补似乎删除了 snp ID 的 rs 数字，我只剩
Javascript 中的 HAML 中的 Javascript
我有一个以前可能被问过的问题，但我很难找到正确的描述。我希望有人能帮助我。在下面的代码中，我设置了varprice，我想添加javascript变量accu_id以通过rails在我的数据库中查找记
HTML 中的 SVG 中的 HTML
我有一个简单的 SVG 文件，在 Firefox 中可以正常查看 - 它的一些包装文本使用 foreignObject 包含一些 HTML - 文本包装在 div 中:
ruby - Ruby 中的 If block 中的 "Or"
所以我正在为学校编写一个 Ruby 程序，如果某个值是 1 或 3，则将 bool 值更改为 true，如果是 0 或 2，则更改为 false。由于我有 Java 背景，所以我认为这段代码应该有效:
amazon-web-services - 如何从账户 A 中的 Lambda(VPC 中的 Lambda)调用账户 B(VPC 中的此 Lambda)中的 AWS Lambda 函数
我做了什么: 我在这些账户之间创建了 VPC 对等连接互联网网关也连接到每个 VPC 还配置了路由表(以允许来自双方的流量) 情况1: 当这两个 VPC 在同一个账户中时，我成功测试了从另一个 La
php - 如何获取 column1 中的 value1 和 column2 中的 value2 但 column1 中的 value2 在 column2 中没有 value1 的行？
我有一个名为 contacts 的表: user_id contact_id 10294 10295 10294 10293 10293 10294 102
php - Magento 中的 foreach 中的 getChildHtml
我正在使用 Magento 中的新模板。为避免重复代码，我想为每个产品预览使用相同的子模板。特别是我做了这样一个展示: $products = Mage::getModel('catalog/pro
protocols - Elixir 中的 "for"中的 "defimpl"实际上检查了什么？
“for”是否总是检查协议(protocol)中定义的每个函数中第一个参数的类型？编辑(改写): 当协议(protocol)方法只有一个参数时，根据该单个参数的类型(直接或任意)找到实现。当协议(p
javascript - PHP 中的 JavaScript 中的 PHP
我想从我的 PHP 代码中调用 JavaScript 函数。我通过使用以下方法实现了这一点: echo ' drawChart($id); '; 这工作正常，但我想从我的 PHP 代码中获取数据，我使
javascript - html 中的 html 中的 JavaScript
这个问题已经有答案了: Event binding on dynamically created elements? (23 个回答) 已关闭 5 年前。我有一个动态表单，我想在其中附加一些其他 h
javascript - componentDidMount() 中的 .map 中的 setState
我正在尝试找到一种解决方案，以在 componentDidMount 中的映射项上使用 setState。我正在使用 GraphQL连同 Gatsby返回许多 data 项目，但要求在特定的 pat
android - ScrollView 中的 View 中的 OnTouchListener
我在 ScrollView 中有一个 View 。只要用户按住该 View ，我想每 80 毫秒调用一次方法。这是我已经实现的: final Runnable vibrate = new Runnab
android - GetStringUTFChars 中的 dvmDecodeIndirectRef 中的 dvmAbort
我用 jni 开发了一个 android 应用程序。我在 GetStringUTFChars 的 dvmDecodeIndirectRef 中得到了一个 dvmabort。我只中止了一次。为什么会这
android - Activity 中的 FragmentPagerAdapter 中的 RecyclerView
当我到达我的 Activity 时，我调用 FragmentPagerAdapter 来处理我的不同选项卡。在我的一个选项卡中，我想显示一个 RecyclerView，但他从未出现过，有了断点，我看到
android - Activity 中的 DialogFragment 中的 RecyclerView
当我按下 Activity 中的按钮时，会弹出一个 DialogFragment。在对话框 fragment 中，有一个看起来像普通 ListView 的 RecyclerView。我想要的行为是当

首页

博学

6Ren·AI

商城

python - 如何使用 PyTorch 对数组中的数字执行无监督聚类