python - MiniBatchKMeans Python

Reposted by: 行者123 Last updated: 2023-11-30 09:01:26

I am using the MiniBatchKMeans() function from scikit-learn. Its documentation says:

batch_size : int, optional, default: 100 Size of the mini batches.

init_size : int, optional, default: 3 * batch_size Number of samples to randomly sample for speeding up the initialization (sometimes at the expense of accuracy): the only algorithm is initialized by running a batch KMeans on a random subset of the data. This needs to be larger than n_clusters.

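I don't quite understand this, because the effective size of the mini-batches seems to be 3*batch_size rather than the size given by the batch_size parameter.

Am I misunderstanding something? If so, could someone explain these two parameters? And if I am right, why do both parameters exist, since they seem redundant?

Thanks!!!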

Best answer

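The batch size is defined by batch_size, period. In addition, you can set init_size, which is the size of the sample used for the initialization procedure; by default it is 3*batch_size. For example, you can set batch_size=100 and init_size=10 (as long as init_size is still larger than n_clusters, as the documentation requires). Then 10 samples are used to perform the initialization (k-means is not globally convergent, and there are many techniques described on the web for handling the initialization phase), and afterwards batches of 100 samples are used while the algorithm runs.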
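To make the distinction concrete, here is a minimal sketch of the setup described above. The synthetic data, n_clusters=8, n_init=3, and random_state=0 are assumptions chosen for illustration; they are not from the question. Default values for batch_size and init_size differ between scikit-learn releases, so the sketch sets both explicitly instead of relying on the defaults quoted in the question.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic data (an assumption for illustration): 1,000 points in 2-D.
rng = np.random.RandomState(0)
X = rng.rand(1000, 2)

# batch_size: number of samples used in each mini-batch update.
# init_size: number of samples drawn only to choose the initial centers.
model = MiniBatchKMeans(
    n_clusters=8,     # assumed value; init_size must be larger than this
    batch_size=100,
    init_size=10,
    n_init=3,         # set explicitly; the default varies by release
    random_state=0,
)
model.fit(X)

centers = model.cluster_centers_   # one row per cluster center
labels = model.labels_             # cluster index assigned to each sample
print(centers.shape, labels.shape)
```

The two parameters are therefore not redundant: init_size only affects how the starting centers are picked, while batch_size governs every update after that.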

For python - MiniBatchKMeans Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33039884/
