Implementation of k-means clustering algorithm

Reposted by: bug小助手. Updated: 2023-10-26 20:55:38

In my program I'm using k=2 for the k-means algorithm, i.e. I want only 2 clusters.
I have implemented it in a very simple and straightforward way, yet I can't understand why my program gets into an infinite loop.
Can anyone please point out where I'm making a mistake?

For simplicity, I have hard-coded the input in the program itself.
Here is my code:

import java.io.*;
import java.lang.*;
class Kmean
{
public static void main(String args[])
{
int N=9;
int arr[]={2,4,10,12,3,20,30,11,25};    // initial data
int i,m1,m2,a,b,n=0;
boolean flag=true;
float sum1=0,sum2=0;
a=arr[0];b=arr[1];
m1=a; m2=b;
int cluster1[]=new int[9],cluster2[]=new int[9];
for(i=0;i<9;i++)
    System.out.print(arr[i]+ "\t");
System.out.println();

do
{
 n++;
 int k=0,j=0;
 for(i=0;i<9;i++)
 {
    if(Math.abs(arr[i]-m1)<=Math.abs(arr[i]-m2))
    {   cluster1[k]=arr[i];
        k++;
    }
    else
    {   cluster2[j]=arr[i];
        j++;
    }
 }
    System.out.println();
    for(i=0;i<9;i++)
        sum1=sum1+cluster1[i];
    for(i=0;i<9;i++)
        sum2=sum1+cluster2[i];
    a=m1;
    b=m2;
    m1=Math.round(sum1/k);
    m2=Math.round(sum2/j);
    if(m1==a && m2==b)
        flag=false;
    else
        flag=true;

    System.out.println("After iteration "+ n +" , cluster 1 :\n");    //printing the clusters of each iteration
    for(i=0;i<9;i++)
        System.out.print(cluster1[i]+ "\t");

    System.out.println("\n");
    System.out.println("After iteration "+ n +" , cluster 2 :\n");
    for(i=0;i<9;i++)
        System.out.print(cluster2[i]+ "\t");

}while(flag);

    System.out.println("Final cluster 1 :\n");            // final clusters
    for(i=0;i<9;i++)
        System.out.print(cluster1[i]+ "\t");

    System.out.println();
    System.out.println("Final cluster 2 :\n");
    for(i=0;i<9;i++)
        System.out.print(cluster2[i]+ "\t");
 }
}

Comments

When calculating sum1 and sum2, you should loop until k and j, respectively (instead of 9). The remaining entries of those arrays (cluster1 and cluster2) are either zero or left over from a previous iteration.

Recommended answers

You have a bunch of errors:

At the start of your do loop you should reset sum1 and sum2 to 0.

You should loop until k and j respectively when calculating sum1 and sum2 (or clear cluster1 and cluster2 at the start of your do loop).

In the calculation of sum2 you accidentally use sum1.

When I make those fixes the code runs fine, yielding the output:


Final cluster 1 :   
2   4   10   12  3   11  0   0   0

Final cluster 2 :
20  30  25   0   0   0   0   0   0
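
The three fixes listed above can be sketched in isolation as follows. This is not the answerer's code: the class name `TwoMeansSketch`, the method `cluster`, and the 100-iteration cap are assumptions made for illustration. It keeps the question's rounding of the means but stores cluster members in lists, so there are no unused array slots to sum by accident.

```java
import java.util.ArrayList;
import java.util.List;

class TwoMeansSketch {
    /** Splits 1-D data into two clusters, seeding the means with the first two values. */
    static List<List<Integer>> cluster(int[] data) {
        int m1 = data[0], m2 = data[1];
        for (int n = 0; n < 100; n++) {          // assumed safety cap on iterations
            List<Integer> c1 = new ArrayList<>();
            List<Integer> c2 = new ArrayList<>();
            for (int x : data) {
                if (Math.abs(x - m1) <= Math.abs(x - m2)) c1.add(x); else c2.add(x);
            }
            float sum1 = 0, sum2 = 0;            // fix 1: sums start at 0 in every iteration
            for (int x : c1) sum1 += x;          // fix 2: only actual members are summed
            for (int x : c2) sum2 += x;          // fix 3: sum2 accumulates into sum2, not sum1
            // Keep the old mean if a cluster happens to be empty (avoids dividing by zero).
            int new1 = c1.isEmpty() ? m1 : Math.round(sum1 / c1.size());
            int new2 = c2.isEmpty() ? m2 : Math.round(sum2 / c2.size());
            if (new1 == m1 && new2 == m2) return List.of(c1, c2);
            m1 = new1;
            m2 = new2;
        }
        throw new IllegalStateException("means did not stabilize within 100 iterations");
    }

    public static void main(String[] args) {
        int[] data = {2, 4, 10, 12, 3, 20, 30, 11, 25};
        System.out.println(cluster(data)); // prints [[2, 4, 10, 12, 3, 11], [20, 30, 25]]
    }
}
```

The result matches the output shown above, minus the trailing zeros that come from printing the whole fixed-size array.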

My general advice: learn how to use a debugger. Stack Overflow is not meant for questions like this: you are expected to find your own bugs and only come here when everything else fails.

public class KMeansClustering {

public static void main(String args[]) {
    int arr[] = {2, 4, 10, 12, 3, 20, 30, 11, 25};    // initial data
    int i, m1, m2, a, b, n = 0;
    boolean flag;
    float sum1, sum2;
    a = arr[0];
    b = arr[1];
    m1 = a;
    m2 = b;
    int cluster1[] = new int[arr.length], cluster2[] = new int[arr.length];
    do {
        sum1 = 0;
        sum2 = 0;
        cluster1 = new int[arr.length];
        cluster2 = new int[arr.length];
        n++;
        int k = 0, j = 0;
        for (i = 0; i < arr.length; i++) {
            if (Math.abs(arr[i] - m1) <= Math.abs(arr[i] - m2)) {
                cluster1[k] = arr[i];
                k++;
            } else {
                cluster2[j] = arr[i];
                j++;
            }
        }
        System.out.println();
        for (i = 0; i < k; i++) {
            sum1 = sum1 + cluster1[i];
        }
        for (i = 0; i < j; i++) {
            sum2 = sum2 + cluster2[i];
        }
        // print the centroids (means)
        System.out.println("m1=" + m1 + "   m2=" + m2);
        a = m1;
        b = m2;
        m1 = Math.round(sum1 / k);
        m2 = Math.round(sum2 / j);
        flag = !(m1 == a && m2 == b);

        System.out.println("After iteration " + n + " , cluster 1 :\n");    //printing the clusters of each iteration
        for (i = 0; i < cluster1.length; i++) {
            System.out.print(cluster1[i] + "\t");
        }

        System.out.println("\n");
        System.out.println("After iteration " + n + " , cluster 2 :\n");
        for (i = 0; i < cluster2.length; i++) {
            System.out.print(cluster2[i] + "\t");
        }

    } while (flag);

    System.out.println("Final cluster 1 :\n");            // final clusters
    for (i = 0; i < cluster1.length; i++) {
        System.out.print(cluster1[i] + "\t");
    }

    System.out.println();
    System.out.println("Final cluster 2 :\n");
    for (i = 0; i < cluster2.length; i++) {
        System.out.print(cluster2[i] + "\t");
    }
}

}

This is working code.


The only possible infinite loop is the do-while.


if(m1==a && m2==b)
    flag=false;
else
    flag=true;

The loop only exits once flag is false. Set a breakpoint on the if statement here and find out why flag is never set to false. Adding some debug print statements may help as well.
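
One way to combine debug prints with a hard stop is a small guard around the loop. The sketch below is a hypothetical helper, not part of the question's program: `LoopGuard`, `iterateUntilStable`, and the `maxIterations` parameter are names invented here. It logs each transition of the values being compared (for k-means, the pair of means) and throws instead of spinning forever.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

class LoopGuard {
    /**
     * Applies step repeatedly until the values stop changing.
     * Returns the number of iterations performed, or throws once the cap is hit.
     */
    static int iterateUntilStable(int[] start, UnaryOperator<int[]> step, int maxIterations) {
        int[] current = start;
        for (int n = 1; n <= maxIterations; n++) {
            int[] next = step.apply(current);
            // Debug print: shows exactly which values the exit condition compares.
            System.err.println("iteration " + n + ": "
                    + Arrays.toString(current) + " -> " + Arrays.toString(next));
            if (Arrays.equals(current, next)) {
                return n;
            }
            current = next;
        }
        throw new IllegalStateException("no stable state after " + maxIterations + " iterations");
    }

    public static void main(String[] args) {
        // Toy step function: halve both values; stabilizes at {0, 0}.
        int n = iterateUntilStable(new int[]{8, 4}, m -> new int[]{m[0] / 2, m[1] / 2}, 10);
        System.out.println(n); // prints 5
    }
}
```

In the question's program the step would recompute m1 and m2 from the current clusters; if the log shows the means changing on every pass, the bug is in how they are recomputed rather than in the exit test.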

package k;

/**
 *
 * @author Anooj.k.varghese
 */

import java.io.FileNotFoundException;
import java.io.File;
import java.util.Scanner;
public class K {


    /**
     * @param args the command line arguments
     */
    //GLOBAL VARIABLES
    //data_set[][] -------------the dataset is stored in the data_set[][] array
    //intial_centroid[][]-------the first k rows are taken as the initial centroids and stored in intial_centroid[][];
    //                          the values are assigned in the 'first_itration()' function
    private static double[][] arr;
    static int num = 0;
    static Double data_set[][]=new Double[20000][100];
    static Double diff[][]=new Double[20000][100];
    static Double intial_centroid[][]=new Double[300][400];
    static Double center_mean[][]=new Double[20000][100];
    static Double total_mean[]=new Double[200000];
    static int cnum;
    static int it=1;
    static int checker=1;
    static int row=4;//number of feature columns in your dataset; here I use the iris dataset
     /////////////////////////////////reading the file/////////////////////////////////////
     // description: readFile reads the txt file
    private static void readFile() throws FileNotFoundException
        {
        Scanner scanner = new Scanner(new File("E:/aa.txt"));//Dataset path
        scanner.useDelimiter(System.getProperty("line.separator"));
        int lineNo = 0;
            while (scanner.hasNext())
             {
                parseLine(scanner.next(),lineNo);
                lineNo++;
                System.out.println();
             }
             // System.out.println("total"+num); PRINT THE TOTAL
     scanner.close();
        }
    //each line read from the file is copied into data_set
    public static void parseLine(String line,int lineNo)
      { 
        Scanner lineScanner = new Scanner(line);
        lineScanner.useDelimiter(",");
          for(int col=0;col<row;col++)
              {
                  Double arry=lineScanner.nextDouble();
                  data_set[num][col]=arry;                          ///here read  data set is assign the variable data_set
               }
         num++;

        }
      public static void first_itration()
    {   double a = 0;
         System.out.println("ENTER CLUSTER NUMBER");
         Scanner sc=new Scanner(System.in);      
         cnum=sc.nextInt();   //enter the number of cenroid

         int result[]=new int[cnum];
        double re=0;

         System.out.println("centroid");
         for(int i=0;i<cnum;i++)
         {
            for(int j=0;j<row;j++)
                {
                    intial_centroid[i][j]=data_set[i][j];                  //// CENTROID ARE STORED IN AN intial_centroid variable
                    System.out.print(intial_centroid[i][j]);      
                }
            System.out.println();
         }
       System.out.println("------------");

       int counter1=0;
       for(int i=0;i<num;i++)
       {
            for(int j=0;j<row;j++)
                {
                      //System.out.println("hii");
                 System.out.print(data_set[i][j]);

                 }
       counter1++;
       System.out.println();
       }
           System.out.println("total="+counter1);                             //print the total number of data
           //----------------------------------

           ///////////////////EUCLIDEAN DISTANCE////////////////////////////////////
                                                                                /// find the Euclidean Distance
        for(int i=0;i<num;i++)
        {
                for(int j=0;j<cnum;j++)       
                {
                    re=0;
                     for(int k=0;k<row;k++)
                     {
                            a= (intial_centroid[j][k]-data_set[i][k]);
                            //System.out.println(a);
                             a=a*a;
                             re=re+a;                                                 // store the row sum

                        }

                         diff[i][j]= Math.sqrt(re);// find the squre root

        }
        }
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

///////////////////////////////////////////////FIND THE SMALLEST VALUE////////////////////////////////////////////////
   double aaa;
   double counter;
     int ccc=1;
   for(int i=0;i<num;i++)
   {
         int c=1;
         counter=c;
         aaa=diff[i][0];
         for(int j=0;j<cnum;j++)
         {
          //System.out.println(diff[i][j]);

            if(aaa>=diff[i][j] )                                                //change
                {
                    aaa=diff[i][j];
                    counter=j;


                    // Jth value are stord in the counter variable 
               //   System.out.println(counter);
               }


         }

            data_set[i][row]=counter;                                        //assign the counter to last position of data set

            //System.out.println("--");
      }                                                                  //print the first itration
            System.out.println("**FIRST ITRATION**");

      for(int i=0;i<num;i++)
              {
                  for(int j=0;j<=row;j++)
                      {
                      //System.out.println("hii");
                              System.out.print(data_set[i][j]+ " ");
                       }
                  System.out.println();
              }

    it++;
    }


    public static void calck_mean()
    { 
        for(int i=0;i<20000;i++)
        {
            for(int j=0;j<100;j++)
            {
                center_mean[i][j]=0.0;
            }
        }


  double c = 0; 
     int a=0;
     int p;
     int abbb = 0;
        if(it%2==0)
         {
             abbb=row;
         }
        else if(it%2==1)
         {
             abbb=row+1;
          }
        for(int k=0;k<cnum;k++)
            {
                    double counter = 0;    
                    for(int i=0;i<num;i++)
                     {
                        for(int j=0;j<=row;j++)
                        {               
                            if(data_set[i][abbb]==a)
                            {
                            System.out.print(data_set[i][j]);
                            center_mean[k][j] += data_set[i][j];

                            }

                          }
                        System.out.println();
                      if(data_set[i][abbb]==a)
                        {
                            counter++;
                        }
                  System.out.println();
              }

         a++;
         total_mean[k]=counter;

         }
         for(int i=0;i<cnum;i++)
            {
            System.out.println("\n");
            for(int j=0;j<row;j++)
            {
              if(total_mean[i]==0)
              {
                   center_mean[i][j]=0.0;
              }
              else
              {
                center_mean[i][j]=center_mean[i][j]/total_mean[i];
              }
              }
        }
        for(int k=0;k<cnum;k++)
        {
            for(int j=0;j<row;j++)
            {
              //System.out.print(center_mean[k][j]);
            }
            System.out.println();

        }
       /* for(int j=0;j<cnum;j++)
        {
            System.out.println(total_mean[j]);
        }*/

    }
public static void kmeans1()
    {
       double  a = 0;
       int result[]=new int[cnum];
       double re=0;

  //// CENTROID ARE STORED IN AN data_set VARIABLE intial_centroid 
         System.out.println(" new centroid");
            for(int i=0;i<cnum;i++)
            {
                for(int j=0;j<row;j++)
                {
                    intial_centroid[i][j]=center_mean[i][j];
                    System.out.print(intial_centroid[i][j]);
                }
             System.out.println();
            }

   //----------------------------------------------JUST PRINT THE data_set

           //----------------------------------
        for(int i=0;i<num;i++)
        {
            for(int j=0;j<cnum;j++)
            {
             re=0;
             for(int k=0;k<row;k++)
             {

               a=(intial_centroid[j][k]-data_set[i][k]);
                 //System.out.println(a);
                a=a*a;        
               re=re+a;

                }

             diff[i][j]= Math.sqrt(re);
             //System.out.println(diff[i][j]);
            }
        }
   double aaa;
    double counter;
     for(int i=0;i<num;i++)
     {

         int c=1;
         counter=c;
          aaa=diff[i][0];
         for(int j=0;j<cnum;j++)
         {
            // System.out.println(diff[i][j]);
            if(aaa>=diff[i][j])                                                  //change
            {
               aaa=diff[i][j];
                counter=j;
               //   System.out.println(counter);
            }


         }


         if(it%2==0)
            {
        // abbb=4;
                data_set[i][row+1]=counter;
            }
         else if(it%2==1)
            {
                data_set[i][row]=counter;
      //   abbb=4;
            }


        //System.out.println("--");
     }
     System.out.println(it+" ITRATION**");

      for(int i=0;i<num;i++)
              {
                  for(int j=0;j<=row+1;j++)
                  {
                      //System.out.println("hii");
                      System.out.print(data_set[i][j]+" ");
                  }
                  System.out.println();
              }

    it++;
    }
public static void check()
{
    checker=0;
    for(int i=0;i<num;i++)
    {
         //System.out.println("hii");
        if(Double.compare(data_set[i][row],data_set[i][row+1]) != 0)
        {
            checker=1;
            //System.out.println("hii " + i  + " " + data_set[i][4]+ " "+data_set[i][4]);
            break;
        }
        System.out.println();
    }

}
public static void dispaly()
{

      System.out.println(it+" ITRATION**");

      for(int i=0;i<num;i++)
              {
                  for(int j=0;j<=row+1;j++)
                  {
                      //System.out.println("hii");
                      System.out.print(data_set[i][j]+" ");
                  }
                  System.out.println();
              }
}


 public static void print()
    {
        System.out.println();
         System.out.println();
          System.out.println();
        System.out.println("----OUTPUT----");
        int c=0;
        int a=0;
        for(int i=0;i<cnum;i++)
        {
            System.out.println("---------CLUSTER-"+i+"-----");
         a=0;
            for(int j=0;j<num;j++)
            {
                 if(data_set[j][row]==i)
                 {a++;
                for(int k=0;k<row;k++)
                {

                    System.out.print(data_set[j][k]+"  ");
                }
                c++;
                System.out.println();
                }
                 //System.out.println(num);

            }
               System.out.println("CLUSTER INSTANCES="+a);


        }
        System.out.println("TOTAL INSTANCE"+c);
    }


    public static void main(String[] args) throws FileNotFoundException 
    {
    readFile();
    first_itration();

    while(checker!=0)
            {
            calck_mean();
            kmeans1();
            check();
            } 
  dispaly();
  print();
    }




}


    ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Take a look at the SMILE library at https://haifengl.github.io/clustering.html
It has a lot of ready-made open-source clustering algorithm implementations in Java.

Comments

I sense the rounding could pose a problem. Math#round rounds to the nearest integer. Maybe floor/ceil would be more appropriate.
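
A different way to sidestep the rounding question, offered here only as a sketch and not as something proposed in the thread, is to keep the means as floating-point values and stop when no point changes cluster. The names `FloatingMeansSketch`, `fit`, and `maxIterations` are assumptions for illustration.

```java
import java.util.Arrays;

class FloatingMeansSketch {
    /** Returns the two final means for 1-D data, seeded with the first two values. */
    static double[] fit(int[] data, int maxIterations) {
        double m1 = data[0], m2 = data[1];
        int[] labels = new int[data.length];
        Arrays.fill(labels, -1);                 // -1 marks "not yet assigned"
        for (int n = 0; n < maxIterations; n++) {
            boolean changed = false;
            double sum1 = 0, sum2 = 0;
            int count1 = 0, count2 = 0;
            for (int i = 0; i < data.length; i++) {
                int label = Math.abs(data[i] - m1) <= Math.abs(data[i] - m2) ? 0 : 1;
                if (label != labels[i]) {
                    labels[i] = label;
                    changed = true;
                }
                if (label == 0) { sum1 += data[i]; count1++; }
                else            { sum2 += data[i]; count2++; }
            }
            if (!changed) break;                 // assignments are stable: converged
            if (count1 > 0) m1 = sum1 / count1;  // no rounding of the means
            if (count2 > 0) m2 = sum2 / count2;
        }
        return new double[]{m1, m2};
    }

    public static void main(String[] args) {
        int[] data = {2, 4, 10, 12, 3, 20, 30, 11, 25};
        System.out.println(Arrays.toString(fit(data, 100))); // prints [7.0, 25.0]
    }
}
```

Comparing assignments rather than means avoids testing floating-point values for equality, and the iteration cap guarantees termination either way.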

Link-only answers are considered very low quality and can get deleted; please put the important parts from the linked resource into the answer body.
