java - KMP algorithm for string search?

Reposted by: 塔克拉玛干. Updated: 2023-11-03 04:04:58

I found this very challenging coding problem online and wanted to give it a try.

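The general idea: given a string T and a pattern P, find the occurrences of the pattern, sum up the corresponding values, and return the maximum and minimum totals. If you want to read the problem in more detail, see the problem statement linked from the original question.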
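To make the task concrete, here is a minimal brute-force sketch of the health computation, using the sample genes and strands that appear in the question's code. The class and method names (`NaiveDnaHealth`, `countOccurrences`, `health`) are hypothetical, the health values 1..6 follow the question's `i + 1` convention, and genes are assumed to be non-empty. It is only meant as a reference to check faster code against, not as a solution to the performance problem.

```java
final class NaiveDnaHealth {

    // Counts occurrences of gene in strand, overlapping ones included.
    // Assumes gene is non-empty.
    static int countOccurrences(String strand, String gene) {
        int count = 0;
        // Restart the search one position after each hit so overlaps are counted.
        for (int at = strand.indexOf(gene); at >= 0; at = strand.indexOf(gene, at + 1)) {
            count++;
        }
        return count;
    }

    // Sums (occurrences * health value) over genes[first..last], inclusive.
    static long health(String[] genes, int[] values, int first, int last, String strand) {
        long total = 0;
        for (int i = first; i <= last; i++) {
            total += (long) countOccurrences(strand, genes[i]) * values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        String[] genes = {"a", "b", "c", "aa", "d", "b"};
        int[] values = {1, 2, 3, 4, 5, 6};
        System.out.println(health(genes, values, 1, 5, "caaab"));  // 19
        System.out.println(health(genes, values, 0, 4, "xyz"));    // 0
        System.out.println(health(genes, values, 2, 4, "bcdybc")); // 11
    }
}
```

For these three strands the maximum is 19 and the minimum is 0.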

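Below is the code I have come up with. It works for a simple test case, but it is very slow when run on multiple complex test cases, and I am not sure where my code needs to be optimized.

Can anyone help me find where my logic goes wrong?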

public class DeterminingDNAHealth {


  private DeterminingDNAHealth() {
    /*
     * Fixme:
     *  Each DNA contains number of genes
     *   - some of them are beneficial and increase DNA's total health
     *   - Each Gene has a health value
     *   ======
     *   - Total health of DNA = sum of all health values of beneficial genes
     */
  }

  int checking(int start, int end, String pattern) {
    String[] genesChar = new String[] {
      "a",
      "b",
      "c",
      "aa",
      "d",
      "b"
    };
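    int total = 0;

    // Run one KMP pass per gene in [start, end]; the health value of gene i is (i + 1).
    // Each pass costs O(|strand| + |gene|), so every query rescans the strand once per gene.
    for (int i = start; i <= end; i++) {
      total += KMPAlgorithm.initiateAlgorithm(pattern, genesChar[i]) * (i + 1);
    }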

    return total;
  }

  public static void main(String[] args) {

    String[] genesChar = new String[] {
      "a",
      "b",
      "c",
      "aa",
      "d",
      "b"
    };
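    // NOTE: the Gene class is not shown in the post, and this array is never read afterwards.
    Gene[] genes = new Gene[genesChar.length];

    for (int i = 0; i < genesChar.length; i++) {
      genes[i] = new Gene(genesChar[i], i + 1);
    }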

    String[] checking = "15caaab 04xyz 24bcdybc".split(" ");


    DeterminingDNAHealth DNA = new DeterminingDNAHealth();
    int i, mostHealthiest, mostUnhealthiest;

    mostHealthiest = Integer.MIN_VALUE;
    mostUnhealthiest = Integer.MAX_VALUE;

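    for (i = 0; i < checking.length; i++) {
      // The first two characters encode the gene index range (single digits only);
      // the rest of the token is the DNA strand.
      int start = Character.getNumericValue(checking[i].charAt(0));
      int end = Character.getNumericValue(checking[i].charAt(1));
      String pattern = checking[i].substring(2);

      int check = DNA.checking(start, end, pattern);

      // Test both bounds independently: with an else-if, a value that raises the
      // maximum would never be considered for the minimum.
      if (check > mostHealthiest)
        mostHealthiest = check;
      if (check < mostUnhealthiest)
        mostUnhealthiest = check;
    }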

    System.out.println(mostHealthiest + " " + mostUnhealthiest);

    // DNA.checking(1,5, "caaab");
  }
}

KMP algorithm

public class KMPAlgorithm {

  KMPAlgorithm() {}


  public static int initiateAlgorithm(String text, String pattern) {

    // build the partial match (LSP) table from the pattern
    int[] partialMatchTable = partialMatchTable(pattern);

    int matchedOccurrences = 0;

    // initially we don't have anything matched, so 0
    int partialMatchLength = 0;

    // loop through the text (not the pattern): the text is what the pattern is tested against
    for (int i = 0; i < text.length(); i++) {

      // on a mismatch with a non-empty partial match, fall back to shorter candidate matches;
      // the loop stops once the characters agree or nothing is matched any more (the base case)
      while (partialMatchLength > 0 && text.charAt(i) != pattern.charAt(partialMatchLength)) {

        /*
         * This is the distinctive step of the algorithm: instead of restarting from scratch,
         * continue from the longest proper prefix of the pattern that is also a suffix of
         * what has been matched so far. The text index i never moves backwards.
         */
        partialMatchLength = partialMatchTable[partialMatchLength - 1];

      }


      // if however we have a char that matches the current text[i]
      if (text.charAt(i) == pattern.charAt(partialMatchLength)) {

        // then increment position, so hence we check the next char of the pattern against the next char in text
        partialMatchLength++;

        // we will know that we're at the end of the pattern matching, if the matched length is same as the pattern length
        if (partialMatchLength == pattern.length()) {
          // to get the starting index of the matched pattern in text, apply this formula (i - (partialMatchLength - 1))

          // this line increments when a match string occurs multiple times;
          matchedOccurrences++;

          // after a full match, fall back to the longest prefix that is also a suffix of the
          // pattern, so that overlapping occurrences are still found as the scan continues
          partialMatchLength = partialMatchTable[partialMatchLength - 1];

        }
      }

    }

    return matchedOccurrences;


  }


  private static int[] partialMatchTable(String pattern) {
    /*
     * TODO
     *  Note:
     *  => Proper prefix: All the characters in a string, with one or more cut off the end.
     *  => proper suffix: All the characters in a string, with one or more cut off the beginning.
     *
     *  1.) Take the pattern and construct a partial match table
     *
     *  To construct partial match table {
     *      1. Loop through the String(pattern)
     *      2. Create a table of size String(pattern).length
     *      3. For each character c[i], get The length of the longest proper prefix in the (sub)pattern
     *         that matches a proper suffix in the same (sub)pattern
     *  }
     */

    // we will need two incremental variables
    int i, j;

    // an LSP table also known as “longest suffix-prefix”
    int[] LSP = new int[pattern.length()];


    // our initial case is that the first element is set to 0
    LSP[0] = 0;

    // loop through the pattern...
    for (i = 1; i < pattern.length(); i++) {

      // start from the value recorded for the previous position (a length, not an index)
      j = LSP[i - 1];

      // on a mismatch with j > 0, fall back to the next shorter prefix/suffix match.
      // pattern.charAt(j) must be re-read after every fallback: caching it in a local
      // variable before the loop would keep comparing against a stale character.
      while (j > 0 && pattern.charAt(i) != pattern.charAt(j))
        j = LSP[j - 1];

      // if the two characters match, the prefix/suffix match grows by one
      if (pattern.charAt(i) == pattern.charAt(j))
        j++;

      // update the table
      LSP[i] = j;

    }

    return LSP;

  }
}
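A failure table is easy to get subtly wrong, so it can help to check one against values worked out by hand. The following is a minimal standalone sketch, not the code from the post: the names `PrefixTableSketch` and `prefixTable` are hypothetical. The point it illustrates is that the character at position `j` is looked up again after every fallback, because `j` changes inside the loop.

```java
import java.util.Arrays;

final class PrefixTableSketch {

    // table[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] prefixTable(String pattern) {
        int[] table = new int[pattern.length()]; // zero-filled, so table[0] is already 0
        for (int i = 1; i < pattern.length(); i++) {
            int j = table[i - 1];
            // Fall back to shorter candidates; pattern.charAt(j) is re-read each time.
            while (j > 0 && pattern.charAt(i) != pattern.charAt(j)) {
                j = table[j - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(j)) {
                j++;
            }
            table[i] = j;
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(prefixTable("aabaaa"))); // [0, 1, 0, 1, 2, 2]
        System.out.println(Arrays.toString(prefixTable("abab")));   // [0, 0, 1, 2]
    }
}
```

The pattern "aabaaa" is a useful test because its last position needs a fallback followed by a successful match. A version that compares against a character cached before the loop produces 0 there instead of 2.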

Source code credited to GitHub

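Best answer

You can try this KMP implementation. It runs in O(m + n), as KMP is meant to, where n is the text length and m is the pattern length. It should be much faster: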

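// Builds the KMP failure table: f[i] is the length of the longest proper prefix of
// pattern[0..i] that is also a suffix of it. Assumes a non-empty pattern.
private static int[] failureFunction(char[] pattern) {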
    int m = pattern.length;

    int[] f = new int[pattern.length];
    f[0] = 0;

    int i = 1;
    int j = 0;

    while (i < m) {
        if (pattern[i] == pattern[j]) {
            f[i] = j + 1;
            i++;
            j++;
        } else if (j > 0) {
            j = f[j - 1];
        } else {
            f[i] = 0;
            i++;
        }
    }
    return f;
}

// Returns the index of the first occurrence of pattern in text, or -1 if there is none.
private static int kmpMatch(char[] text, char[] pattern) {
    int[] f = failureFunction(pattern);

    int m = pattern.length;
    int n = text.length;

    int i = 0;
    int j = 0;

    while (i < n) {
        if (pattern[j] == text[i]) {
            if (j == m - 1){
                return i - (m - 1);
            } else {
                i++;
                j++;
            }
        } else if (j > 0) {
            j = f[j - 1];
        } else {
            i++;
        }
    }
    return -1;
}
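Note that `kmpMatch` stops at the first occurrence, while the question needs the number of occurrences of each gene. Below is a minimal sketch of one way to adapt it, assuming overlapping occurrences should be counted. The names `KmpCountSketch` and `kmpCount` are hypothetical, and treating an empty pattern as zero matches is an assumption of this sketch rather than something stated in the answer.

```java
final class KmpCountSketch {

    // Same failure table as in the answer; expects a non-empty pattern.
    static int[] failureFunction(char[] pattern) {
        int[] f = new int[pattern.length];
        int i = 1;
        int j = 0;
        while (i < pattern.length) {
            if (pattern[i] == pattern[j]) {
                f[i] = j + 1;
                i++;
                j++;
            } else if (j > 0) {
                j = f[j - 1];
            } else {
                f[i] = 0;
                i++;
            }
        }
        return f;
    }

    // Counts all occurrences of pattern in text, overlapping ones included.
    static int kmpCount(char[] text, char[] pattern) {
        if (pattern.length == 0) {
            return 0; // assumption: an empty pattern never matches
        }
        int[] f = failureFunction(pattern);
        int count = 0;
        int j = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length; i++) {
            while (j > 0 && text[i] != pattern[j]) {
                j = f[j - 1];
            }
            if (text[i] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                count++;
                j = f[j - 1]; // keep scanning so overlapping matches are found
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(kmpCount("caaab".toCharArray(), "aa".toCharArray())); // 2
        System.out.println(kmpCount("bcdybc".toCharArray(), "c".toCharArray())); // 2
    }
}
```

Instead of returning on a full match, the sketch falls back through the failure table and keeps going, so a single pass over the text still takes O(m + n).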

Regarding "java - KMP algorithm for string search?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46821536/
