Coders Packet

Knuth Morris Pratt Algorithm String Matching in Java

By Janhvi Singh

KMP algorithm is a linear-time string matching algorithm wherein we search a pattern /substring in the given text string and check whether it's present or not.

Expected Input and Output

Three cases for the KMP algorithm are as follows - 

Best Case -

The best-case occurs when the pattern's first character is not present in the text string at all. Let's see how.

For example, we have -

Text String - AABBCCDDEFFGOP

Pattern – KFF

 

According to the algorithm, we search for the pattern till the last of the string. But if the first character itself is not present in the string, then in the end it will skip the search for the rest of the characters ( In this example K will mismatch with G ( index 11), and the pattern will move to index 12 where the pattern will exceed the string length, therefore skipping the search for index 12 and 13)

Best Case Complexity (in this case) -

 O(n-m+1) 

= O(12) 

O(n-m+1) ≈ O(n) 

where,

n= length of the string. 

m=length of the pattern.

Worst Case -

The worst-case occurs when we have to check all the characters in pattern and string at least once.

The word case complexity of the KMP algorithm is O(m+n).

Problem Solution

1) We take a string and check whether the given pattern is present in the string or not.

2 )We make an integer array to know how many characters we should skip (to reduce the reoccurrence of steps).

3) When the pattern is found, this array integer tells us which character to check next and which characters to skip.

(detailed explanation in problem explanation section)

For example - 

Text String - AAABBBAAAD

Pattern - AAA

Array[] = [0,1,2]

Initially k=0, m=0 (k for string and m for pattern)

Whenever there is a match do k++, m++

When the pattern is found at index 2, we reset m. Now the value of arr[m-1] gives the index of the character to match next.

When there is a mismatch (and m>0) reset m as m =arr[m-1] 

when m=0, only increment k.

Java Program/Source Code

import java.util.*;
public class Main {
    public static void main(String args[]){
        String text = "AAAABAAABAB";
        String pattern = " ";

        Scanner sc = new Scanner(System.in);
        while (true) {
            System.out.println("Enter pattern:");
            pattern=sc.nextLine();
            if (pattern.equals("exit")) {
                break;
            }
            stringSearch(pattern,text);
        }
    }
    // array for pattern count

        static int[] arrayprefixsuffix(String pattern) {
    
            int[] arr = new int [pattern.length()];
            int j =0;
            arr[0] =0;
            //make first index zero
            for(int i =1;i<pattern.length();){
                //comparing character of pattern
                if(pattern.charAt(i)==pattern.charAt(j)){
                    j++;
                    //reset j when match found
                    arr[i]=j;
                    //increase i 
                    i++;
                } else{
                    //if match not found and and j not 0 decrease j
                    if(j!=0){
                        j=arr[j-1];
                
                    } else{
                        //else put 0 to array and increase i
                        arr[i]=0;
                        i++;
                    }
                }     
            }
            return arr;
        }
        static void stringSearch(String pattern,String text){
            int arr[] = arrayprefixsuffix(pattern);
            boolean flag = false;
            int k = 0;
            int m =0;
            while(k < text.length()) {
                //if text character and pattern character matches
                //increase pattern character and text character
                if(pattern.charAt(m)==text.charAt(k)) {
                    m++;
                    k++;
                }
                //if pattern character equal to pattern length 
                //means pattern matched
                if(m == pattern.length()) {
                    flag = true;
                    System.out.println("Pattern found at index "+ (k-m));
                    m=arr[m-1];
                } else if(k<(text.length()) && pattern.charAt(m)!= text.charAt(k)) {
                    //if mismatch occurs, reset m 
                    flag = false;
                    if(m!=0)
                        m= arr[m-1];
                    else
                        k++;
                }
            }
            if(flag==false) {
                //if flag variable is false means pattern not found
                System.err.println("Pattern not found");
            }
        }
}

Program Explanation

1) KMP algorithm takes advantage of the degenerating property.

Degenerating property is when a pattern has a sub-pattern that appears more than once. KMP algorithm takes advantage of this property to reduce the recurrence of steps. 

2) A function is used that tells us about how the pattern matches against sub-patterns of itself. 

3) To know how many characters we should jump, we analyze the pattern and develop an array to keep count. It tells us the count of characters to be skipped.

 

Example (1) -

For the pattern “ABCDABEABF”,

Array will be [0, 0, 0, 0, 1, 2, 0, 1, 2, 0]

Sub-pattern "AB" repeats itself at index 4,5 hence we write the count as 1,2 ("A "repeats and "AB" repeats ) in the array

and same for index 7,8.

Example(2)-

Pattern is "ABCDEABFABC"

Array will be [0,0,0,0,0,1,2,0,1,2,3]

Sub-pattern "AB" repeats itself at index 5,6 hence the count 1,2 . At index 8,9,10 two sub-patterns repeat that are "AB" and "ABC" hence the count 1,2,3.

 

4) After this, we compare characters of pattern and string (m=0 initially), and then with every match, we increment the variable assigned for pattern and string.

5)If we encounter a mismatch, (and m>0) reset m as m =arr[m-1] 

when m=0, only increment k.

 

Runtime Testcases

Best case runtime example -

(String declared as - "AABAACDEBBAAA")

Enter pattern:

ZDD

Pattern not found

Average case runtime example-

 

(String declared as - "AABAACDEBBAAA")

Enter pattern:

BAA

Pattern found at index 2

Pattern found at index 9

 

Worst case runtime example-

 

(String declared as - "AAAABAAABAB")

Enter pattern:

AAA

Pattern found at index 0

Pattern found at index 1

Pattern found at index 5

Download project

Reviews Report

Submitted by Janhvi Singh (Janhvi)

Download packets of source code on Coders Packet