# Knuth Morris Pratt Algorithm String Matching in Java

KMP algorithm is a linear-time string matching algorithm wherein we search a pattern /substring in the given text string and check whether it's present or not.

## Expected Input and Output

Three cases for the KMP algorithm are as follows -

Best Case -

The best-case occurs when the pattern's first character is not present in the text string at all. Let's see how.

For example, we have -

Text String - AABBCCDDEFFGOP

Pattern – KFF

According to the algorithm, we search for the pattern till the last of the string. But if the first character itself is not present in the string, then in the end it will skip the search for the rest of the characters ( In this example K will mismatch with G ( index 11), and the pattern will move to index 12 where the pattern will exceed the string length, therefore skipping the search for index 12 and 13)

Best Case Complexity (in this case) -

O(n-m+1)

= O(12)

O(n-m+1) ≈ O(n)

where,

n= length of the string.

m=length of the pattern.

Worst Case -

The worst-case occurs when we have to check all the characters in pattern and string at least once.

The word case complexity of the KMP algorithm is O(m+n).

## Problem Solution

1) We take a string and check whether the given pattern is present in the string or not.

2 )We make an integer array to know how many characters we should skip (to reduce the reoccurrence of steps).

3) When the pattern is found, this array integer tells us which character to check next and which characters to skip.

(detailed explanation in problem explanation section)

For example -

Pattern - AAA

Array[] = [0,1,2]

Initially k=0, m=0 (k for string and m for pattern)

Whenever there is a match do k++, m++

When the pattern is found at index 2, we reset m. Now the value of arr[m-1] gives the index of the character to match next.

When there is a mismatch (and m>0) reset m as m =arr[m-1]

when m=0, only increment k.

## Java Program/Source Code

```import java.util.*;
public class Main {
public static void main(String args[]){
String text = "AAAABAAABAB";
String pattern = " ";

Scanner sc = new Scanner(System.in);
while (true) {
System.out.println("Enter pattern:");
pattern=sc.nextLine();
if (pattern.equals("exit")) {
break;
}
stringSearch(pattern,text);
}
}
// array for pattern count

static int[] arrayprefixsuffix(String pattern) {

int[] arr = new int [pattern.length()];
int j =0;
arr =0;
//make first index zero
for(int i =1;i<pattern.length();){
//comparing character of pattern
if(pattern.charAt(i)==pattern.charAt(j)){
j++;
//reset j when match found
arr[i]=j;
//increase i
i++;
} else{
if(j!=0){
j=arr[j-1];

} else{
//else put 0 to array and increase i
arr[i]=0;
i++;
}
}
}
return arr;
}
static void stringSearch(String pattern,String text){
int arr[] = arrayprefixsuffix(pattern);
boolean flag = false;
int k = 0;
int m =0;
while(k < text.length()) {
//if text character and pattern character matches
//increase pattern character and text character
if(pattern.charAt(m)==text.charAt(k)) {
m++;
k++;
}
//if pattern character equal to pattern length
//means pattern matched
if(m == pattern.length()) {
flag = true;
System.out.println("Pattern found at index "+ (k-m));
m=arr[m-1];
} else if(k<(text.length()) && pattern.charAt(m)!= text.charAt(k)) {
//if mismatch occurs, reset m
flag = false;
if(m!=0)
m= arr[m-1];
else
k++;
}
}
if(flag==false) {
}
}
}
```

## Program Explanation

1) KMP algorithm takes advantage of the degenerating property.

Degenerating property is when a pattern has a sub-pattern that appears more than once. KMP algorithm takes advantage of this property to reduce the recurrence of steps.

2) A function is used that tells us about how the pattern matches against sub-patterns of itself.

3) To know how many characters we should jump, we analyze the pattern and develop an array to keep count. It tells us the count of characters to be skipped.

Example (1) -

For the pattern “ABCDABEABF”,

Array will be [0, 0, 0, 0, 1, 2, 0, 1, 2, 0]

Sub-pattern "AB" repeats itself at index 4,5 hence we write the count as 1,2 ("A "repeats and "AB" repeats ) in the array

and same for index 7,8.

Example(2)-

Pattern is "ABCDEABFABC"

Array will be [0,0,0,0,0,1,2,0,1,2,3]

Sub-pattern "AB" repeats itself at index 5,6 hence the count 1,2 . At index 8,9,10 two sub-patterns repeat that are "AB" and "ABC" hence the count 1,2,3.

4) After this, we compare characters of pattern and string (m=0 initially), and then with every match, we increment the variable assigned for pattern and string.

5)If we encounter a mismatch, (and m>0) reset m as m =arr[m-1]

when m=0, only increment k.

## Runtime Testcases

Best case runtime example -

(String declared as - "AABAACDEBBAAA")

Enter pattern:

ZDD

Average case runtime example-

(String declared as - "AABAACDEBBAAA")

Enter pattern:

BAA

Pattern found at index 2

Pattern found at index 9

Worst case runtime example-

(String declared as - "AAAABAAABAB")

Enter pattern:

AAA

Pattern found at index 0

Pattern found at index 1

Pattern found at index 5