By Janhvi Singh
KMP algorithm is a linear-time string matching algorithm wherein we search a pattern /substring in the given text string and check whether it's present or not.
Three cases for the KMP algorithm are as follows -
Best Case -
The best-case occurs when the pattern's first character is not present in the text string at all. Let's see how.
For example, we have -
Text String - AABBCCDDEFFGOP
Pattern – KFF
According to the algorithm, we search for the pattern till the last of the string. But if the first character itself is not present in the string, then in the end it will skip the search for the rest of the characters ( In this example K will mismatch with G ( index 11), and the pattern will move to index 12 where the pattern will exceed the string length, therefore skipping the search for index 12 and 13)
Best Case Complexity (in this case) -
O(n-m+1)
= O(12)
O(n-m+1) ≈ O(n)
where,
n= length of the string.
m=length of the pattern.
Worst Case -
The worst-case occurs when we have to check all the characters in pattern and string at least once.
The word case complexity of the KMP algorithm is O(m+n).
1) We take a string and check whether the given pattern is present in the string or not.
2 )We make an integer array to know how many characters we should skip (to reduce the reoccurrence of steps).
3) When the pattern is found, this array integer tells us which character to check next and which characters to skip.
(detailed explanation in problem explanation section)
For example -
Text String - AAABBBAAAD
Pattern - AAA
Array[] = [0,1,2]
Initially k=0, m=0 (k for string and m for pattern)
Whenever there is a match do k++, m++
When the pattern is found at index 2, we reset m. Now the value of arr[m-1] gives the index of the character to match next.
When there is a mismatch (and m>0) reset m as m =arr[m-1]
when m=0, only increment k.
import java.util.*; public class Main { public static void main(String args[]){ String text = "AAAABAAABAB"; String pattern = " "; Scanner sc = new Scanner(System.in); while (true) { System.out.println("Enter pattern:"); pattern=sc.nextLine(); if (pattern.equals("exit")) { break; } stringSearch(pattern,text); } } // array for pattern count static int[] arrayprefixsuffix(String pattern) { int[] arr = new int [pattern.length()]; int j =0; arr[0] =0; //make first index zero for(int i =1;i<pattern.length();){ //comparing character of pattern if(pattern.charAt(i)==pattern.charAt(j)){ j++; //reset j when match found arr[i]=j; //increase i i++; } else{ //if match not found and and j not 0 decrease j if(j!=0){ j=arr[j-1]; } else{ //else put 0 to array and increase i arr[i]=0; i++; } } } return arr; } static void stringSearch(String pattern,String text){ int arr[] = arrayprefixsuffix(pattern); boolean flag = false; int k = 0; int m =0; while(k < text.length()) { //if text character and pattern character matches //increase pattern character and text character if(pattern.charAt(m)==text.charAt(k)) { m++; k++; } //if pattern character equal to pattern length //means pattern matched if(m == pattern.length()) { flag = true; System.out.println("Pattern found at index "+ (k-m)); m=arr[m-1]; } else if(k<(text.length()) && pattern.charAt(m)!= text.charAt(k)) { //if mismatch occurs, reset m flag = false; if(m!=0) m= arr[m-1]; else k++; } } if(flag==false) { //if flag variable is false means pattern not found System.err.println("Pattern not found"); } } }
1) KMP algorithm takes advantage of the degenerating property.
Degenerating property is when a pattern has a sub-pattern that appears more than once. KMP algorithm takes advantage of this property to reduce the recurrence of steps.
2) A function is used that tells us about how the pattern matches against sub-patterns of itself.
3) To know how many characters we should jump, we analyze the pattern and develop an array to keep count. It tells us the count of characters to be skipped.
Example (1) -
For the pattern “ABCDABEABF”,
Array will be [0, 0, 0, 0, 1, 2, 0, 1, 2, 0]
Sub-pattern "AB" repeats itself at index 4,5 hence we write the count as 1,2 ("A "repeats and "AB" repeats ) in the array
and same for index 7,8.
Example(2)-
Pattern is "ABCDEABFABC"
Array will be [0,0,0,0,0,1,2,0,1,2,3]
Sub-pattern "AB" repeats itself at index 5,6 hence the count 1,2 . At index 8,9,10 two sub-patterns repeat that are "AB" and "ABC" hence the count 1,2,3.
4) After this, we compare characters of pattern and string (m=0 initially), and then with every match, we increment the variable assigned for pattern and string.
5)If we encounter a mismatch, (and m>0) reset m as m =arr[m-1]
when m=0, only increment k.
Best case runtime example -
(String declared as - "AABAACDEBBAAA")
Enter pattern:
ZDD
Pattern not found
Average case runtime example-
(String declared as - "AABAACDEBBAAA")
Enter pattern:
BAA
Pattern found at index 2
Pattern found at index 9
Worst case runtime example-
(String declared as - "AAAABAAABAB")
Enter pattern:
AAA
Pattern found at index 0
Pattern found at index 1
Pattern found at index 5
Submitted by Janhvi Singh (Janhvi)
Download packets of source code on Coders Packet
Comments