Javatpoint Logo

Boyer Moore Java

The Boyer-Moore algorithm is a string-searching (string-matching) algorithm developed by Robert S. Boyer and J Strother Moore in 1977. It is widely used and, in practice, is among the most efficient string-matching algorithms; on typical inputs it is much faster than the brute-force algorithm. In this section, we will discuss the Boyer-Moore algorithm, its features, and its implementation in a Java program. For a text of length n, a pattern of length m, and an alphabet of size s, its worst-case running time is O(nm+s). An example of a worst-case input is:

T=ssssssss……………ssssssss

P=psssssssss

Inputs of this kind may occur in images and DNA sequences.

Features of Boyer Moore Algorithm

  • It compares the pattern with the text character by character, from right to left.
  • The preprocessing phase has O(m+σ) time and space complexity, where σ is the alphabet size.
  • The time complexity of the searching phase is O(mn).
  • When searching for a non-periodic pattern, it makes at most 3n text character comparisons in the worst case.
  • Its best-case performance is O(n / m).

The algorithm is based on the following two heuristics:

  • Looking-Glass Heuristics
  • Character-Jump Heuristics

Let's understand the working of the Boyer-Moore algorithm.

Working of Boyer-Moore Algorithm

The algorithm scans the characters of the pattern starting from the rightmost one and moves towards the left. In case of a mismatch (or a complete match of the whole pattern), it uses two precomputed functions to shift the pattern to the right. These two precomputed shift functions are called the good-suffix shift (or matching shift) and the bad-character shift (or occurrence shift).

Note: To match the pattern, try alignments in left-to-right order and, within each alignment, compare characters in right-to-left order.
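The right-to-left comparison within a single alignment can be sketched as follows. This is a minimal illustration, not the article's original listing; the class name, the method name mismatchIndex, and the sample strings are assumptions chosen for the example.

```java
final class RightToLeftCompare {
    /**
     * Compares the pattern against the text at the given alignment (offset),
     * scanning the pattern from its last character to its first.
     * Returns the pattern index of the first mismatch found, or -1 if every
     * character matches. The caller must ensure offset + pattern.length()
     * does not exceed text.length().
     */
    static int mismatchIndex(String text, String pattern, int offset) {
        for (int i = pattern.length() - 1; i >= 0; i--) {
            if (text.charAt(offset + i) != pattern.charAt(i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // At alignment 0 the last two characters match and index 3 mismatches.
        System.out.println(mismatchIndex("abcfefabddef", "abddef", 0)); // 3
        // At alignment 6 the whole pattern matches.
        System.out.println(mismatchIndex("abcfefabddef", "abddef", 6)); // -1
    }
}
```

The returned index is what the shift rules use to decide how far the pattern can move.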

Bad Character Shift

When a mismatch occurs, skip alignments until one of the following conditions is met:

  1. The mismatch becomes a match.
  2. P moves past the mismatched character.

For example, consider the text (T) and pattern (P) given below.

[Figure: the text T with the pattern P aligned below it]

Let's start matching the pattern.

Step 1: Try alignments from left to right and, within each alignment, compare characters from right to left.

We see that the last three characters of P match the corresponding characters in T, but the fourth character from the right (T) does not. According to the rule discussed above, we skip alignments until the mismatch becomes a match, that is, until the seventh character (C) in P lines up with the mismatched C in T.

Step 2: Shift P three characters to the right so that the mismatch becomes a match.

After shifting, again compare the characters from right to left. The first character compared matches; the next one (A in T) does not. We observe that the character A does not occur to the left in P. In this case, P moves past the mismatched character (A) in T.

Step 3: Move P past the mismatched character. We get:

The pattern is matched.

Note: The bad-character shift can be negative. This causes no problem, because the Boyer-Moore algorithm always shifts by the maximum of the good-suffix shift and the bad-character shift.
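The bad-character rule on its own can be sketched as below. This is a simplified illustration under stated assumptions, not the article's original program: it records only the last occurrence of each pattern character (rather than the nearest occurrence to the left of the mismatch), and because the good-suffix shift is absent here it guards against a negative shift with a minimum shift of 1. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class BadCharacterSketch {
    /** Maps each pattern character to the index of its last occurrence. */
    static Map<Character, Integer> lastOccurrence(String pattern) {
        Map<Character, Integer> last = new HashMap<>();
        for (int i = 0; i < pattern.length(); i++) {
            last.put(pattern.charAt(i), i);
        }
        return last;
    }

    /** Returns the index of the first match of pattern in text, or -1. */
    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) {
            return 0;
        }
        Map<Character, Integer> last = lastOccurrence(pattern);
        int s = 0; // current alignment of the pattern against the text
        while (s <= n - m) {
            int i = m - 1;
            // Compare right to left within this alignment.
            while (i >= 0 && pattern.charAt(i) == text.charAt(s + i)) {
                i--;
            }
            if (i < 0) {
                return s; // every character matched
            }
            int lastIdx = last.getOrDefault(text.charAt(s + i), -1);
            // i - lastIdx is negative when the last occurrence lies to the
            // right of the mismatch; fall back to a shift of one position.
            s += Math.max(1, i - lastIdx);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abcfefabddef", "abddef")); // 6
        System.out.println(indexOf("zzzzaaapppxyzabc", "pqrs")); // -1
    }
}
```

When the mismatched text character does not occur in the pattern at all, lastIdx is -1 and the pattern moves completely past that character.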

Good-Suffix Shift

Let t be the substring matched by the inner loop. Skip alignments until one of the following conditions is met:

  1. There are no mismatches between P and t.
  2. P moves past t.

For example, consider the following pattern.

[Figure: the text T with the pattern P aligned below it]

Step 1: Compare characters from right to left. We see that the last three characters of P match characters in T; this matched substring is denoted by t.

Step 2: Skip alignments until there are no mismatches between P and t. We observe that the first five characters (from left to right) of P (C T T A C) match the last five characters in t.

Step 3: Skip three alignments. P now matches T.

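The two shift functions above can be defined as follows:

The good-suffix shift function is stored in a table called bmGs of size m+1. The computation of the table bmGs uses an auxiliary table suff, where suff[i] is the length of the longest suffix of the pattern that ends at position i.

The bad-character shift function is stored in a table bmBc of size σ, with one entry for each character c in the alphabet Σ: bmBc[c] is the distance from the rightmost occurrence of c among the first m-1 pattern characters to the end of the pattern, or m if c does not occur there.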
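In symbols, one common way to write these two tables is shown below. The formulas are a reconstruction under stated assumptions rather than a copy of the original figures: P denotes the pattern with 0-based indices 0..m-1, and P[a..b] denotes the substring from index a to index b inclusive.

```latex
% Assumptions: P is the pattern, indexed 0..m-1; \Sigma is the alphabet.
\[
\mathrm{suff}[i] \;=\; \max\{\, k \;:\; P[i-k+1 \,..\, i] = P[m-k \,..\, m-1] \,\},
\qquad 0 \le i < m
\]

\[
\mathrm{bmBc}[c] \;=\;
\begin{cases}
  \min\{\, i \;:\; 1 \le i \le m-1 \ \text{and}\ P[m-1-i] = c \,\}
      & \text{if } c \text{ occurs in } P[0 \,..\, m-2],\\[4pt]
  m   & \text{otherwise,}
\end{cases}
\qquad c \in \Sigma
\]
```

With these definitions suff[m-1] = m, since the whole pattern is a suffix of itself.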

Boyer Moore Pattern Matching Example

Consider the following pattern.

[Figure: the text T with the pattern P aligned below it]

Let's start matching.

Step 1: Compare characters from right to left. We see that the very first comparison is a mismatch, i.e., G does not match T.

Step 2: Now, skip alignments until the mismatch becomes a match. A match is found after six characters. Here, the good-suffix rule does not apply, because no characters were matched.

bc (bad-character rule): 6, gs (good-suffix rule): 0

According to the bad-character shift, P moves past the mismatched character (i.e., G).

Step 3: Again, compare characters from right to left. We see that the first three characters compared (the last three of P) match T, forming t, and the fourth one does not match.

Here, we can apply both rules, i.e., the bad-character rule and the good-suffix rule. If we apply the bad-character rule, P moves by only one character. If we apply the good-suffix rule, it skips two alignments. Therefore, we apply the good-suffix rule, because the algorithm always takes the larger of the two shifts. Hence, we skip two alignments.

bc: 0, gs: 2

After shifting P by three characters (skipping two alignments moves P three positions to the right), we get:

bc: 2, gs: 7

Here, we observe that C does not appear to the left in P. Therefore, the bad-character rule skips two alignments and the good-suffix rule skips seven alignments, so we apply the good-suffix rule.

Step 4: After shifting, we see that the pattern matches.

In the above example, we skipped 15 alignments, and 11 characters of T were never examined.

Boyer Moore Preprocessing Phases

Pre-calculated skips for the text T: A A T C A A T A G C and the pattern P: T C G C can be stored in a table. For this example, the table is built with the bad-character shift function.

Each entry of the table gives the number of alignments (characters) to skip.
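A table of this kind can be computed directly from P. The sketch below is a minimal illustration, assuming the alphabet A, C, G, T and the rule described earlier: on a mismatch, skip to the nearest occurrence of the text character to the left in P, or past the mismatch if there is none. The class name, the method name build, and the MATCH marker are hypothetical.

```java
final class BadCharacterTable {
    /** Marks cells where the text character equals P[i], so no mismatch occurs. */
    static final int MATCH = -1;

    /**
     * skips[r][i] is the number of alignments that can be skipped when the
     * character alphabet.charAt(r) appears in the text opposite pattern
     * position i and does not match. The actual shift is skips[r][i] + 1.
     */
    static int[][] build(String pattern, String alphabet) {
        int m = pattern.length();
        int[][] skips = new int[alphabet.length()][m];
        for (int r = 0; r < alphabet.length(); r++) {
            char c = alphabet.charAt(r);
            int prev = -1; // nearest index j < i with pattern.charAt(j) == c
            for (int i = 0; i < m; i++) {
                if (pattern.charAt(i) == c) {
                    skips[r][i] = MATCH;
                    prev = i;
                } else {
                    skips[r][i] = i - prev - 1;
                }
            }
        }
        return skips;
    }

    public static void main(String[] args) {
        String pattern = "TCGC";
        String alphabet = "ACGT";
        int[][] table = build(pattern, alphabet);
        for (int r = 0; r < table.length; r++) {
            StringBuilder row = new StringBuilder();
            row.append(alphabet.charAt(r)).append(':');
            for (int v : table[r]) {
                row.append(' ').append(v == MATCH ? "-" : String.valueOf(v));
            }
            System.out.println(row);
        }
        // Prints:
        // A: 0 1 2 3
        // C: 0 - 0 -
        // G: 0 1 - 0
        // T: - 0 1 2
    }
}
```

Each row corresponds to a text character and each column to a position in P. A table like this costs O(mσ) space, which is acceptable for small alphabets such as DNA.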

Boyer Moore Algorithm Pseudo Code

Pattern Searching Java Program

Let's see the pattern searching Java program. In the following program, we have implemented the brute-force string searching algorithm alongside the Boyer-Moore algorithm, so that their results can be compared.

PatternSearchingExample.java

Output:

Brute force looking for abddef in abcfefabddef
	Found match in the given text at index 6
Boyer-Moore looking for abddef in abcfefabddef
	Found match in the given text at index 6
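For comparison, a brute-force search that agrees with the first result above can be sketched as follows. This is a minimal illustration rather than the original PatternSearchingExample.java listing; the class and method names are hypothetical.

```java
final class BruteForceSketch {
    /**
     * Tries every alignment from left to right and compares the pattern
     * with the text from left to right within each alignment.
     * Returns the index of the first match, or -1 if there is none.
     */
    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        for (int s = 0; s <= n - m; s++) {
            int i = 0;
            while (i < m && text.charAt(s + i) == pattern.charAt(i)) {
                i++;
            }
            if (i == m) {
                return s;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abcfefabddef", "abddef")); // 6
    }
}
```

Unlike Boyer-Moore, this version never skips an alignment, which is why it can need on the order of nm comparisons.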

Let's implement the algorithm in a Java program.

Boyer Moore Java Program

Let's implement the Boyer-Moore algorithm and search for a pattern in a Java program.

BoyerMooreImplementation.java

Output:

Patterns occur at character = 0
Patterns occur at character = 5
Patterns occur at character = 10
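A search that combines both shift tables and reports every occurrence, in the spirit of the output above, might look like the sketch below. It is not the original BoyerMooreImplementation.java listing. The names are hypothetical, the suff table is computed naively in O(m²) time for clarity, and bmGs is indexed by pattern position 0..m-1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BoyerMooreSketch {
    /** suff[i] = length of the longest suffix of p that ends at index i. */
    static int[] suffixes(String p) {
        int m = p.length();
        int[] suff = new int[m];
        for (int i = 0; i < m; i++) {
            int k = 0;
            while (k <= i && p.charAt(i - k) == p.charAt(m - 1 - k)) {
                k++;
            }
            suff[i] = k;
        }
        return suff;
    }

    /** Good-suffix shift for a mismatch at each pattern position. */
    static int[] goodSuffix(String p) {
        int m = p.length();
        int[] suff = suffixes(p);
        int[] bmGs = new int[m];
        Arrays.fill(bmGs, m);
        int j = 0;
        for (int i = m - 1; i >= 0; i--) {
            if (suff[i] == i + 1) { // the prefix p[0..i] is also a suffix of p
                for (; j < m - 1 - i; j++) {
                    if (bmGs[j] == m) {
                        bmGs[j] = m - 1 - i;
                    }
                }
            }
        }
        for (int i = 0; i <= m - 2; i++) {
            bmGs[m - 1 - suff[i]] = m - 1 - i;
        }
        return bmGs;
    }

    /** Distance from the rightmost occurrence of each character to the pattern end. */
    static Map<Character, Integer> badCharacter(String p) {
        int m = p.length();
        Map<Character, Integer> bmBc = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            bmBc.put(p.charAt(i), m - 1 - i);
        }
        return bmBc;
    }

    /** Returns the start index of every occurrence of p in text. */
    static List<Integer> search(String text, String p) {
        List<Integer> hits = new ArrayList<>();
        int n = text.length();
        int m = p.length();
        if (m == 0 || m > n) {
            return hits;
        }
        int[] bmGs = goodSuffix(p);
        Map<Character, Integer> bmBc = badCharacter(p);
        int j = 0; // current alignment
        while (j <= n - m) {
            int i = m - 1;
            while (i >= 0 && p.charAt(i) == text.charAt(i + j)) {
                i--;
            }
            if (i < 0) {
                hits.add(j);
                j += bmGs[0];
            } else {
                // The bad-character shift may be negative; the good-suffix
                // shift is always at least 1, so the maximum is positive.
                int bc = bmBc.getOrDefault(text.charAt(i + j), m) - m + 1 + i;
                j += Math.max(bmGs[i], bc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("abababa", "aba")); // [0, 2, 4]
    }
}
```

Taking the maximum of the two shifts is what makes a negative bad-character shift harmless.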

Let's see another Java program in which we have implemented different logic for pattern searching. The following program checks whether the specified pattern is found in the text.

BoyerMooreExample.java

Output:

Pattern Not Matched
	text: aabbccdef
	word: cde
	exp: 0, res: 5
Pattern Not Matched
	text: zzzzaaapppxyzabc
	word: pqrs
	exp: 1, res: -1
Pattern Matched
Pattern Matched
Pattern Matched
Pattern Not Matched
	text: pqrsabcdxyzamnop
	word: cdxyza
	exp: 1, res: 6
Pattern Matched
Pattern Matched





