• # Rabin Karp Substring Search in Java

This post will explore one of the substring search algorithms known as Rabin-Karp created by Richard M. Karp and Michael O. Rabin. There are two concepts to talk about, the hashing algorithm and rolling hash. In this post, we will use "equi" as the needle and "thequickbrownfo..." as the haystack.

## Hashing

One method to hash a given string is to multiply the decimal value of each character of a given string by a prime raised to the power of the index of that given character. Using this method and 3 as the prime, the hash for "equi" is 4328.

``````e 101 * 3^0 = 101
q 113 * 3^1 = 339
u 117 * 3^2 = 1053
i 105 * 3^3 = 2635
sum = 4328``````

This algorithm first encodes the needle, "equi", and the first section of the haystack with equal length as the needle, "theq", as hash strings. We already know "equi" yields 4328. Hashing "theq" yields 4388. If the two hashes are equal, then the strings are the same*.

``````t 116 * 3^0 = 116
h 104 * 3^1 = 312
e 101 * 3^2 = 909
q 113 * 3^3 = 3051
sum = 4388``````

## Rolling Hash

The second part of the algorithm is the rolling hash. Essentially, its goal is to continuously select the next section of the haystack with equivalent length of the needle, hash it, compare with the hash of the needle, and return the index if the hashes are equal. The way the Rabin-Karp algorithm selects the next section does not require rehashing. It selects the next section by first subtracting the decimal value of the first character of the previous section, divide the difference by the prime (this essentially negates 1 from all the exponents), and adding to the quotient the product of the newly introduced character and prime raised to the power of the largest index.

``````t 116 * 3^0 = 116
h 104 * 3^1 = 312
e 101 * 3^2 = 909
q 113 * 3^3 = 3051
sum = 4388
u 4388 - 116 = 4272
4272 / 3 = 1424
1424 + (117 * 3^3) = 4583
i 4583 - 104 = 4479
4479 / 3 = 1493
1493 + (105 * 3^3) = 4328 // Matches the hash of equi``````

The following shows another way to visualize this. Spaces are added increase readability.

``````e q u i 4328
t h e q u i c k b r o ...
t h e q 4388
h e q u 4583
e q u i 4328 // Matches the hash of equi``````

## Implementation

``````public static int find (String haystack, String needle) {
int prime = 3;
int h_haystack = 0;
int h_needle = 0;

for (int i = 0; i < needle.length(); i++) {
h_needle += needle.charAt(i) * Math.pow(prime, i);
h_haystack += haystack.charAt(i) * Math.pow(prime, i);
}

if (h_haystack == h_needle)
return 0;

for (int i = 1; i < haystack.length() - needle.length(); i++) {
h_haystack -= haystack.charAt(i - 1);
h_haystack /= prime;
h_haystack += haystack.charAt(i + needle.length() - 1)
* Math.pow(prime, needle.lenth() - 1);

if (h_haystack == h_needle)
return i;
}

return -1;
}``````

## A Note on Selecting Prime

3 is probably a good enough prime for this example because the numbers we are dealing with are the decimal representation of the alphabet, which ranges from 97 to 122. When the variations of character increases, a bigger prime might be required to avoid collisions.

• ## Merge Two Sorted Lists

Write an algorithm that merges two sorted lists. The general idea to approach this problem is to have two pointers for the two lists. There will be another pointer to itera...

• ## Determine if BinaryTree is Balanced

This post will explore the problem of checking whether a binary tree is balanced or not. A binary tree is considered to be balanced if the height difference between two sub...

• ## Letter is Constructible from Magazine

The problem in this post is concerned with answering the question of whether a given letter can be reconstructed with a given magazine. The letter is reconstructible when a...

• ## Palindromic Permutation

This post explores the question of if the characters of a given word can rearranged such that the final word is a palindrome. For example, the word "edified" coul...

• ## Convert Roman Numeral to Integer and Vice Versa

This post will explore the situations where there is a need to convert a roman numeral to integer and vice versa. The common basic symbols in the roman numeral system are I...

• ## Palindrome Integer

Given an integer, determine if it is a palindrome number.

### Approaches

There are a couple of methods in approaching this problem. The first thing that comes ...

• ## Lowest Common Ancestor of Two Nodes in a Binary Tree

Any two nodes in a binary tree has the root node as a common ancestor, assuming both nodes are in the same tree. The lower common ancestor node is the furthest node from th...

• ## Reverse an Integer

Given an integer, compute and return its reverse form. This problem becomes a regular reversal problem if the integer is converted into a string to get its character array....

• ## Perfect Squares and ℤ/ℝ Square Roots

This post will explore how the binary search algorithm can help us check if a given number is a perfect square, find the integer square root of a given number, and find the...

• ## BinarySearch in JAVA

When given a sorted list of numbers, binary search is the best algorithm to have in your arsenal. The binary search finds the target element in O(log(n)) time by taking adv...

• ## SelectionSort in JAVA

Today, I will be show another comparative sorting algorithm known as Selection Sort. Selection sort works by finding a smaller element downstream from the current element a...

• ## BinaryTree PreOrder Traversal in JAVA

PreOrder traversal is very similar to in-order traversal in terms of code. The idea behind pre-order traversal is to do all the processing before visiting the left subtree ...

• ## Next Greater Element in Linear Time in JAVA

This is my implementation of the algorithm for that gets the next greater element than the current element in linear time. For elements with no next greater element, match ...

• ## BinaryTree VerticalOrder Traversal in JAVA

This is my implementation of the BinaryTree vertical order traversal. Vertical order traversal is an algorithm that groups all nodes with the same horizontal distance toget...