Categories
  • Data Structures & Algorithms 29
  • Programming Language 1
  • Utilities 1
  • Desgin Patterns 3
Chau Ngo
  • Posts
  • Resume
  • Categories
  • Rabin Karp Substring Search in Java

    This post will explore one of the substring search algorithms known as Rabin-Karp created by Richard M. Karp and Michael O. Rabin. There are two concepts to talk about, the hashing algorithm and rolling hash. In this post, we will use "equi" as the needle and "thequickbrownfo..." as the haystack.

    Hashing

    One method to hash a given string is to multiply the decimal value of each character of a given string by a prime raised to the power of the index of that given character. Using this method and 3 as the prime, the hash for "equi" is 4328.

    e 101 * 3^0 = 101
    q 113 * 3^1 = 339
    u 117 * 3^2 = 1053
    i 105 * 3^3 = 2635
            sum = 4328

    This algorithm first encodes the needle, "equi", and the first section of the haystack with equal length as the needle, "theq", as hash strings. We already know "equi" yields 4328. Hashing "theq" yields 4388. If the two hashes are equal, then the strings are the same*.

    t 116 * 3^0 = 116
    h 104 * 3^1 = 312
    e 101 * 3^2 = 909
    q 113 * 3^3 = 3051
            sum = 4388

    Rolling Hash

    The second part of the algorithm is the rolling hash. Essentially, its goal is to continuously select the next section of the haystack with equivalent length of the needle, hash it, compare with the hash of the needle, and return the index if the hashes are equal. The way the Rabin-Karp algorithm selects the next section does not require rehashing. It selects the next section by first subtracting the decimal value of the first character of the previous section, divide the difference by the prime (this essentially negates 1 from all the exponents), and adding to the quotient the product of the newly introduced character and prime raised to the power of the largest index.

    t 116 * 3^0 = 116
    h 104 * 3^1 = 312
    e 101 * 3^2 = 909
    q 113 * 3^3 = 3051
            sum = 4388
    u 4388 - 116 = 4272
      4272 / 3 = 1424
      1424 + (117 * 3^3) = 4583
    i 4583 - 104 = 4479
      4479 / 3 = 1493
      1493 + (105 * 3^3) = 4328 // Matches the hash of equi

    The following shows another way to visualize this. Spaces are added increase readability.

    e q u i 4328
    t h e q u i c k b r o ...
    t h e q 4388
      h e q u 4583
        e q u i 4328 // Matches the hash of equi

    Implementation

    public static int find (String haystack, String needle) {
        int prime = 3;
        int h_haystack = 0;
        int h_needle = 0;
    
        for (int i = 0; i < needle.length(); i++) {
            h_needle += needle.charAt(i) * Math.pow(prime, i);
            h_haystack += haystack.charAt(i) * Math.pow(prime, i);
        }
    
        if (h_haystack == h_needle)
            return 0;
    
        for (int i = 1; i < haystack.length() - needle.length(); i++) {
            h_haystack -= haystack.charAt(i - 1);
            h_haystack /= prime;
            h_haystack += haystack.charAt(i + needle.length() - 1) 
                * Math.pow(prime, needle.lenth() - 1);
    
            if (h_haystack == h_needle)
                return i;
        }
    
        return -1;
    }

    A Note on Selecting Prime

    3 is probably a good enough prime for this example because the numbers we are dealing with are the decimal representation of the alphabet, which ranges from 97 to 122. When the variations of character increases, a bigger prime might be required to avoid collisions.

    Posted 2 weeks ago by Chau in Data Structures & Algorithms.
  • Merge Two Sorted Lists

    Write an algorithm that merges two sorted lists. The general idea to approach this problem is to have two pointers for the two lists. There will be another pointer to itera...

    Posted 2 months ago by Chau in Data Structures & Algorithms.
  • Determine if BinaryTree is Balanced

    This post will explore the problem of checking whether a binary tree is balanced or not. A binary tree is considered to be balanced if the height difference between two sub...

    Posted 2 months ago by Chau in Data Structures & Algorithms.
  • Letter is Constructible from Magazine

    The problem in this post is concerned with answering the question of whether a given letter can be reconstructed with a given magazine. The letter is reconstructible when a...

    Posted 2 months ago by Chau in Data Structures & Algorithms.
  • Palindromic Permutation

    This post explores the question of if the characters of a given word can rearranged such that the final word is a palindrome. For example, the word "edified" coul...

    Posted 2 months ago by Chau in Data Structures & Algorithms.
  • Convert Roman Numeral to Integer and Vice Versa

    This post will explore the situations where there is a need to convert a roman numeral to integer and vice versa. The common basic symbols in the roman numeral system are I...

    Posted 2 years ago by Chau in Data Structures & Algorithms.
  • Palindrome Integer

    Given an integer, determine if it is a palindrome number.

    Approaches

    There are a couple of methods in approaching this problem. The first thing that comes ...

    Posted 2 years ago by Chau in Data Structures & Algorithms.
  • Lowest Common Ancestor of Two Nodes in a Binary Tree

    Any two nodes in a binary tree has the root node as a common ancestor, assuming both nodes are in the same tree. The lower common ancestor node is the furthest node from th...

    Posted 3 years ago by Chau in Data Structures & Algorithms.
  • Reverse an Integer

    Given an integer, compute and return its reverse form. This problem becomes a regular reversal problem if the integer is converted into a string to get its character array....

    Posted 3 years ago by Chau in Data Structures & Algorithms.
  • Perfect Squares and ℤ/ℝ Square Roots

    This post will explore how the binary search algorithm can help us check if a given number is a perfect square, find the integer square root of a given number, and find the...

    Posted 3 years ago by Chau in Data Structures & Algorithms.
  • BinarySearch in JAVA

    When given a sorted list of numbers, binary search is the best algorithm to have in your arsenal. The binary search finds the target element in O(log(n)) time by taking adv...

    Posted 3 years ago by Chau in Data Structures & Algorithms.
  • SelectionSort in JAVA

    Today, I will be show another comparative sorting algorithm known as Selection Sort. Selection sort works by finding a smaller element downstream from the current element a...

    Posted 3 years ago by Chau in Data Structures & Algorithms.
  • BinaryTree PreOrder Traversal in JAVA

    PreOrder traversal is very similar to in-order traversal in terms of code. The idea behind pre-order traversal is to do all the processing before visiting the left subtree ...

    Posted 3 years ago by Chau in Data Structures & Algorithms.
  • Next Greater Element in Linear Time in JAVA

    This is my implementation of the algorithm for that gets the next greater element than the current element in linear time. For elements with no next greater element, match ...

    Posted 3 years ago by Chau in Data Structures & Algorithms.
  • BinaryTree VerticalOrder Traversal in JAVA

    This is my implementation of the BinaryTree vertical order traversal. Vertical order traversal is an algorithm that groups all nodes with the same horizontal distance toget...

    Posted 3 years ago by Chau in Data Structures & Algorithms.
← Previous