Class BloomFilter


  • public class BloomFilter
    extends java.lang.Object
    A bloom filter is a probabilistic data structure for checking if a value is contained in a set. It has a false positive rate, i.e. it can tell that a value is contained in the filter although it is not. This implementation does not support remove() because clearing a bit that is used by more than one element might lead to false negatives. (If remove is required a counting filter needs to be used).
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected java.util.BitSet bs
      The bit set.
      protected int k
      The number of hash (functions).
      protected int m
      The number of bits in the bit set
      protected int n
      The number of expected entries.
    • Constructor Summary

      Constructors 
      Constructor Description
      BloomFilter()
      Create a new BloomFilter with default p = 0.01 and n = 1000.
      BloomFilter​(double p, int n)
      Create a new BloomFilter.
      BloomFilter​(int m, int k, int n)
      Create a new BloomFilter.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      boolean add​(byte[] value)
      Add a value to the filter.
      void clear()
      Clear the filter content.
      static int computeAcceptableN​(long k, long m)
      Compute acceptable amount of elements with given number of hashes and size of filter in bits.
      static int computeOptimalK​(long n, long m)
      Compute optimal number of hash functions using expected number of elements and size of filter in bits.
      static int computeOptimalM​(long n, double p)
      Compute optimal size of bloom filter in bits by expected number of elements and acceptable false positive rate.
      static double computeP​(long k, long m, double insertedElements)
      Compute best-case (uniform hash function) false positive probability.
      static int[] hashK​(byte[] value, int m, int k)
      Compute k hashes based on two hashes using (hash1+i*hash2)%m;
      static void main​(java.lang.String[] args)
      Main for testing.
      boolean mightContain​(byte[] value)
      Test if a value is contained in the filter.
      static int murmur3​(byte[] data, int seed)
      Compute the murmur hash for a byte array.
      java.lang.String toString()
      Get the string representation.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • bs

        protected java.util.BitSet bs
        The bit set.
      • k

        protected int k
        The number of hash (functions).
      • m

        protected int m
        The number of bits in the bit set
      • n

        protected int n
        The number of expected entries.
    • Constructor Detail

      • BloomFilter

        public BloomFilter()
        Create a new BloomFilter with default p = 0.01 and n = 1000.
      • BloomFilter

        public BloomFilter​(double p,
                           int n)
        Create a new BloomFilter.
        Parameters:
        p - The acceptable false positive rate.
        n - The expected number of entries.
      • BloomFilter

        public BloomFilter​(int m,
                           int k,
                           int n)
        Create a new BloomFilter.
        Parameters:
        m - The number of bits in the filter.
        k - The number of hashes used.
        n - The expected number of values in the filter.
    • Method Detail

      • add

        public boolean add​(byte[] value)
        Add a value to the filter.
        Parameters:
        value - The value.
      • clear

        public void clear()
        Clear the filter content.
      • mightContain

        public boolean mightContain​(byte[] value)
        Test if a value is contained in the filter.
        Parameters:
        value - The value.
        Returns:
        True, if value might be contained.
      • computeOptimalM

        public static int computeOptimalM​(long n,
                                          double p)
        Compute optimal size of bloom filter in bits by expected number of elements and acceptable false positive rate.
        Parameters:
        n - Expected number of elements.
        p - Acceptable false positive rate.
        Returns:
        The optimal size of the bloom filter in bits.
      • computeOptimalK

        public static int computeOptimalK​(long n,
                                          long m)
        Compute optimal number of hash functions using expected number of elements and size of filter in bits.
        Parameters:
        n - Expected number of elements inserted in the bloom filter
        m - The size of the bloom filter in bits.
        Returns:
        The optimal number of hash functions.
      • computeAcceptableN

        public static int computeAcceptableN​(long k,
                                             long m)
        Compute acceptable amount of elements with given number of hashes and size of filter in bits.
        Parameters:
        k - Number of hashes.
        m - The size of the bloom filter in bits.
        Returns:
        Acceptable number elements that can be inserted.
      • computeP

        public static double computeP​(long k,
                                      long m,
                                      double insertedElements)
        Compute best-case (uniform hash function) false positive probability.
        Parameters:
        k - Number of hashes.
        m - The size of the bloom filter in bits.
        n - Number of elements inserted in the filter.
        Returns:
        The expected false positive probability.
      • hashK

        public static int[] hashK​(byte[] value,
                                  int m,
                                  int k)
        Compute k hashes based on two hashes using (hash1+i*hash2)%m;
        Parameters:
        value - The byte array.
        m - The maximum number of values.
        k - The number of hash values to compute.
        Returns:
        K hashes.
      • murmur3

        public static int murmur3​(byte[] data,
                                  int seed)
        Compute the murmur hash for a byte array.
      • toString

        public java.lang.String toString()
        Get the string representation.
        Overrides:
        toString in class java.lang.Object
      • main

        public static void main​(java.lang.String[] args)
        Main for testing.