Class BloomFilter

java.lang.Object
jadex.collection.BloomFilter

public class BloomFilter extends Object
A bloom filter is a probabilistic data structure for checking if a value is contained in a set. It has a false positive rate, i.e. it can tell that a value is contained in the filter although it is not. This implementation does not support remove() because clearing a bit that is used by more than one element might lead to false negatives. (If remove is required a counting filter needs to be used).
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected BitSet
    The bit set.
    protected int
    The number of hash (functions).
    protected int
    The number of bits in the bit set
    protected int
    The number of expected entries.
  • Constructor Summary

    Constructors
    Constructor
    Description
    Create a new BloomFilter with default p = 0.01 and n = 1000.
    BloomFilter(double p, int n)
    Create a new BloomFilter.
    BloomFilter(int m, int k, int n)
    Create a new BloomFilter.
  • Method Summary

    Modifier and Type
    Method
    Description
    boolean
    add(byte[] value)
    Add a value to the filter.
    void
    Clear the filter content.
    static int
    computeAcceptableN(long k, long m)
    Compute acceptable amount of elements with given number of hashes and size of filter in bits.
    static int
    computeOptimalK(long n, long m)
    Compute optimal number of hash functions using expected number of elements and size of filter in bits.
    static int
    computeOptimalM(long n, double p)
    Compute optimal size of bloom filter in bits by expected number of elements and acceptable false positive rate.
    static double
    computeP(long k, long m, double insertedElements)
    Compute best-case (uniform hash function) false positive probability.
    static int[]
    hashK(byte[] value, int m, int k)
    Compute k hashes based on two hashes using (hash1+i*hash2)%m;
    static void
    main(String[] args)
    Main for testing.
    boolean
    mightContain(byte[] value)
    Test if a value is contained in the filter.
    static int
    murmur3(byte[] data, int seed)
    Compute the murmur hash for a byte array.
    Get the string representation.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

    • bs

      protected BitSet bs
      The bit set.
    • k

      protected int k
      The number of hash (functions).
    • m

      protected int m
      The number of bits in the bit set
    • n

      protected int n
      The number of expected entries.
  • Constructor Details

    • BloomFilter

      public BloomFilter()
      Create a new BloomFilter with default p = 0.01 and n = 1000.
    • BloomFilter

      public BloomFilter(double p, int n)
      Create a new BloomFilter.
      Parameters:
      p - The acceptable false positive rate.
      n - The expected number of entries.
    • BloomFilter

      public BloomFilter(int m, int k, int n)
      Create a new BloomFilter.
      Parameters:
      m - The number of bits in the filter.
      k - The number of hashes used.
      n - The expected number of values in the filter.
  • Method Details

    • add

      public boolean add(byte[] value)
      Add a value to the filter.
      Parameters:
      value - The value.
    • clear

      public void clear()
      Clear the filter content.
    • mightContain

      public boolean mightContain(byte[] value)
      Test if a value is contained in the filter.
      Parameters:
      value - The value.
      Returns:
      True, if value might be contained.
    • computeOptimalM

      public static int computeOptimalM(long n, double p)
      Compute optimal size of bloom filter in bits by expected number of elements and acceptable false positive rate.
      Parameters:
      n - Expected number of elements.
      p - Acceptable false positive rate.
      Returns:
      The optimal size of the bloom filter in bits.
    • computeOptimalK

      public static int computeOptimalK(long n, long m)
      Compute optimal number of hash functions using expected number of elements and size of filter in bits.
      Parameters:
      n - Expected number of elements inserted in the bloom filter
      m - The size of the bloom filter in bits.
      Returns:
      The optimal number of hash functions.
    • computeAcceptableN

      public static int computeAcceptableN(long k, long m)
      Compute acceptable amount of elements with given number of hashes and size of filter in bits.
      Parameters:
      k - Number of hashes.
      m - The size of the bloom filter in bits.
      Returns:
      Acceptable number elements that can be inserted.
    • computeP

      public static double computeP(long k, long m, double insertedElements)
      Compute best-case (uniform hash function) false positive probability.
      Parameters:
      k - Number of hashes.
      m - The size of the bloom filter in bits.
      n - Number of elements inserted in the filter.
      Returns:
      The expected false positive probability.
    • hashK

      public static int[] hashK(byte[] value, int m, int k)
      Compute k hashes based on two hashes using (hash1+i*hash2)%m;
      Parameters:
      value - The byte array.
      m - The maximum number of values.
      k - The number of hash values to compute.
      Returns:
      K hashes.
    • murmur3

      public static int murmur3(byte[] data, int seed)
      Compute the murmur hash for a byte array.
    • toString

      public String toString()
      Get the string representation.
      Overrides:
      toString in class Object
    • main

      public static void main(String[] args)
      Main for testing.