Package jadex.collection
Class BloomFilter
java.lang.Object
jadex.collection.BloomFilter
A bloom filter is a probabilistic data structure for
checking if a value is contained in a set.
It has a false positive rate, i.e. it can tell that
a value is contained in the filter although it is not.
This implementation does not support remove() because
clearing a bit that is used by more than one element
might lead to false negatives. (If remove is required
a counting filter needs to be used).
-
Field Summary
Fields -
Constructor Summary
ConstructorsConstructorDescriptionCreate a new BloomFilter with default p = 0.01 and n = 1000.BloomFilter
(double p, int n) Create a new BloomFilter.BloomFilter
(int m, int k, int n) Create a new BloomFilter. -
Method Summary
Modifier and TypeMethodDescriptionboolean
add
(byte[] value) Add a value to the filter.void
clear()
Clear the filter content.static int
computeAcceptableN
(long k, long m) Compute acceptable amount of elements with given number of hashes and size of filter in bits.static int
computeOptimalK
(long n, long m) Compute optimal number of hash functions using expected number of elements and size of filter in bits.static int
computeOptimalM
(long n, double p) Compute optimal size of bloom filter in bits by expected number of elements and acceptable false positive rate.static double
computeP
(long k, long m, double insertedElements) Compute best-case (uniform hash function) false positive probability.static int[]
hashK
(byte[] value, int m, int k) Compute k hashes based on two hashes using (hash1+i*hash2)%m;static void
Main for testing.boolean
mightContain
(byte[] value) Test if a value is contained in the filter.static int
murmur3
(byte[] data, int seed) Compute the murmur hash for a byte array.toString()
Get the string representation.
-
Field Details
-
bs
The bit set. -
k
protected int kThe number of hash (functions). -
m
protected int mThe number of bits in the bit set -
n
protected int nThe number of expected entries.
-
-
Constructor Details
-
BloomFilter
public BloomFilter()Create a new BloomFilter with default p = 0.01 and n = 1000. -
BloomFilter
public BloomFilter(double p, int n) Create a new BloomFilter.- Parameters:
p
- The acceptable false positive rate.n
- The expected number of entries.
-
BloomFilter
public BloomFilter(int m, int k, int n) Create a new BloomFilter.- Parameters:
m
- The number of bits in the filter.k
- The number of hashes used.n
- The expected number of values in the filter.
-
-
Method Details
-
add
public boolean add(byte[] value) Add a value to the filter.- Parameters:
value
- The value.
-
clear
public void clear()Clear the filter content. -
mightContain
public boolean mightContain(byte[] value) Test if a value is contained in the filter.- Parameters:
value
- The value.- Returns:
- True, if value might be contained.
-
computeOptimalM
public static int computeOptimalM(long n, double p) Compute optimal size of bloom filter in bits by expected number of elements and acceptable false positive rate.- Parameters:
n
- Expected number of elements.p
- Acceptable false positive rate.- Returns:
- The optimal size of the bloom filter in bits.
-
computeOptimalK
public static int computeOptimalK(long n, long m) Compute optimal number of hash functions using expected number of elements and size of filter in bits.- Parameters:
n
- Expected number of elements inserted in the bloom filterm
- The size of the bloom filter in bits.- Returns:
- The optimal number of hash functions.
-
computeAcceptableN
public static int computeAcceptableN(long k, long m) Compute acceptable amount of elements with given number of hashes and size of filter in bits.- Parameters:
k
- Number of hashes.m
- The size of the bloom filter in bits.- Returns:
- Acceptable number elements that can be inserted.
-
computeP
public static double computeP(long k, long m, double insertedElements) Compute best-case (uniform hash function) false positive probability.- Parameters:
k
- Number of hashes.m
- The size of the bloom filter in bits.n
- Number of elements inserted in the filter.- Returns:
- The expected false positive probability.
-
hashK
public static int[] hashK(byte[] value, int m, int k) Compute k hashes based on two hashes using (hash1+i*hash2)%m;- Parameters:
value
- The byte array.m
- The maximum number of values.k
- The number of hash values to compute.- Returns:
- K hashes.
-
murmur3
public static int murmur3(byte[] data, int seed) Compute the murmur hash for a byte array. -
toString
Get the string representation. -
main
Main for testing.
-