2020-11-08

HW4 Bloom Filter.

Computing hash in the bloom filter as suggested by the professor:
Compute the md5 hash. View that hash as a 16 byte integer (i.e., make sure to use the raw md5 output - not some base64 encoded humanly readable output), pick hash functions as per class and use those to insert in a bloom filter. In Java and many other languages there are big int libraries you can use. Have the constants for configuring the size n of the bloom filter pulled out towards the top of your file. Assume I will test with millions of news items. Pick a, b at random less than n and check if they are relatively prime (gcd = 1), if not re-pick. This should not take too make iterations because of these bounds on the number of relatively prime numbers less than n: https://en.wikipedia.org/wiki/Euler's_totient_function#Growth_rate
(Edited: 2020-11-08)
Computing hash in the bloom filter as suggested by the professor: Compute the md5 hash. View that hash as a 16 byte integer (i.e., make sure to use the raw md5 output - not some base64 encoded humanly readable output), pick hash functions as per class and use those to insert in a bloom filter. In Java and many other languages there are big int libraries you can use. Have the constants for configuring the size n of the bloom filter pulled out towards the top of your file. Assume I will test with millions of news items. Pick a, b at random less than n and check if they are relatively prime (gcd = 1), if not re-pick. This should not take too make iterations because of these bounds on the number of relatively prime numbers less than n: https://en.wikipedia.org/wiki/Euler's_totient_function#Growth_rate
X