Suppose we can index 50,000 documents before we run out of memory and need to merge with a disk based index that uses logarithmic merging. We want to index a billion pages, using 6 machines each responsible for having the index for 1/6 of the documents. We have: Generation 0 = 50000 * 2^0 Generation 1 = 50000 * 2^0 Generation 2= 50000 * 2^1 Generation 3 = 50000 * 2^2 Generation 4 = 50000 * 2^3 ....... Generation n: 50000 * 2^(n-1) = 50000* 3340 Hence n = log(3340) + 1 ~= 13
What's the largest generation number that will be used? - 13 generations What is the most generations that will need to be merged when we add have to flush the in memory partition to disk? - 12 generations(Edited: 2019-11-04)
Total number of documents = 1000000000 Total number of machines = 6 Pages processed by each machine = 167 Million approximately Generation 1 = 50k*2 = 100k Generation 2 = 50k*4 = 400k Generation 3 = 50k*8 = 800k and so on Generation n => 50k x 2^(n-1) = 167M n-1 = log2(167M/5000) = 12 approx n = 13 Therefore 13 generations So largest number of generation = 13 Total number of merges = 12
Total number of documents = 10^9 Number of machines = 6 Total documents processed by a machine = (10^9)/6 = 167m Gen 0: 50k * 2^0 Gen 1: 50k * 2^0 Gen 2: 50k * 2^1 Gen 3: 50k * 2^2 Gen n: 50k * 2^(n-1) Total number of iterations = n = log(167m/50k) + 1 = 12.7 = 13 Total number of merges = n-1 = 12
Number of machines = 6 Pages per machine = 10^6/6= ~167 Million Generation 1 = 50k*2 = 100k Generation 2 = 50k*4 = 400k. . .
Generation n : 50k x 2^(n-1) = 167M n-1 = log2(167M/5000) = ~12 n = 13 What's the largest generation number that will be used?13
What is the most generations that will need to be merged when we add have to flush the in memory partition to disk?12
Total documents to index = 1 billion (1000000000) Total 6 independent machines Each Machines is responsible of = 1 billion / 6 = 1.67 * 10^8 documents Each Machine have the capacity of up to 50K documents G0 = 50,000 * 2^0 = 50K G1 = 50,000 * 2^1 = 100K G2 = 50,000 * 2^2 = 200K G3 = 50,000 * 2^3 = 400k ......... ....... ....... For the nth generation n -1 = log2(1.67*10^8/50K) n -1 = log2(3340) n- 1 = 11.7 n = 12.7 n ~ 13
Hence the largest generation number would be 13
Total processed docs by a machine = (10^9)/6 = 167m
Generation 0: 50k * 2^0 Generation 1: 50k * 2^0 Generation2: 50k * 2^1.......... .............
Generation n: 50k * 2^(n-1)
Total iterations = n = log(167m/50k) + 1 = 12.7 = 13 Total merges = n-1 = 12