2019-10-29

Oct 30 In-Class Exercise Thread .

Post your solutions to the Oct. 30 In-Class Exercise to this thread.
Best,
Chris
Post your solutions to the Oct. 30 In-Class Exercise to this thread. Best, Chris
2019-10-30

-- Oct 30 In-Class Exercise Thread
((resource:new doc 2019-10-30 14.04.19_1.jpg|Resource Description for new doc 2019-10-30 14.04.19_1.jpg))

-- Oct 30 In-Class Exercise Thread
1000000000/(50000*6) = 3333.33 numberOfGenerations = log2(3333.33) ~= 12 numOfMerges ~= 11
(Edited: 2019-10-30)
1000000000/(50000*6) = 3333.33 numberOfGenerations = log2(3333.33) ~= 12 numOfMerges ~= 11

-- Oct 30 In-Class Exercise Thread
Each machine is responsible for 1 billion by 6 documents (10^9/6 documents) which corresponds to approximately 167 million documents per machine
When 50,000 documents are filled in the memory we create a generation and repeat this every time there are 2 blocks in the same generation.
so we write 50k, 100k, 200k, 400k, 800k, 1.6M, 3.2M, 6.4M, 12.8M, 25.6M, 50.12M, 100.24M and we stop there
This gives us 13 generations in total and 12 total merges to get the final index.
(Edited: 2019-10-30)
Each machine is responsible for 1 billion by 6 documents (10^9/6 documents) which corresponds to approximately 167 million documents per machine When 50,000 documents are filled in the memory we create a generation and repeat this every time there are 2 blocks in the same generation. so we write 50k, 100k, 200k, 400k, 800k, 1.6M, 3.2M, 6.4M, 12.8M, 25.6M, 50.12M, 100.24M and we stop there This gives us 13 generations in total and 12 total merges to get the final index.

-- Oct 30 In-Class Exercise Thread
For each machine: Total documents responsible = (110^9)/6 = 1.67 10^8 Generation Documents G0 50000 = 50000 * 2^0 G1 50000 = 50000 * 2^0 G2 100000 = 50000 * 2^1 G3 200000 = 50000 * 2^2 G4 400000 = 50000 * 2^3 ................... Gn = 1.6710^8 = 50000 3340 = 50000* 2^(n-1) n-1= log(3340)~= 12 n =13 So 1)Largest generation number that will be used is 13 2)Most generations that will need to be merged = 13-1 = 12
(Edited: 2019-10-30)
For each machine: Total documents responsible = (1*10^9)/6 = 1.67 * 10^8 Generation Documents G0 50000 = 50000 * 2^0 G1 50000 = 50000 * 2^0 G2 100000 = 50000 * 2^1 G3 200000 = 50000 * 2^2 G4 400000 = 50000 * 2^3 ................... Gn = 1.67*10^8 = 50000* 3340 = 50000* 2^(n-1) n-1= log(3340)~= 12 n =13 So 1)Largest generation number that will be used is 13 2)Most generations that will need to be merged = 13-1 = 12

-- Oct 30 In-Class Exercise Thread
Generation 0 ==> 50K Generation 1 ==> 50K Generation 2 ==> 100K Generation 3 ==> 200k
Hence for storing 400 K we will require log 4 = 2 generations
Hence for 1 billion we will require log(10000) = n= 13 Generations
When we merge it in memory, n-1 merges = 12 merges
Generation 0 ==> 50K Generation 1 ==> 50K Generation 2 ==> 100K Generation 3 ==> 200k Hence for storing 400 K we will require log 4 = 2 generations Hence for 1 billion we will require log(10000) = n= 13 Generations When we merge it in memory, n-1 merges = 12 merges

-- Oct 30 In-Class Exercise Thread
For storing 400k : (log 4) = 2 generations Therefore, for 1 billion: log(10000) = 13 generations (n) Thus, on merging it in the memory, we will have n-1 merges = 12 merges
For storing 400k : (log 4) = 2 generations Therefore, for 1 billion: log(10000) = 13 generations (n) Thus, on merging it in the memory, we will have n-1 merges = 12 merges

-- Oct 30 In-Class Exercise Thread
Documents in total = 1000000000
We have six machines, so distribution = total documents/6 = 167 Million
To get the number of generations,
Start with generation 0 = 50k, 50k*1 = 50k
generation 1 = 50k*2 = 100k
generation 2 = 50k*4 = 400k
generation 3 = 50k*8 = 800k
Which is multiple of 2^n
  1. g. for generation 3, n=3 and so on
Similarly, for 167M,
167M = 50K*3340,
Hence, log(3340) = 11.7 ~ 12 which is n-1
Largest generation : n=13
Number of merges : 12
(Edited: 2019-10-30)
Documents in total = 1000000000 We have six machines, so distribution = total documents/6 = 167 Million To get the number of generations, Start with generation 0 = 50k, 50k*1 = 50k generation 1 = 50k*2 = 100k generation 2 = 50k*4 = 400k generation 3 = 50k*8 = 800k Which is multiple of 2^n e.g. for generation 3, n=3 and so on Similarly, for 167M, 167M = 50K*3340, Hence, log(3340) = 11.7 ~ 12 which is n-1 Largest generation : n=13 Number of merges : 12

-- Oct 30 In-Class Exercise Thread
1) For storing 400k: (log 4) = 2 generations, therefore, for 1 billion: log(10000) = 13 generations (n)
2) Thus, on merging it in the memory, we will have (n-1) merges = 12 merges
1) For storing 400k: (log 4) = 2 generations, therefore, for 1 billion: log(10000) = 13 generations (n) 2) Thus, on merging it in the memory, we will have (n-1) merges = 12 merges
2019-11-03

-- Oct 30 In-Class Exercise Thread
We have to index 1 billion pages
Total no. of machines = 6 Each machine can index 50000 pages before running out of memory.
Each machine will be responsible for (1 billion/6) pages = 166666666 pages
Generation 1 => 50,000 x 2o = 50,000 Generation 2 => 50,000 x 21 = 100,000 Generation 3 => 50,000 x 22 = 200,000
Generation n => 50000 x 2(n-1) = 166666666 n -1 = log2(166666666/5000) = 11.7 n ~= 13 generation
The largest generation number is 13 and the 12 generations will be merged to flush
(Edited: 2019-11-03)
We have to index 1 billion pages Total no. of machines = 6 Each machine can index 50000 pages before running out of memory. Each machine will be responsible for (1 billion/6) pages = 166666666 pages Generation 1 => 50,000 x 2o = 50,000 Generation 2 => 50,000 x 21 = 100,000 Generation 3 => 50,000 x 22 = 200,000 - - - - Generation n => 50000 x 2(n-1) = 166666666 n -1 = log2(166666666/5000) = 11.7 n ~= 13 generation The largest generation number is 13 and the 12 generations will be merged to flush
[ Next ]
X