2019-10-29

Oct 30 In-Class Exercise Thread.

Post your solutions to the Oct. 30 In-Class Exercise to this thread.
Best,
Chris
Post your solutions to the Oct. 30 In-Class Exercise to this thread. Best, Chris
2019-10-30

-- Oct 30 In-Class Exercise Thread
Resource Description for new doc 2019-10-30 14.04.19_1.jpg
((resource:new doc 2019-10-30 14.04.19_1.jpg|Resource Description for new doc 2019-10-30 14.04.19_1.jpg))

-- Oct 30 In-Class Exercise Thread
 
 1000000000/(50000*6) = 3333.33
 numberOfGenerations = log2(3333.33) ~= 12
 numOfMerges ~= 11
(Edited: 2019-10-30)
1000000000/(50000*6) = 3333.33 numberOfGenerations = log2(3333.33) ~= 12 numOfMerges ~= 11

-- Oct 30 In-Class Exercise Thread
 Each machine is responsible for 1 billion by 6 documents (10^9/6 documents)
 which corresponds to approximately 167 million documents per machine
 When 50,000 documents are filled in the memory we create a generation and repeat this every time there are 2 blocks in the same generation.
 so we write 50k, 100k, 200k, 400k, 800k, 1.6M, 3.2M, 6.4M, 12.8M, 25.6M, 50.12M, 100.24M and we stop there 
 This gives us 13 generations in total and 12 total merges to get the final index. 
(Edited: 2019-10-30)
Each machine is responsible for 1 billion by 6 documents (10^9/6 documents) which corresponds to approximately 167 million documents per machine When 50,000 documents are filled in the memory we create a generation and repeat this every time there are 2 blocks in the same generation. so we write 50k, 100k, 200k, 400k, 800k, 1.6M, 3.2M, 6.4M, 12.8M, 25.6M, 50.12M, 100.24M and we stop there This gives us 13 generations in total and 12 total merges to get the final index.

-- Oct 30 In-Class Exercise Thread
For each machine:
 Total documents responsible = (1*10^9)/6 = 1.67 * 10^8 
 Generation  Documents
   G0      50000 = 50000 * 2^0
   G1      50000 = 50000 * 2^0
   G2     100000 = 50000 * 2^1
   G3     200000 = 50000 * 2^2
   G4     400000 = 50000 * 2^3
      ...................
   Gn = 1.67*10^8 = 50000* 3340 = 50000* 2^(n-1)
   n-1= log(3340)~= 12
   n =13
So
 1)Largest generation number that will be used is 13
 2)Most generations that will need to be merged = 13-1 = 12      
(Edited: 2019-10-30)
For each machine: Total documents responsible = (1*10^9)/6 = 1.67 * 10^8 Generation Documents G0 50000 = 50000 * 2^0 G1 50000 = 50000 * 2^0 G2 100000 = 50000 * 2^1 G3 200000 = 50000 * 2^2 G4 400000 = 50000 * 2^3 ................... Gn = 1.67*10^8 = 50000* 3340 = 50000* 2^(n-1) n-1= log(3340)~= 12 n =13 So 1)Largest generation number that will be used is 13 2)Most generations that will need to be merged = 13-1 = 12

-- Oct 30 In-Class Exercise Thread
Generation 0 ==> 50K Generation 1 ==> 50K Generation 2 ==> 100K Generation 3 ==> 200k
Hence for storing 400 K we will require log 4 = 2 generations
Hence for 1 billion we will require log(10000) = n= 13 Generations
When we merge it in memory, n-1 merges = 12 merges
Generation 0 ==> 50K Generation 1 ==> 50K Generation 2 ==> 100K Generation 3 ==> 200k Hence for storing 400 K we will require log 4 = 2 generations Hence for 1 billion we will require log(10000) = n= 13 Generations When we merge it in memory, n-1 merges = 12 merges

-- Oct 30 In-Class Exercise Thread
For storing 400k : (log 4) = 2 generations Therefore, for 1 billion: log(10000) = 13 generations (n) Thus, on merging it in the memory, we will have n-1 merges = 12 merges
For storing 400k : (log 4) = 2 generations Therefore, for 1 billion: log(10000) = 13 generations (n) Thus, on merging it in the memory, we will have n-1 merges = 12 merges

-- Oct 30 In-Class Exercise Thread
Documents in total = 1000000000
We have six machines, so distribution = total documents/6 = 167 Million
To get the number of generations,
Start with generation 0 = 50k, 50k*1 = 50k
generation 1 = 50k*2 = 100k
generation 2 = 50k*4 = 400k
generation 3 = 50k*8 = 800k
Which is multiple of 2^n
e.g. for generation 3, n=3 and so on
Similarly, for 167M,
167M = 50K*3340,
Hence, log(3340) = 11.7 ~ 12 which is n-1
Largest generation : n=13
Number of merges : 12
(Edited: 2019-10-30)
Documents in total = 1000000000 We have six machines, so distribution = total documents/6 = 167 Million To get the number of generations, Start with generation 0 = 50k, 50k*1 = 50k generation 1 = 50k*2 = 100k generation 2 = 50k*4 = 400k generation 3 = 50k*8 = 800k Which is multiple of 2^n e.g. for generation 3, n=3 and so on Similarly, for 167M, 167M = 50K*3340, Hence, log(3340) = 11.7 ~ 12 which is n-1 Largest generation : n=13 Number of merges : 12

-- Oct 30 In-Class Exercise Thread
1) For storing 400k: (log 4) = 2 generations, therefore, for 1 billion: log(10000) = 13 generations (n)
2) Thus, on merging it in the memory, we will have (n-1) merges = 12 merges
1) For storing 400k: (log 4) = 2 generations, therefore, for 1 billion: log(10000) = 13 generations (n) 2) Thus, on merging it in the memory, we will have (n-1) merges = 12 merges
2019-11-03

-- Oct 30 In-Class Exercise Thread
 We have to index 1 billion pages
 Total no. of machines = 6
 Each machine can index 50000 pages before running out of memory.
 Each machine will be responsible for (1 billion/6) pages = 166666666 pages
 Generation 1 => 50,000 x 2o = 50,000
 Generation 2 => 50,000 x 21 = 100,000
 Generation 3 => 50,000 x 22 = 200,000
 -
 -
 -
 -
 Generation n => 50000 x 2(n-1) = 166666666
 n -1 = log2(166666666/5000) = 11.7
 n ~= 13 generation
The largest generation number is 13 and the 12 generations will be merged to flush
(Edited: 2019-11-03)
We have to index 1 billion pages Total no. of machines = 6 Each machine can index 50000 pages before running out of memory. Each machine will be responsible for (1 billion/6) pages = 166666666 pages Generation 1 => 50,000 x 2o = 50,000 Generation 2 => 50,000 x 21 = 100,000 Generation 3 => 50,000 x 22 = 200,000 - - - - Generation n => 50000 x 2(n-1) = 166666666 n -1 = log2(166666666/5000) = 11.7 n ~= 13 generation The largest generation number is 13 and the 12 generations will be merged to flush
[ Next ]
X