2022-04-20

Apr 20 In-Class Exercise.

Please post your solutions to the Apr 20 In-Class Exercise to this thread.
Best,
Chris

-- Apr 20 In-Class Exercise
Resource Description for Screen Shot 2022-04-20 at 2.43.20 PM.png
The output is: 10011100010 (1250 written in binary)

Explanation:
Documents each machine needs to index:
10^9 / 4 = 2.5 * 10^8

Number of generation_1 partitions (one per memory-sized flush):
2.5 * 10^8 / 200,000 = 1250

Let g be the largest generation we need:
2^g <= 1250 => g = floor(log_2(1250)) = 10

The largest generation # is:
10

The most generations that will need to be merged when we have to flush the in-memory partition to disk are:
9
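The arithmetic above can be reproduced in a few lines. This is a minimal sketch, not the code from the screenshot; the variable names are illustrative, and it assumes the largest generation equals the position of the highest set bit in the flush count.

```python
# Illustrative names; the values come from the exercise statement.
docs_total = 10**9          # documents in the whole collection
machines = 4                # machines sharing the collection
memory_limit = 200_000      # documents that fit in memory before a flush

docs_per_machine = docs_total // machines        # 250,000,000
flushes = docs_per_machine // memory_limit       # 1250 memory-sized partitions

# Binary form of the flush count, as quoted in the post.
binary = format(flushes, "b")

# Position of the highest set bit, i.e. floor(log2(flushes)).
largest_generation = flushes.bit_length() - 1

print(binary, largest_generation)
```

Using `bit_length()` avoids floating-point logarithms, so the floor is exact for any integer flush count.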
(Edited: 2022-04-20)

-- Apr 20 In-Class Exercise
Given: 
 M = 200000
 N(total) = 1000000000
 N(per machine) = N(total)/4 = 250000000
 
Logarithmic merging:
 Gen 0 : 200000 docs
 Gen 1 : 400000 docs
 .
 .
 .
 Gen t : 2^t * 200000 docs
 
Now, we know that our total num of docs = 250000000 = 2^t * 200000
=> 2^t = 1250
=> t = int(log2(1250)) = 10
 
Thus, largest num of gen = 10
Max. num of gens to be merged = 9
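The generation scheme above behaves like a binary counter, which can be checked with a small simulation. This is a sketch under stated assumptions: each flush creates a generation 0 partition, and two partitions of the same generation are merged into one of the next generation. The function name `simulate_flushes` is hypothetical.

```python
def simulate_flushes(num_flushes):
    """Simulate logarithmic merging as a binary counter (assumed model)."""
    partitions = set()      # generations currently on disk
    largest_seen = 0        # highest generation ever created
    for _ in range(num_flushes):
        g = 0               # a fresh flush starts as generation 0
        # A collision with an existing generation triggers a merge upward.
        while g in partitions:
            partitions.remove(g)
            g += 1
        partitions.add(g)
        largest_seen = max(largest_seen, g)
    return sorted(partitions), largest_seen

final_generations, largest = simulate_flushes(250_000_000 // 200_000)
print(final_generations, largest)
```

Under this model the generations left on disk after 1250 flushes match the set bits of 1250 in binary, and the largest generation is 10.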

-- Apr 20 In-Class Exercise
total documents- 1B
Machines count- 4 
 
N= 1B/4= 250M
M=200000  
 
2^t * 200000 = 250M
2^t = 1250
t = floor(log2(1250)) = 10
 
A) largest generation is 10
B) at most 9 generations need to be merged
(Edited: 2022-04-20)

-- Apr 20 In-Class Exercise
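n = 250000000   # documents handled by each machine
max_gen = 0     # number of generations counted so far
a = 2           # size multiplier of the next generation
mem = 200000    # documents that fit in memory per flush

# Subtract the capacity of each successive generation until all documents are covered.
while n > 0:
    n -= mem * a
    a *= 2
    max_gen += 1

print(max_gen)

This prints 10.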
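As a cross-check, the loop can be wrapped in a function and compared with the closed-form floor(log2(N/M)) used elsewhere in this thread. This is only a sketch; the function name is hypothetical, and the two approaches are shown to agree for this exercise's numbers, not in general.

```python
import math

def largest_generation_by_loop(docs, mem):
    """Count generations by subtracting doubling capacities, as in the post."""
    remaining = docs
    multiplier = 2
    generation = 0
    while remaining > 0:
        remaining -= mem * multiplier
        multiplier *= 2
        generation += 1
    return generation

loop_answer = largest_generation_by_loop(250_000_000, 200_000)
closed_form = math.floor(math.log2(250_000_000 / 200_000))
print(loop_answer, closed_form)
```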

-- Apr 20 In-Class Exercise
In-memory limit = 200000, total docs = 1000000000, docs per machine = 250000000
gen t = 2^t * 200000, t = floor(log_2(250000000 / 200000)) = floor(log_2(1250)) = 10
Largest generation = 10, Most generations that will need to be merged = 9
(Edited: 2022-04-20)

-- Apr 20 In-Class Exercise
1 billion/4 machines = 250 million
250 million/200000 documents = 1250
log2(1250) ≈ 10.29, rounded down to 10 generations
The largest generation is 10.
Max number of generations merged is 9
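The rounding step can also be checked without a calculator by bracketing 1250 between consecutive powers of two. The block below only restates the arithmetic above:

```latex
\[
2^{10} = 1024 \;\le\; 1250 \;<\; 2048 = 2^{11}
\quad\Longrightarrow\quad
\lfloor \log_2 1250 \rfloor = 10
\]
```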
(Edited: 2022-04-20)
2022-04-21

-- Apr 20 In-Class Exercise
Total pages = 1 billion
each machine = 250 million
Resource Description for Screenshot 2022-04-21 at 4.06.02 PM.png
Total number of generations = 10
Maximum number of generations to be merged = 9

-- Apr 20 In-Class Exercise
Total number of documents -  1 billion
Machines count- 4 
Memory limit - 200000 documents
Number of pages each machine will process = 1 billion / 4 machines = 250 million
t = floor(log_2(250000000 / 200000)) = floor(log_2(1250)) = 10 generations
The largest generation is 10.
Max number of generations merged is 9
2022-04-23

-- Apr 20 In-Class Exercise
Total docs - 1B
Memory limit - 200K
Num of Machines = 4
Num of documents to be processed by each machine = 250M
log2(250M/200K) = log2(1250) ≈ 10.29, rounded down to 10
Largest generation = 10
Most generations that will need to be merged = 9