2021-04-20

Apr 21 In-Class Exercise Thread.

Please post your solutions to the Apr 21 In-Class Exercise to this thread.
Best,
Chris
Please post your solutions to the Apr 21 In-Class Exercise to this thread. Best, Chris
2021-04-21

-- Apr 21 In-Class Exercise Thread
1 billion pages / 6 machines = 167million pages per machine 167 million pages per machine / 100,000 pages per merge = 1667 merges per machine log(1667) = 10.7 = 11 generations will merge 10 generations
1 billion pages / 6 machines = 167million pages per machine 167 million pages per machine / 100,000 pages per merge = 1667 merges per machine log(1667) = 10.7 = 11 generations will merge 10 generations

-- Apr 21 In-Class Exercise Thread
1 billion pages / 6 machines = 167 million pages per machine 167 million pages / 100,000 max documents in memory = 1667 merges log_2(1667) = 11 generations 11 generations is the largest generation number The most generations that will need to be merged is 10. Since each previous generation merges to make a new generation.
<nowiki> 1 billion pages / 6 machines = 167 million pages per machine 167 million pages / 100,000 max documents in memory = 1667 merges log_2(1667) = 11 generations 11 generations is the largest generation number The most generations that will need to be merged is 10. Since each previous generation merges to make a new generation. </nowiki>

-- Apr 21 In-Class Exercise Thread
1 billion pages / 6 machines => 167 million pages per machine
167 million pages / 100,000 docs per merge => 1667 merges
log(1667) => 11 generations (will merged by 10 generations)
(Edited: 2021-04-21)
1 billion pages / 6 machines => 167 million pages per machine 167 million pages / 100,000 docs per merge => 1667 merges log(1667) => 11 generations (will merged by 10 generations)

-- Apr 21 In-Class Exercise Thread
N = 10^9/6 [since each machine takes care of 1/6 of the docs]
M = 10^5 [memory size]
Number of generations = log(N/M) = log(1666.7) = 10.7 or 11
Largest Generation used = 10 or 51,200,000 documents
Number of merges to flush the in memory partition to disk = 10
(Edited: 2021-04-21)
N = 10^9/6 [since each machine takes care of 1/6 of the docs] M = 10^5 [memory size] Number of generations = log(N/M) = log(1666.7) = 10.7 or 11 Largest Generation used = 10 or 51,200,000 documents Number of merges to flush the in memory partition to disk = 10

-- Apr 21 In-Class Exercise Thread
	10^9 / 10^5 = 10,000 ==> 10000 / 6 = 1666.66666667

That means that 4 of the 6 machines will be written to 1667 times and the remaining 2 will be written to 1666 times.

Let's assume that we write to each machine sequentially: first to Machine 1, then to Machine 2, .... Machine 6, then Machine 1 again.

Let x be the highest generation number in one of the 4 machines to which we write 1667 times:
	2^x = 1667
x = log (1667)
x = 11

To get this gen-1` file, we will need to merge two gen-10 files. That means we need to have one of each generational
file from 1 all the way to 10 in order to make gen-11; the gen-1 to gen-9 files will be used to make the gen-10 that
merges with the existing gen-10 to make the gen-11.

So, one merge for each of the 10 generations means at most 10 generations will need to be merged.
(Edited: 2021-04-21)
10^9 / 10^5 = 10,000 ==> 10000 / 6 = 1666.66666667<br><br> That means that 4 of the 6 machines will be written to 1667 times and the remaining 2 will be written to 1666 times.<br><br> Let's assume that we write to each machine sequentially: first to Machine 1, then to Machine 2, .... Machine 6, then Machine 1 again.<br><br> Let x be the highest generation number in one of the 4 machines to which we write 1667 times:<br> 2^x = 1667<br> x = log (1667)<br> x = 11<br><br> To get this gen-1@BT@ file, we will need to merge two gen-10 files. That means we need to have one of each generational <br> file from 1 all the way to 10 in order to make gen-11; the gen-1 to gen-9 files will be used to make the gen-10 that <br> merges with the existing gen-10 to make the gen-11.<br><br> So, one merge for each of the 10 generations means at most 10 generations will need to be merged.<br>

-- Apr 21 In-Class Exercise Thread
With each machine getting 1/6 of 10^9, the max any machine will have to process will be 166,666,667 documents, which will be 1666 writes with some left in memory. The generations follow a log2 structure, with each doubling of writes corresponding to a new generation, so log2(1666) = 10.70217268536555, or 11 generations on disk and 1 in memory, for 12 generations. The max merges will be 10 , for when we have 1 2 3 4 5 6 7 8 9 10 and add another, making
1 1 2 3 4 5 6 7 8 9 10 -> 2 2 3 4 5 6 7 8 9 10
2 2 3 4 5 6 7 8 9 10 -> 3 3 4 5 6 7 8 9 10
3 3 4 5 6 7 8 9 10 -> 4 4 5 6 7 8 9 10
4 4 5 6 7 8 9 10 -> 5 5 6 7 8 9 10
5 5 6 7 8 9 10 -> 6 6 7 8 9 10
6 6 7 8 9 10 -> 7 7 8 9 10
7 7 8 9 10 -> 8 8 9 10
8 8 9 10 -> 9 9 10
9 9 10 -> 10 10
10 10 -> 11
(Edited: 2021-04-21)
With each machine getting 1/6 of 10^9, the max any machine will have to process will be 166,666,667 documents, which will be 1666 writes with some left in memory. The generations follow a log2 structure, with each doubling of writes corresponding to a new generation, so log2(1666) = 10.70217268536555, or 11 generations on disk and 1 in memory, for 12 generations. The max merges will be 10 , for when we have 1 2 3 4 5 6 7 8 9 10 and add another, making 1 1 2 3 4 5 6 7 8 9 10 -> 2 2 3 4 5 6 7 8 9 10 2 2 3 4 5 6 7 8 9 10 -> 3 3 4 5 6 7 8 9 10 3 3 4 5 6 7 8 9 10 -> 4 4 5 6 7 8 9 10 4 4 5 6 7 8 9 10 -> 5 5 6 7 8 9 10 5 5 6 7 8 9 10 -> 6 6 7 8 9 10 6 6 7 8 9 10 -> 7 7 8 9 10 7 7 8 9 10 -> 8 8 9 10 8 8 9 10 -> 9 9 10 9 9 10 -> 10 10 10 10 -> 11

-- Apr 21 In-Class Exercise Thread
1000000000 pages / 6 machines = 166666667 pages per machine
166666667 pages / 100000 documents max = 1667 merges per machine
log2(1667) = 10.7 -> 11 is the largest generation number used
11 - 1 = 10 is most generations that need to be merged.
1000000000 pages / 6 machines = 166666667 pages per machine 166666667 pages / 100000 documents max = 1667 merges per machine log2(1667) = 10.7 -> 11 is the largest generation number used 11 - 1 = 10 is most generations that need to be merged.
2021-04-25

-- Apr 21 In-Class Exercise Thread
We have a total of 10^9 documents and there are 6 machines. Each machine gets 10^9 / 6 documents.
  
But we can only index 10^5 at a time.
  
Therefore, the number of writes per machine = 10^9 / 6 / 10^5 ~= 1667
  
Initially, all partitions will have generation 1. As merges happen, it will reduce the total number of partitions by a factor of 2. Hence it follows a logarithmic pattern.
  
The highest generation that will reach is log_2(1667) ~= 11
  
The number of generations that will be merged = 11 - 1 = 10.
   
(Edited: 2021-04-25)
We have a total of 10^9 documents and there are 6 machines. Each machine gets 10^9 / 6 documents. But we can only index 10^5 at a time. Therefore, the number of writes per machine = 10^9 / 6 / 10^5 ~= 1667 Initially, all partitions will have generation 1. As merges happen, it will reduce the total number of partitions by a factor of 2. Hence it follows a logarithmic pattern. The highest generation that will reach is log_2(1667) ~= 11 The number of generations that will be merged = 11 - 1 = 10.

-- Apr 21 In-Class Exercise Thread
 billion pages / 6 machines = 166666667 pages per machine
 166666667 pages / 100000 documents
 So, max 1667 merges per machines
 log2(1667) = 10.7 
 So, 11 is the largest generation number
 11-1  = 10 is most generations that need to be merged
billion pages / 6 machines = 166666667 pages per machine 166666667 pages / 100000 documents So, max 1667 merges per machines log2(1667) = 10.7 So, 11 is the largest generation number 11-1 = 10 is most generations that need to be merged
[ Next ]
X