[ Prev ]
2021-04-25

-- Apr 21 In-Class Exercise Thread
Total Number of documents is 10^9 and only 1/6th of 10^9 can saved on each machine. N = 10^9/6 M = is the available memory = 10^5 log(N/M) = log (10^4/6) = log(1666.7) = 10.7 or 11 Largest number of merges = 11 but the merging will be done by 10 generations
Total Number of documents is 10^9 and only 1/6th of 10^9 can saved on each machine. N = 10^9/6 M = is the available memory = 10^5 log(N/M) = log (10^4/6) = log(1666.7) = 10.7 or 11 Largest number of merges = 11 but the merging will be done by 10 generations

-- Apr 21 In-Class Exercise Thread
1B pages / 6 machines = 167M pages per machine
167M pages / 100k memory = 1667 merges per machine

Each generation will be half the size of the next generation => log behavior
log_2(1667) = 10.7 => 11 generations

The generations represent a binary number. The most number of merges is equivalent to the most number of carries.
In this case that is 11-1 = 10 merges
(Edited: 2021-04-25)
1B pages / 6 machines = 167M pages per machine <br> 167M pages / 100k memory = 1667 merges per machine <br><br> Each generation will be half the size of the next generation => log behavior <br> log_2(1667) = 10.7 => 11 generations <br><br> The generations represent a binary number. The most number of merges is equivalent to the most number of carries. <br> In this case that is 11-1 = 10 merges

-- Apr 21 In-Class Exercise Thread
Number of Documents per Machine = 1 billion pages / 6 machines = 167 million pages/machine Number of Merges per Machine = 167 million pages / 100000 memory = 1667 merges Largest generation number = log₂(1667) = 10.7 ≈ 11 generations Most generations needed for merging = 11 - 1 = 10 generations
<nowiki> Number of Documents per Machine = 1 billion pages / 6 machines = 167 million pages/machine Number of Merges per Machine = 167 million pages / 100000 memory = 1667 merges Largest generation number = log₂(1667) = 10.7 ≈ 11 generations Most generations needed for merging = 11 - 1 = 10 generations </nowiki>
X