-- Oct 30 In-Class Exercise Thread
Each machine is responsible for one sixth of the 1 billion documents (10^9/6 documents),
which comes to approximately 167 million documents per machine.
When 50,000 documents fill memory, we write them out as a block in the first generation, and whenever two blocks end up in the same generation we merge them into a single block in the next generation.
So the block sizes we write are 50k, 100k, 200k, 400k, 800k, 1.6M, 3.2M, 6.4M, 12.8M, 25.6M, 51.2M, 102.4M, and we stop there because the next doubling (204.8M) would exceed the roughly 167 million documents on the machine.
This gives us 13 generations in total and 12 total merges to get the final index.
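The doubling arithmetic can be checked with a short sketch. It only enumerates block sizes under the numbers given in the post (1 billion documents, 6 machines, 50,000 documents per in-memory block). The names `TOTAL_DOCS`, `MACHINES`, `BLOCK_DOCS`, and `doubling_sizes` are made up for illustration, and this is not a full simulation of the merging process.

```python
# Numbers taken from the exercise; the names are illustrative only.
TOTAL_DOCS = 10**9      # documents in the whole collection
MACHINES = 6            # machines sharing the collection
BLOCK_DOCS = 50_000     # documents that fit in memory before a flush

# Integer share of the collection handled by one machine (about 167 million).
docs_per_machine = TOTAL_DOCS // MACHINES


def doubling_sizes(first_block, limit):
    """Return block sizes obtained by repeated doubling, never exceeding limit."""
    sizes = []
    size = first_block
    while size <= limit:
        sizes.append(size)
        size *= 2  # merging two equal-sized blocks doubles the size
    return sizes


sizes = doubling_sizes(BLOCK_DOCS, docs_per_machine)
print(f"documents per machine: {docs_per_machine:,}")
for s in sizes:
    print(f"{s:,}")
```

The last size printed is 102,400,000, since one more doubling would give 204,800,000, which is more than a single machine holds.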
(Edited: 2019-10-30)