-- Practice Final Up!
Team: Nishant Goel and Swapnil Patil
Question 5:
In REBUILD, a new index is built from scratch. When this process is finished,
the old index is deleted and replaced with the new one.
The time for rebuild is given by:
c.d(new) = c.(d(old) - d(delete) + d(insert))
where:
c = system-specific constant (indexing time per document)
d(new) = number of documents in the new index
d(old) = number of documents in the old index
d(delete) = number of deleted documents
d(insert) = number of inserted documents
In REMERGE, an index is built from the text found in the newly inserted documents.
After this index has been created, it is merged with the previously existing
index.
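The time for remerge is given by:
T(remerge) = c.d(insert) + (c/4).(d(old) + d(insert))
The first term is the cost of indexing the inserted documents, and the second
term is the cost of the merge.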
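The two cost formulas can be written as small functions, which makes it easy to
see that the remerge time does not depend on d(delete) at all. This is only a
sketch: the function names, the default c = 1 (so results are in units of c),
and the sample counts in the usage lines are made up for illustration and are
not part of the question.

```python
from fractions import Fraction


def rebuild_time(d_old, d_delete, d_insert, c=1):
    """Rebuild: re-index every document that ends up in the new index."""
    return c * (d_old - d_delete + d_insert)


def remerge_time(d_old, d_insert, c=1):
    """Remerge: index the inserts (cost c each), then merge (cost c/4 each)."""
    return c * d_insert + Fraction(1, 4) * c * (d_old + d_insert)


# Hypothetical counts, chosen only to exercise the functions.
print(rebuild_time(d_old=8, d_delete=2, d_insert=4))  # 10
print(remerge_time(d_old=8, d_insert=4))              # 7
```

Fraction is used so that the c/4 term stays exact when the counts are integers.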
Now, consider the following example.
Let d(old) = 100, d(delete) = 70, and d(insert) = 20.
Using the equation for rebuild,
c.d(new) = c.(100 - 70 + 20) = 50.c
Similarly for remerge,
T(remerge) = c.20 + (c/4).(100 + 20) = 20.c + 30.c = 50.c
Thus, rebuild and remerge have the same performance for the above example.
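The example is not a coincidence. Setting the rebuild time equal to the remerge
time and cancelling c gives d(delete) = (3.d(old) - d(insert)) / 4, so for
d(old) = 100 and d(insert) = 20 the break-even point is exactly 70 deletions.
With more deletions rebuild is cheaper; with fewer, remerge is cheaper. A
minimal sketch of that calculation, assuming the two formulas above hold as
stated (break_even_deletes is a name introduced here, not from the question):

```python
from fractions import Fraction


def break_even_deletes(d_old, d_insert):
    """Number of deletions at which rebuild and remerge take the same time.

    Derived from d_old - d_delete + d_insert = d_insert + (d_old + d_insert) / 4.
    """
    return Fraction(3 * d_old - d_insert, 4)


print(break_even_deletes(100, 20))  # 70
```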
Assumption: This comparison ignores the overhead that REMERGE incurs to
garbage-collect postings that refer to the deleted documents.
(Edited: 2016-05-16)