-- CS267 Fall 2019Practice Midterm 1
Problem 10:
Briefly explain and give example of the following (1pt each): (a) sort-based dictionary, (b) per-term index, (c) move-to-front heuristic, (d) merge based index construction.
(a) sort-based dictionary:
- all terms that appear in collection are stored in sorted array or in lexicographical order.
+---------+----------+----------+------------+------------+------------+--------------+
| (aaa,0) | (aab, 4) | (aac, 8) | (term, 12) | (wait, 16) | (waze, 20) | (xerox, 24) |
+---------+----------+----------+------------+------------+------------+--------------+
(b) per-term index
- in order to carry out efficient random access operations on the postings lists,
each list is equipped with an auxiliary data structure which is stored on disk,
at the beginning of respective posting list.
+-----+-----+-----+-----+
| TF: | 1 | 51 | 101 |
+-----+-----+-----+-----+
| 1 | 34 | 43 | 48 |
+-----+-----+-----+-----+
| 51 | 69 | 72 | 89 |
+-----+-----+-----+-----+
| 101 | 112 | 123 | 134 |
+-----+-----+-----+-----+
(c) move-to-front heuristic
- If a term is frequent in the text collection, then it should be at the beginning of its chain in the hash table.
- Like insert-at-back, the move-to-front heuristic inserts new terms at the end of their respective chain.
- In addition, however, whenever a term lookup occurs and the term descriptor is not found at the beginning of the chain, it is relocated and moved to the front.
- This way, when the next lookup for the term takes place, the term’s dictionary entry is still at (or near) the beginning of its chain in the hash table.
(d) merge based index construction
- Merge-based indexing is a generalization of the in-memory index construction method, building inverted lists by means of hash table lookup.
Sushant Mane | Ashutosh Kale
(
Edited: 2019-10-04)
Problem 10:
Briefly explain and give example of the following (1pt each): (a) sort-based dictionary, (b) per-term index, (c) move-to-front heuristic, (d) merge based index construction.
(a) sort-based dictionary:
- all terms that appear in collection are stored in sorted array or in lexicographical order.
+---------+----------+----------+------------+------------+------------+--------------+
| (aaa,0) | (aab, 4) | (aac, 8) | (term, 12) | (wait, 16) | (waze, 20) | (xerox, 24) |
+---------+----------+----------+------------+------------+------------+--------------+
(b) per-term index
- in order to carry out efficient random access operations on the postings lists,
each list is equipped with an auxiliary data structure which is stored on disk,
at the beginning of respective posting list.
+-----+-----+-----+-----+
| TF: | 1 | 51 | 101 |
+-----+-----+-----+-----+
| 1 | 34 | 43 | 48 |
+-----+-----+-----+-----+
| 51 | 69 | 72 | 89 |
+-----+-----+-----+-----+
| 101 | 112 | 123 | 134 |
+-----+-----+-----+-----+
(c) move-to-front heuristic
- If a term is frequent in the text collection, then it should be at the beginning of its chain in the hash table.
- Like insert-at-back, the move-to-front heuristic inserts new terms at the end of their respective chain.
- In addition, however, whenever a term lookup occurs and the term descriptor is not found at the beginning of the chain, it is relocated and moved to the front.
- This way, when the next lookup for the term takes place, the term’s dictionary entry is still at (or near) the beginning of its chain in the hash table.
(d) merge based index construction
- Merge-based indexing is a generalization of the in-memory index construction method, building inverted lists by means of hash table lookup.
Sushant Mane | Ashutosh Kale