-- CS267 Fall 2019 Practice Final
Question 10
Members: Prathamesh Kumkar, Ashutosh Kale, Rahul Shulka
Batch filtering:
Filtering is the process of evaluating documents on an ongoing basis according to some standing information need.
For example, a news filter might deliver articles on mental health to a health-care professional.
Sometimes, for a filtering task, the documents to be searched do not yet exist.
If it is possible to wait for a large number of documents to arrive, we can accumulate them into a corpus, index them, and apply a search method such as BM25 to yield a ranked list.
We can then repeat the process for each new batch of documents in turn.
This approach is known as batch filtering.
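The loop just described can be sketched as follows, assuming whitespace tokenization and one common BM25 variant (k1 = 1.2, b = 0.75, IDF = log(1 + (N - df + 0.5) / (df + 0.5))). These parameter choices, the toy data, and the helper names `bm25_scores` and `batch_filter` are illustrative assumptions, not part of the question. Document frequencies and the average length are computed from each batch alone, because each batch is indexed on its own.

```python
import math
from collections import Counter


def bm25_scores(query, batch, k1=1.2, b=0.75):
    """Score every document in one batch against a standing query.

    `batch` maps doc_id -> list of tokens. Statistics come from this batch
    only. The IDF form log(1 + (N - df + 0.5) / (df + 0.5)) is an assumed
    BM25 variant chosen because it is never negative.
    """
    if not batch:
        return {}
    n_docs = len(batch)
    avg_len = sum(len(toks) for toks in batch.values()) / n_docs
    # Number of documents in this batch that contain each query term.
    df = {term: sum(1 for toks in batch.values() if term in toks)
          for term in set(query)}
    scores = {}
    for doc_id, toks in batch.items():
        tf = Counter(toks)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # absent term contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


def batch_filter(query, batches):
    """Index and rank each batch in turn, yielding one ranked list per batch."""
    for batch in batches:
        scores = bm25_scores(query, batch)
        yield sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical standing query and two toy batches.
query = ["mental", "health"]
batches = [
    {"d1": "mental health services expand".split(),
     "d2": "stock markets rally".split()},
    {"d3": "new study on mental health in teens".split(),
     "d4": "health budget approved".split()},
]
for ranked in batch_filter(query, batches):
    print(ranked)
```

Each batch produces its own ranked list; nothing carries over between batches, which is exactly what makes per-batch cut-offs a concern for evaluation.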
From the user’s perspective, batch filtering yields a viable solution if enough documents arrive within a suitable time interval.
Aggregate P@k:
One drawback of size-specific precision P@k is that the number of relevant documents can vary between batches, yet we always use the same value of k.
Instead, we can use the fact that BM25 ranks documents by a relevance score s and, in each batch, present to the user only the documents with s > t for some threshold t.
We choose t so that k = ρN documents have score s > t, where N is the number of documents in the batch and ρ is a fixed fraction.
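Choosing t for one batch can be sketched as below, taking N as the batch size and rounding ρN to the nearest integer (the rounding rule is an assumption; the text does not say how a fractional k is handled). The function name `choose_threshold` and the scores are hypothetical. Setting t to the (k+1)-th best score leaves exactly the top k documents strictly above it, unless several documents tie at the cut-off, in which case fewer pass.

```python
import math


def choose_threshold(scores, rho):
    """Pick t so that, barring ties, k = rho * N of the N batch scores exceed t.

    k is rounded to the nearest integer (an assumption). If documents tie
    at the cut-off score, fewer than k will satisfy s > t.
    """
    n = len(scores)
    k = round(rho * n)
    if k >= n:
        return -math.inf  # deliver the whole batch
    ranked = sorted(scores, reverse=True)
    return ranked[k]  # (k+1)-th best score; only the top k lie strictly above it


# Hypothetical BM25 scores for one batch of N = 5 documents.
batch_scores = {"d1": 2.1, "d2": 0.4, "d3": 1.7, "d4": 0.0, "d5": 0.9}
t = choose_threshold(list(batch_scores.values()), rho=0.4)
delivered = [doc for doc, s in batch_scores.items() if s > t]
print(t, delivered)  # 0.9 ['d1', 'd3']
```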
The results of each batch are entered successively into a common priority queue ordered by score.
Aggregate P@k is then the precision of the top k elements of the queue once all batches have been processed.
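The whole procedure can be sketched with Python's `heapq` as the shared priority queue. This is a minimal illustration under stated assumptions: each batch is a dict of precomputed BM25 scores, N is taken to be the number of documents in each batch, ρN is rounded to the nearest integer, and P@k divides by k even if the queue holds fewer than k documents. The function name `aggregate_precision_at_k`, the scores, and the relevance judgments are hypothetical.

```python
import heapq
import math


def aggregate_precision_at_k(batches, relevant, rho, k):
    """Aggregate P@k over a sequence of batches (k >= 1 assumed).

    Each batch maps doc_id -> BM25 score. In every batch, documents with
    score above a per-batch threshold t (chosen so about rho * N pass) are
    pushed into one shared priority queue; the result is the precision of
    the top k entries of that queue after all batches are processed.
    """
    queue = []  # heapq is a min-heap, so scores are stored negated
    for batch in batches:
        ranked = sorted(batch.values(), reverse=True)
        n_pass = round(rho * len(ranked))
        t = ranked[n_pass] if n_pass < len(ranked) else -math.inf
        for doc_id, score in batch.items():
            if score > t:
                heapq.heappush(queue, (-score, doc_id))
    top = [heapq.heappop(queue)[1] for _ in range(min(k, len(queue)))]
    hits = sum(1 for doc_id in top if doc_id in relevant)
    return hits / k


# Hypothetical scores for two batches of N = 4 and relevance judgments.
batches = [
    {"a1": 3.2, "a2": 1.1, "a3": 0.2, "a4": 2.5},
    {"b1": 0.9, "b2": 4.0, "b3": 0.5, "b4": 0.1},
]
relevant = {"a1", "b2", "b3"}
result = aggregate_precision_at_k(batches, relevant, rho=0.5, k=3)
print(result)  # top 3 of the queue are b2, a1, a4; two are relevant
```

Because the queue is shared, a weak document that passed a lenient batch threshold (b1 here) can still be pushed below k by stronger documents from other batches, which is the point of aggregating rather than scoring each batch separately.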