2018-10-24

Hw4 is up!

Hey Everyone,
Hw4 is up!
Best, Chris
2018-10-26

-- Hw4 is up!
Could you provide some test input and output for HW4?
The book doesn't provide concrete examples of computing the BM25 score for a document the way it did for cosine and proximity ranking. Having some test output to compare our program against would be very helpful.
(Edited: 2018-10-26)
2018-10-28

-- Hw4 is up!
Do we need to calculate BM25 disjunctively or conjunctively? Cosine and proximity ranking were implemented conjunctively earlier.
Is it okay to calculate the length of a document as the number of terms it contains?
2018-10-31

-- Hw4 is up!
@ritigupta07 (1) BM25 should be computed disjunctively. (2) yes.
@sshahab the trec_eval software and the slides have some example output data (in terms of the format you need to recreate). For inputs, you could try the Wikipedia article:
https://en.wikipedia.org/wiki/L%27Anse_aux_Meadows
and use the first 10 paragraphs as your corpus. You can use the two queries "Viking Colony" and "Norse Sagas".
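For anyone who wants something to compare against, here is a minimal sketch of disjunctive BM25 scoring, not a reference solution. It assumes a common form of the formula with idf = log2(N / N_t), k1 = 1.2 and b = 0.75, plain whitespace tokenization, and document length counted as the number of terms. The function name bm25_scores is made up for illustration, so check the formula and parameters against the book and slides before trusting the numbers.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Return one BM25 score per document, scored disjunctively.

    Assumptions (not from the assignment): whitespace tokenization,
    lowercasing, idf = log2(N / N_t), k1 = 1.2, b = 0.75.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document length = number of terms in the document.
    avg_len = sum(len(t) for t in tokenized) / n_docs
    counts = [Counter(t) for t in tokenized]

    scores = [0.0] * n_docs
    for term in query.lower().split():
        # N_t: number of documents containing the term.
        n_t = sum(1 for c in counts if term in c)
        if n_t == 0:
            continue  # term appears nowhere, contributes nothing
        idf = math.log2(n_docs / n_t)
        for i, c in enumerate(counts):
            f = c[term]
            if f == 0:
                continue  # disjunctive: a missing term just adds 0
            norm = k1 * ((1 - b) + b * len(tokenized[i]) / avg_len)
            scores[i] += idf * f * (k1 + 1) / (f + norm)
    return scores


if __name__ == "__main__":
    corpus = ["viking colony", "norse sagas sagas", "hello world"]
    for doc, score in zip(corpus, bm25_scores(corpus, "Viking Colony")):
        print(f"{score:.4f}  {doc}")
```

Because the scoring is disjunctive, a document that contains only one of the query terms still gets a nonzero score; a conjunctive version would skip it entirely.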
2018-11-02

-- Hw4 is up!
I have some doubts about the term-at-a-time with pruning algorithm. What does line 32 of the algorithm mean?
T := argmin_x {x in Nat | sum_(j=1)^x (tfStats[j] * q) ≥ quotaLeft}
As per my understanding, it means vtf = minimum over all x of {s1, s2, s3, s4, ...}, where the sums s1, s2, s3, s4 should be greater than or equal to quotaLeft.
But what does minimum mean here? Would the summation values not keep increasing with x? The summation would be minimum for x = 1.
What am I missing here?
(Edited: 2018-11-02)
2018-11-05

-- Hw4 is up!
argmin in this case is returning the smallest x such that sum_{j=1}^x tfStats[j]*q >= quotaLeft. You aren’t trying to minimize the sum; you are considering sums of varying numbers of terms and trying to find the least number x of terms it takes for the value to be at least quotaLeft. Let me know if this helps.
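In case it helps to see it concretely, here is a small sketch of that argmin as a loop. It treats tfStats as a plain list of non-negative numbers and q as a single number, which is a simplification on my part; the function name and the sample values are made up for illustration.

```python
def smallest_prefix_count(tf_stats, q, quota_left):
    """Return the least x >= 1 with sum_{j=1}^{x} tf_stats[j] * q >= quota_left.

    The pseudocode indexes tfStats from 1, so x counts how many entries
    were summed. Returns None if even the full sum stays below quota_left.
    """
    running = 0
    for x, value in enumerate(tf_stats, start=1):
        running += value * q
        if running >= quota_left:
            return x  # first prefix that reaches the quota
    return None


if __name__ == "__main__":
    # Prefix sums are 3, 8, 10, 17; the first one >= 9 is the third.
    print(smallest_prefix_count([3, 5, 2, 7], q=1, quota_left=9))
```

The prefix sums only grow as x grows, so the loop can stop at the first x that reaches quotaLeft; that x is what argmin returns.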