2021-05-17

-- Practice Final Thread
Team - 4: Vrinda and William
Q10. What is Batch filtering? What is aggregate P@k?
Batch filtering is a filtering process where we accumulate new documents into a corpus, index them, rank them, then send them to the appropriate user. Essentially, we process documents in groups instead of as they come in.
Aggregate P@k is a variation of P@k that uses the scoring function to set a score threshold. Documents in each batch that score above the threshold are returned, and precision is measured over all batches together. This bypasses the limitation that the number of relevant documents in each batch can vary.
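One possible reading of the definition above, as a rough sketch: pool the scored documents from all batches, pick the cutoff so that about k documents per batch are returned overall, and compute precision on that pooled set. The function name, the (score, is_relevant) tuple layout, and the choice of k times the number of batches as the cutoff are assumptions made for illustration, not something stated in the post.

```python
def aggregate_precision_at_k(batches, k):
    """Precision over the top k * len(batches) documents pooled across batches.

    batches: list of batches, each a list of (score, is_relevant) tuples.
    Assumption: the threshold is chosen so that k documents per batch
    are returned on average, rather than exactly k from every batch.
    """
    pooled = [doc for batch in batches for doc in batch]
    pooled.sort(key=lambda doc: doc[0], reverse=True)  # highest score first
    cutoff = min(k * len(batches), len(pooled))
    if cutoff == 0:
        return 0.0
    returned = pooled[:cutoff]
    return sum(1 for _, relevant in returned if relevant) / cutoff


# Hypothetical data: the second batch contains no relevant documents.
batches = [
    [(0.9, True), (0.8, True), (0.1, False)],
    [(0.3, False), (0.2, False)],
]
result = aggregate_precision_at_k(batches, k=1)
```

With this made-up data, taking exactly one document from each batch would force an irrelevant document out of the second batch, while the pooled threshold returns the two high-scoring documents from the first batch.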
 

-- Practice Final Thread
Mustafa and Pranjali
In VByte, one of the 8 bits in each byte indicates whether another byte follows for the same number. The remaining 7 bits hold the data itself.
VByte encoding example:
Suppose we want to encode 255 in VByte.
255 requires 8 bits to represent, and each byte carries only 7 data bits, so we need 2 bytes to represent 255. The representation looks like this: 10000001 01111111
Explanation for 10000001:
The first 1 indicates that one more byte is needed to represent 255. The last 1 is the MSB of the binary representation of 255.
Explanation for 01111111:
The first 0 indicates that no more bytes are required to represent 255. The remaining seven 1s are the low-order 7 bits of the binary representation of 255.
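The byte layout in the example can be reproduced with a short sketch. It follows the convention used in this post (leading bit 1 means another byte follows, and the high-order 7-bit group is written first). Other VByte variants write the low-order group first, so treat the ordering here as an assumption taken from the example. The function names are made up for illustration.

```python
def vbyte_encode(n):
    """Return the VByte bytes of n as 8-character bit strings."""
    if n < 0:
        raise ValueError("n must be non-negative")
    groups = [n & 0x7F]              # lowest 7 bits
    n >>= 7
    while n > 0:
        groups.append(n & 0x7F)      # next 7 bits
        n >>= 7
    groups.reverse()                 # high-order group first, as in the post
    encoded = []
    for i, group in enumerate(groups):
        more = 0x80 if i < len(groups) - 1 else 0   # continuation flag
        encoded.append(format(more | group, "08b"))
    return encoded


def vbyte_decode(byte_strings):
    """Rebuild the integer from bit strings produced by vbyte_encode."""
    n = 0
    for bits in byte_strings:
        n = (n << 7) | (int(bits, 2) & 0x7F)   # strip the flag, append 7 bits
    return n


encoded_255 = vbyte_encode(255)      # ['10000001', '01111111']
```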
Non-Parametric coding technique example:
i. Gamma codes
A gamma code has 2 parts: the selector and the body.
Selector: The number of bits required for the binary representation of the number, written in unary.
Body: The binary representation of the number being encoded.
Example: If we want to encode 5, the selector is 001, since 5 can be represented in 3 bits as 101. The body is the binary encoding of 5, that is 101.
001 101
The body always starts with 1, so the 1 that ends the selector and the 1 that begins the body are redundant. We drop one of them and join the selector and body. The final code word is: 00101
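The selector/body construction can be checked with a small sketch. The function names are hypothetical, and code words are kept as strings of '0' and '1' for readability rather than packed into bytes.

```python
def gamma_encode(n):
    """Gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("gamma codes are defined for positive integers")
    body = format(n, "b")                      # 5 -> '101'
    selector = "0" * (len(body) - 1) + "1"     # bit length in unary: 3 -> '001'
    return selector + body[1:]                 # drop the redundant leading 1


def gamma_decode(code):
    """Inverse of gamma_encode for a single code word."""
    length = code.index("1") + 1               # unary selector gives the bit length
    body = "1" + code[length:length + length - 1]
    return int(body, 2)


code_5 = gamma_encode(5)                       # '00101'
```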
(Edited: 2021-05-17)

-- Practice Final Thread
Question 2. Suppose we have m=min(5, length of your surname) many accumulators. Give examples that would trigger the QUIT and the CONTINUE accumulator pruning heuristics if term at a time processing was being used.
Both CONTINUE and QUIT are triggered when the number of accumulators is equal to or greater than amax.
We have 8 documents:
1 a dog
2 a cat
3 the dog
4 the cat
5 a good dog
6 a bad cat
7 a very good dog
8 a very bad cat
We do term-at-a-time processing for 'a'.
'a' is present in docs 1, 2, 5, 6, 7, 8.
amax is 5 (min(5, 16)).
The accumulators grow as follows:
[(1, s1)]
[(1, s1), (2, s2)]
[(1, s1), (2, s2), (5, s5)]
[(1, s1), (2, s2), (5, s5), (6, s6)]
[(1, s1), (2, s2), (5, s5), (6, s6), (7, s7)]
Since the number of accumulators is now equal to amax, QUIT and CONTINUE are triggered. QUIT stops processing as soon as it is triggered, while CONTINUE keeps processing the posting lists but does not add any more accumulators to the accumulator list. In either case, document 8 does not get an accumulator.
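The difference between the two heuristics shows up once a second query term is processed. The sketch below reuses the 8 documents from the example and triggers the heuristic as soon as the accumulator count reaches amax, as described in the post. The second query term 'dog', the scoring rule (add 1 per matching term), and the function names are assumptions made only for illustration.

```python
DOCS = {
    1: "a dog", 2: "a cat", 3: "the dog", 4: "the cat",
    5: "a good dog", 6: "a bad cat", 7: "a very good dog", 8: "a very bad cat",
}


def postings(term):
    """Doc ids containing term, in increasing order."""
    return [doc_id for doc_id, text in sorted(DOCS.items()) if term in text.split()]


def term_at_a_time(query_terms, a_max, strategy):
    """Term-at-a-time scoring with QUIT or CONTINUE accumulator pruning."""
    accumulators = {}
    limit_reached = False
    for term in query_terms:
        for doc_id in postings(term):
            if doc_id in accumulators:
                accumulators[doc_id] += 1          # toy score: +1 per matching term
            elif not limit_reached:
                accumulators[doc_id] = 1           # create a new accumulator
                if len(accumulators) >= a_max:
                    limit_reached = True
                    if strategy == "QUIT":
                        return accumulators        # stop all processing
            # CONTINUE after the limit: documents without an accumulator are skipped
    return accumulators


quit_result = term_at_a_time(["a", "dog"], 5, "QUIT")
continue_result = term_at_a_time(["a", "dog"], 5, "CONTINUE")
```

Under QUIT the scores only reflect 'a'. Under CONTINUE the existing accumulators for docs 1, 5 and 7 are still updated by 'dog', while docs 3 and 8 never get an accumulator.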
(Edited: 2021-05-17)