Yioop - PHP Search Engine

Practice Final Up!.

Hey Everyone,

The Practice Final is up!

Chris

Hey Everyone, The Practice Final is up! Chris

-- Practice Final Up!

Question 4] Team members: Tejas Saoji, Prathiba Nagarajan

L = [ 5, 8, 15, 25 ]

Delta-List = [ 5, 3, 7, 10 ]

Gamma-Code = 001 01 01 1 001 11 0001 010

Question 4] Team members: Tejas Saoji, Prathiba Nagarajan L = [ 5, 8, 15, 25 ] Delta-List = [ 5, 3, 7, 10 ] Gamma-Code = 001 01 01 1 001 11 0001 010

-- Practice Final Up!

Q9. Team members: Ekta Khanna, Pragya Rana, Divya Kathiravan, Jayashree Prabunathan, Yashi Kamboj Query processing using term partitioning:

Suppose a query q contains terms t1,...tq. Then the receptionist will forward the query to node v(t1) responsible for the term t1.
After creating a set of document score accumulators from t1's posting list, v(t1) forwards the query, along with the accumulator set to the node v(t2) which continues the process and so on.
Finally, v(tq), sends the final accumulator set to the receptionist where the top m results are selected.

Enhancement

Each query at a given point in time is processed by only a single node, this scheme does not use intra-query parallelism, so will not reduce the response time.
This problem can be mitigated to some degree by the receptionist sending pre-fetch instructions to the later v(ti)'s while v(t1) does its calculations.

Q9. Team members: Ekta Khanna, Pragya Rana, Divya Kathiravan, Jayashree Prabunathan, Yashi Kamboj Query processing using term partitioning: * Suppose a query q contains terms t1,...tq. Then the receptionist will forward the query to node v(t1) responsible for the term t1. * After creating a set of document score accumulators from t1's posting list, v(t1) forwards the query, along with the accumulator set to the node v(t2) which continues the process and so on. * Finally, v(tq), sends the final accumulator set to the receptionist where the top m results are selected. Enhancement * Each query at a given point in time is processed by only a single node, this scheme does not use intra-query parallelism, so will not reduce the response time. * This problem can be mitigated to some degree by the receptionist sending pre-fetch instructions to the later v(ti)'s while v(t1) does its calculations.

-- Practice Final Up!

Team members Anupreet kaur Priyatha, Tina question 10

(Edited: 2016-05-16)

Team members Anupreet kaur Priyatha, Tina question 10 ((resource:IMG_4768.JPG|Resource Description for IMG_4768.JPG)) ((resource:IMG_4769.JPG|Resource Description for IMG_4769.JPG))

-- Practice Final Up!

 Team: Nishant Goel and Swapnil Patil

 Question 5:

 In REBUILD, a new index is built from scratch. When this process is finished, 
 the old index is deleted and replaced with the new one.
 The time for rebuild is given by,
 c.d(new) = c.( d(old) – d(delete) + d(insert))
 where:
 c = system specific constant
 d(new) = new documents
 d(old) = old documents
 d(delete) = deleted documents
 d(insert) = inserted documents

 In REMERGE, an index is built from the text found in the newly inserted documents.
 After this index has been created, it is merged with the previously existing 
 index.
 The time for remerge is given by,
 c.d(new) = c.d(insert) + c/4. (d(old)+d(insert))

 Now, consider the following example.
 Let d(old)=100, d(delete)= 70, and  d(insert)=20

 Using equation for rebuild,
 c. d(new) = c.(100-70+20) = 50.c

 Similarly for remerge,
 c. d(new) = c.20 + c/4.(100+20) = 50.c

 Thus, we can see that both rebuild and remerge have the sane performance for the 
 above mentioned example.

 Assumption: It does not take into consideration the REMERGE overhead to garbage
 collect posting that refer to the deleted documents.

(Edited: 2016-05-16)

Team: Nishant Goel and Swapnil Patil Question 5: In REBUILD, a new index is built from scratch. When this process is finished, the old index is deleted and replaced with the new one. The time for rebuild is given by, c.d(new) = c.( d(old) – d(delete) + d(insert)) where: c = system specific constant d(new) = new documents d(old) = old documents d(delete) = deleted documents d(insert) = inserted documents In REMERGE, an index is built from the text found in the newly inserted documents. After this index has been created, it is merged with the previously existing index. The time for remerge is given by, c.d(new) = c.d(insert) + c/4. (d(old)+d(insert)) Now, consider the following example. Let d(old)=100, d(delete)= 70, and d(insert)=20 Using equation for rebuild, c. d(new) = c.(100-70+20) = 50.c Similarly for remerge, c. d(new) = c.20 + c/4.(100+20) = 50.c Thus, we can see that both rebuild and remerge have the sane performance for the above mentioned example. Assumption: It does not take into consideration the REMERGE overhead to garbage collect posting that refer to the deleted documents.

-- Practice Final Up!

Problem 8. Team Members: Krithika and Kushal

Problem 8. Team Members: Krithika and Kushal ((resource:IMG_2524.JPG|Resource Description for IMG_2524.JPG))

-- Practice Final Up!

Team Members: Sonal Kabra, Shivika Sodhi, Vikash

Team Members: Sonal Kabra, Shivika Sodhi, Vikash ((resource:IMG_0183.JPG|Resource Description for IMG_0183.JPG))

-- Practice Final Up!

Team: Linye Ouyang, Gregory Enriquez

Question 3: Briefly explain the following coding schemes and what they are used for: Canonical Huffman Code, LLRUN.

Canonical Huffman Coding is a symbolwise data compression method that takes in a sequence of symbols and encodes it as a binary representation. The representation of each symbol depends on its frequency, encoding frequent symbols using fewer bits and vice versa. To generate this encoding, a Huffman Tree is generated by creating nodes for each symbol and then merging the two lowest frequency trees until one final tree is generated. For Canonical Huffman Coding, the Huffman tree generated is then sorted lexicographically at each level of the tree. This allows the preamble to contain only the length of the encodings of each symbol and not the encoding.

The LLRUN scheme is a parametric gap compression method used when the gap lists do not follow a geometric distribution. LLRUN groups gap values of similar size into buckets with intervals B_j = [2^j, 2^j+1) and each bucket is encoded to its own code word.

(Edited: 2016-05-16)

Team: Linye Ouyang, Gregory Enriquez Question 3: Briefly explain the following coding schemes and what they are used for: Canonical Huffman Code, LLRUN. Canonical Huffman Coding is a symbolwise data compression method that takes in a sequence of symbols and encodes it as a binary representation. The representation of each symbol depends on its frequency, encoding frequent symbols using fewer bits and vice versa. To generate this encoding, a Huffman Tree is generated by creating nodes for each symbol and then merging the two lowest frequency trees until one final tree is generated. For Canonical Huffman Coding, the Huffman tree generated is then sorted lexicographically at each level of the tree. This allows the preamble to contain only the length of the encodings of each symbol and not the encoding. The LLRUN scheme is a parametric gap compression method used when the gap lists do not follow a geometric distribution. LLRUN groups gap values of similar size into buckets with intervals B_j = [2^j, 2^j+1) and each bucket is encoded to its own code word.

-- Practice Final Up!

Akshay Bodhke Dhruven Vora Salil Shenoy

Akshay Bodhke Dhruven Vora Salil Shenoy ((resource:index.jpeg|Resource Description for index.jpeg))

-- Practice Final Up!

Did anybody solve Question 7????