2022-03-19

HW3 queries.

========= Array Structure for getPages() ===========
It is an array of arrays, so you can download multiple web pages at the same time. After the download, a subarray [CrawlConstants::URL => "some_url"] will be populated with additional fields:
[CrawlConstants::URL => "some_url",
 CrawlConstants::PAGE => "downloaded_page",
 . . . etc
]
The list of CrawlConstants codes can be found in src/library/CrawlConstants.php. The point of using CrawlConstants::PAGE rather than "PAGE" is to catch errors caused by slightly mistyping the field name. For example, CrawlConstants::PAGEE will cause an error because it is not defined, but "PAGEE" won't. The constants' values are strings, which makes for more efficient serialization.
- Professor Pollett
(Edited: 2022-03-21)
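The array shape and the typo-catching point above can be tried out with a small self-contained sketch. It does not call Yioop: DemoCrawlConstants, its string values, and demoFillPages() are hypothetical stand-ins invented for this illustration, and the real codes live in src/library/CrawlConstants.php.

```php
<?php
// Stand-in for Yioop's CrawlConstants. The string values here are made up
// for this sketch and are not the real codes.
interface DemoCrawlConstants
{
    const URL = "u";
    const PAGE = "p";
}

// Input shape: one subarray per page to download.
$sites = [
    [DemoCrawlConstants::URL => "https://example.com/a"],
    [DemoCrawlConstants::URL => "https://example.com/b"],
];

// Hypothetical stand-in for the download step: it only fills in the PAGE
// field of each subarray and performs no network access.
function demoFillPages(array $sites): array
{
    foreach ($sites as $i => $site) {
        $sites[$i][DemoCrawlConstants::PAGE] =
            "<html>contents of " . $site[DemoCrawlConstants::URL] . "</html>";
    }
    return $sites;
}

$sites = demoFillPages($sites);

// A mistyped string key is silently accepted as a brand new field.
$sites[0]["PAGEE"] = "oops";

// A mistyped constant name throws an Error as soon as the line runs.
$typo_caught = false;
try {
    $sites[0][DemoCrawlConstants::PAGEE] = "oops";
} catch (\Error $e) {
    $typo_caught = true;
}
```

After this runs, each subarray holds both a URL and a PAGE field, the string typo has quietly added a stray "PAGEE" key, and the constant typo has been caught.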

-- Array structure in FetchUrl::getPages()
HW3 discussion
Questions:
  • Should we write a program to calculate the cosine similarity of the query vectors and the document vectors to get the ranking of the documents?
  • To calculate the MAP score, how can we get the relevant set, given that we manually collected the URLs based on the query?
Answer by Prof. Chris Pollett:
  • Yes.
  • Group your queries into more than one topic, and take the mean over the topics. For the average precision, use the summary Yioop generated and the presence or absence of matching keywords.
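For the first question, a minimal sketch of ranking by cosine similarity might look like the following. It assumes each query and document has already been turned into a sparse vector of term weights; the function names and the toy weights are invented for illustration and are not part of Yioop or the assignment.

```php
<?php
// Cosine similarity between two sparse term-weight vectors,
// each given as [term => weight].
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $weight) {
        if (isset($b[$term])) {
            $dot += $weight * $b[$term];
        }
    }
    $norm_a = sqrt(array_sum(array_map(fn($w) => $w * $w, $a)));
    $norm_b = sqrt(array_sum(array_map(fn($w) => $w * $w, $b)));
    if ($norm_a == 0.0 || $norm_b == 0.0) {
        return 0.0; // treat an all-zero vector as similar to nothing
    }
    return $dot / ($norm_a * $norm_b);
}

// Score every document against the query and sort by decreasing score.
function rankDocuments(array $query, array $docs): array
{
    $scores = [];
    foreach ($docs as $doc_id => $vector) {
        $scores[$doc_id] = cosineSimilarity($query, $vector);
    }
    arsort($scores); // highest score first, document ids preserved
    return $scores;
}

// Made-up toy weights, for illustration only.
$query = ["search" => 1.0, "engine" => 1.0];
$docs = [
    "d1" => ["search" => 2.0, "engine" => 2.0],
    "d2" => ["search" => 1.0, "cooking" => 3.0],
    "d3" => ["cooking" => 1.0],
];
$ranking = rankDocuments($query, $docs);
```

Here d1 points in the same direction as the query and ranks first, d2 shares one term and ranks second, and d3 shares no terms and scores zero.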
(Edited: 2022-03-20)
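One way to read the MAP suggestion above is sketched below. It rests on several assumptions that the thread does not settle: a result counts as relevant when its summary contains at least one of the topic's keywords, the relevant set is taken to be the relevant results actually retrieved, and each topic is represented by a single ranked list. All function names and sample summaries are made up for illustration.

```php
<?php
// Assumption: a result is relevant when its summary contains at least
// one of the topic's keywords (case-insensitive match).
function isRelevant(string $summary, array $keywords): bool
{
    foreach ($keywords as $keyword) {
        if (stripos($summary, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

// Average precision of one ranked list of summaries.
// Assumption: the relevant set is the relevant results that were retrieved.
function averagePrecision(array $summaries, array $keywords): float
{
    $hits = 0;
    $precision_sum = 0.0;
    foreach (array_values($summaries) as $i => $summary) {
        if (isRelevant($summary, $keywords)) {
            $hits++;
            $precision_sum += $hits / ($i + 1); // precision at this rank
        }
    }
    return $hits > 0 ? $precision_sum / $hits : 0.0;
}

// MAP: the mean of the per-topic average precisions.
function meanAveragePrecision(array $topics): float
{
    if (count($topics) === 0) {
        return 0.0;
    }
    $total = 0.0;
    foreach ($topics as $topic) {
        $total += averagePrecision($topic["summaries"], $topic["keywords"]);
    }
    return $total / count($topics);
}

// Made-up summaries in ranked order, for illustration only.
$topics = [
    ["keywords" => ["php"],
     "summaries" => ["PHP manual", "Cooking tips", "Learn php fast"]],
    ["keywords" => ["crawler"],
     "summaries" => ["Weather report", "Web crawler design"]],
];
$map = meanAveragePrecision($topics);
```

If a topic covers several queries, one option is to average the per-query values within the topic first and then average across topics; the sketch keeps one list per topic to stay short.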