2019-02-21

Code for Search Operators.

Chris,
For some reason on my site, certain search operators don't quite work the way they should. Sometimes the exact match gets it right and sometimes it comes up with both terms on a page but they are not in a phrase. Also, the -term doesn't always work right. Someone was searching for Ingalls -Laura and it still came up with pages with both words. Do you know why this would be happening? Also, what files contain the code for this?
Thanks, Anthony
2019-02-24

-- Code for Search Operators
On a query, flow of control goes through:
  src/controllers/SearchController processQuery(...)
  src/models/PhraseModel getPhrasePageResults(...)
  src/models/PhraseModel parseWordStructConjunctiveQuery(...)
The code of these methods builds up a collection of structs describing the query. Each struct holds the hash of a term in the query and whether the term should or should not be in the results. (We use hashes because hashes are all the same length, whereas a word like foo differs in length from a word like blahblahblah.) Then src/models/PhraseModel getSummariesByHash is called. It builds index bundle iterators based on the word structs, then iterates over them to actually return results.
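As a rough illustration of the word-struct idea described above, here is a minimal Python sketch. It is not Yioop's code (Yioop is written in PHP); the names WordStruct, term_hash, and parse_query are hypothetical, and MD5 is used only because it yields a fixed-length digest, not because Yioop necessarily uses it.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordStruct:
    """Hypothetical stand-in for one entry describing a query term."""
    term_hash: str  # fixed-length digest of the term
    exclude: bool   # True when the term was written as -term


def term_hash(term: str) -> str:
    # Assumption: MD5 chosen only for illustration. Every digest is
    # 32 hex characters, regardless of how long the term is.
    return hashlib.md5(term.lower().encode("utf-8")).hexdigest()


def parse_query(query: str) -> List[WordStruct]:
    """Turn a whitespace-separated query into a list of word structs."""
    structs = []
    for token in query.split():
        exclude = token.startswith("-") and len(token) > 1
        term = token[1:] if exclude else token
        structs.append(WordStruct(term_hash(term), exclude))
    return structs


structs = parse_query("Ingalls -Laura")
for s in structs:
    print(s.term_hash, "exclude" if s.exclude else "include")
```

Both foo and blahblahblah hash to strings of the same length here, which is the property the hashes are used for.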
The code in src/models/PhraseModel parseWordStructConjunctiveQuery(...) needs to be improved and it is on my todo list. The confusion arises because I don't always return results by treating each word in a query as a separate item to search on. I try to parse the query for common phrases which I have indexed as whole items. I do this because conjunctive queries of multiple terms are slower than single-word lookups, so I want to favor the latter as much as possible.
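To make the phrase-parsing idea concrete, here is a hedged Python sketch of one way a query could be split into items that favor pre-indexed phrases. The greedy longest-match strategy, the function segment_query, and the INDEXED_PHRASES set are assumptions for illustration only; the actual logic in parseWordStructConjunctiveQuery may differ.

```python
from typing import List, Set


def segment_query(terms: List[str], indexed_phrases: Set[str],
                  max_len: int = 3) -> List[str]:
    """Greedily group terms into the longest phrases found in the index.

    Any term that does not start a known phrase becomes its own item.
    """
    items = []
    i = 0
    while i < len(terms):
        # Try the longest candidate first, shrinking down to one word.
        for n in range(min(max_len, len(terms) - i), 0, -1):
            candidate = " ".join(terms[i:i + n])
            if n == 1 or candidate in indexed_phrases:
                items.append(candidate)
                i += n
                break
    return items


# Hypothetical set of phrases that were indexed as whole items.
INDEXED_PHRASES = {"little house", "new york"}

items = segment_query(["little", "house", "books"], INDEXED_PHRASES)
print(items)  # ['little house', 'books']
```

In this sketch the three-word query becomes two lookups instead of a three-way conjunctive query, which is the speed trade-off described above. It also shows why results can surprise people: the engine is not always searching on the individual words that were typed.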

-- Code for Search Operators
I understand about the need for speed but in my case I wouldn't care if a query took 5 seconds to return. I'd rather it return only exact phrases. Most of the feedback I get about the search engine is good but everyone always seems to bring this issue up to me.
I know you are a busy man so I won't bug you too much more.
Anthony
2019-02-28

-- Code for Search Operators
I was looking into the -term operator. The code itself seemed fine, but I just realized one thing that can make this not work:
To compute -term, Yioop looks at documents not in the list of documents containing that term. When Yioop computes an index, it first computes a summary of the page to find the most important content, then adds the terms from this summary to the appropriate term posting lists. A document might contain a term, but in a location outside the summary generated for the document. In that case, the fact that the document contains the term would not be in the index, so a -term search might return that document. I think the best you can easily hope for with the way Yioop does things is that -term means documents for which the term is not very relevant (so it didn't appear in the doc summary).
You can make the summary bigger if you want to make -term more accurate.
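Here is a small Python sketch of that effect. It is a toy model, not Yioop's indexer: the summarizer simply keeps the first max_words words (an assumption standing in for Yioop's real summarization), and build_index, search, and DOCS are hypothetical names and data.

```python
from typing import Dict, List, Set


def summarize(text: str, max_words: int) -> List[str]:
    # Assumption: a "summary" is just the first max_words words.
    return text.lower().split()[:max_words]


def build_index(docs: Dict[str, str], max_words: int) -> Dict[str, Set[str]]:
    """Build posting lists from the summary of each document only."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for term in summarize(text, max_words):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[str]], include: str, exclude: str) -> List[str]:
    """Documents whose summary has `include` but not `exclude`."""
    hits = index.get(include, set())
    return sorted(hits - index.get(exclude, set()))


DOCS = {
    "a": "ingalls family history of the prairie written by laura",
    "b": "ingalls shipbuilding in mississippi",
}

small = build_index(DOCS, max_words=4)   # "laura" falls outside doc a's summary
large = build_index(DOCS, max_words=20)  # whole text of both docs is indexed

print(search(small, "ingalls", "laura"))  # ['a', 'b']
print(search(large, "ingalls", "laura"))  # ['b']
```

With the short summary, document a is returned for ingalls -laura even though its full text contains laura. With the larger summary the term reaches the posting list and the document is excluded.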
Best,
Chris
(Edited: 2019-02-28)

-- Code for Search Operators
Thank you again. This does help me understand what's going on.
Anthony