2019-04-15

Conjunctive Query Issue

I have an issue with my search results. I have about 7,000 pages that all contain the word Arkansas, and about 300 of those pages also contain the name Jones. If I search for Arkansas, I get 7,000 results. If I search for Jones, I get 300 results. However, if I search for both Arkansas and Jones at the same time, I get only 2 results. I am not using an exact-match operator in this search, so I would expect to get those 300 results back. Why is it only giving me 2 results? I am caching the whole page and have set the description length very large.
Any ideas?
(Edited: 2019-04-15)
2019-04-19

-- Conjunctive Query Issue
First off, turning on caching of the whole page has no effect on what search results you get. The two settings under Page Options with the largest effect on what is indexed are Max Page Summary Length in Bytes and Byte Range to Download.
At query time, Yioop tries to detect whether the query, or parts of it, are phrases that were stored as single terms in the index. If it finds enough results of this type, it doesn't bother doing a conjunctive query for that phrase; it just looks up the whole phrase as a term. The idea was to make queries faster when the index is very big (hundreds of millions of downloaded pages on a spinning hard drive). I agree this can yield weird results, and it is unnecessary for smaller crawls of tens of millions of pages or if you have everything on SSD or in RAM.
I just added a flag so you can turn this off and get pure conjunctive query behavior if desired. To do this, check out the version of Yioop in the git repository and add:
 nsdefine('USE_CHECK_FOR_PHRASE_QUERIES', false);
to your src/configs/LocalConfig.php file. If you don't have such a file, just create it. For example, the file might contain:
 <?php
 namespace seekquarry\yioop\configs;
 nsdefine('USE_CHECK_FOR_PHRASE_QUERIES', false);
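To make the difference between the two query strategies concrete, here is a toy PHP model. It is not Yioop's code: the index contents, the $min_results threshold, and the function names are hypothetical, and the $check_for_phrases parameter only loosely mirrors what USE_CHECK_FOR_PHRASE_QUERIES switches off.

```php
<?php
// Toy model, NOT Yioop's implementation: contrasts a whole-phrase term
// lookup with a pure conjunctive query over per-term posting lists.

// Hypothetical index: each key maps to the ids of documents containing it.
// 'arkansas jones' stands in for a phrase that was also stored as a single
// term, which in this toy only happened for one document.
$index = [
    'arkansas' => range(1, 10),
    'jones' => [2, 4, 6, 8],
    'arkansas jones' => [4],
];

// Pure conjunctive query: intersect the posting lists of all terms.
function conjunctiveQuery(array $index, array $terms): array
{
    $result = null;
    foreach ($terms as $term) {
        $postings = $index[$term] ?? [];
        $result = ($result === null)
            ? $postings
            : array_values(array_intersect($result, $postings));
    }
    return $result ?? [];
}

// Phrase shortcut: if the whole query exists as one term with at least
// $min_results documents, return those and skip the intersection.
function phraseShortcutQuery(array $index, array $terms, int $min_results,
    bool $check_for_phrases): array
{
    $phrase = implode(' ', $terms);
    if ($check_for_phrases && count($index[$phrase] ?? []) >= $min_results) {
        return $index[$phrase];
    }
    return conjunctiveQuery($index, $terms);
}
```

With the phrase check on (and a threshold of 1), phraseShortcutQuery($index, ['arkansas', 'jones'], 1, true) returns only document 4; with it off, it falls through to the intersection and returns documents 2, 4, 6 and 8.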
(Edited: 2019-04-19)

-- Conjunctive Query Issue
OK, thanks, Chris!

-- Conjunctive Query Issue
Chris, when I try to download and install from the Git repository, I cannot get the software to work at all. I can download version 5.0.4 from the SeekQuarry website and it works fine, but installing from Git does not work for me.
I assume you made some changes to the PhraseParser script to get purely conjunctive queries? Is there any way I can change a little code in 5.0.4 to make this work?
2019-04-20

-- Conjunctive Query Issue
That's weird. I just tested a fresh clone without any problems, and I also updated findcan.ca with no problem.
The effect of the flag is to make PhraseParser::extractTermsWholePhrase return $terms immediately. So you could modify that method to do that and see if it helps.
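For 5.0.4, the change amounts to a guard clause at the top of that method. The sketch below uses a hypothetical PhraseParserSketch class with a simplified one-argument signature, because the real PhraseParser::extractTermsWholePhrase in 5.0.4 may take different parameters; it only shows where the early return goes, and the code after the guard is a placeholder, not the original logic.

```php
<?php
// Hypothetical stand-in illustrating the guard-clause edit; the real
// PhraseParser::extractTermsWholePhrase in Yioop 5.0.4 may have a
// different signature and a different body.

// Mirrors the git version's flag name; hard-coded to false for this sketch.
const USE_CHECK_FOR_PHRASE_QUERIES = false;

class PhraseParserSketch
{
    public static function extractTermsWholePhrase(array $terms): array
    {
        // Early return: hand back the individual terms unchanged, so no
        // whole-phrase lookup is attempted for them.
        if (!USE_CHECK_FOR_PHRASE_QUERIES) {
            return $terms;
        }
        // Placeholder only: the original whole-phrase detection logic
        // would run here instead of this simple join.
        return [implode(' ', $terms)];
    }
}
```

In the real file, the equivalent edit is a single "return $terms;" placed as the first statement of the method body.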
Best,
Chris

-- Conjunctive Query Issue
I'm quite sure the problem is somewhere on my end. It took me a while to get Yioop to work at all, so it's probably my machine setup.
I will check out PhraseParser and see if I can figure it out.
Thanks!

-- Conjunctive Query Issue
The problem I had with the Git version was that the Queue Server light wouldn't turn green, and it gave me the message "no fetcher has spoken with me". I tried starting the queue server from the command line, and it does seem to be crawling, but I still get that message. Also, although it seems to be crawling, nothing shows up in the search results.