2017-02-23

Yioop setup question and ask for help?

Hello all,
First of all, many thanks to the developer Chris Pollett for this great job!
Yesterday I installed Yioop 4 on my Ubuntu 14.04 server running PHP 7 and a MySQL database. The host system has 64 GB of DDR4-2400 RAM, a Core i7-7700K, and 2x 8 TB HDDs in RAID 1.
The initial installation and configuration worked fine. Now I am trying to understand how Yioop works and what the best settings are for fetching data from the World Wide Web, e.g. like the big search engine companies do. I set up a machine under 'Manage Machines' as described in the documentation, with one queue server and one fetcher. Under 'Manage Crawl' I started the job with the Yioop default settings; after a couple of minutes the job stopped working: Visited Urls/Hour 86, Total Urls Count 100.
On the frontend I can see 100 pages indexed, but if I try to search for keywords, I get no results.
Can anyone help with this? What are the best settings for fetching data from the WWW?
Thank you in advance :)

-- Yioop setup question and ask for help?
After rerunning my test, I have only 3 indexes, and they only show the robots.txt files.
Which links should I enter to crawl and fetch data, and how? Is there a misconfiguration on my side?
Resource Description for yioop.JPG
2017-02-24

-- Yioop setup question and ask for help?
Hi,
Indexes aren't searchable until either (1) you have stopped the crawl and it appears under the Previous Crawls list, or (2) you have crawled at least NUM_DOCS_PER_GENERATION urls (either 10,000 or 40,000 urls depending on how much memory you have). I am not sure what your crawl setup was for the second test. If, under Manage Crawls, you click Options and then copy and paste your options, I could suggest why it stopped crawling.
Best, Chris

-- Yioop setup question and ask for help?
Hi Chris,
Thank you very much for your support. For the second test I used the Yioop default settings (please find the screenshot attached). For my third test I used a couple of my own domains, but I have the same issue: 'Most recent urls' always shows only the link + /robots.txt. Why can I only see robots.txt? In php.ini I have set the memory limit to 5000 MB, so I think this should be enough?
Resource Description for crawler_settings.JPG
(Edited: 2017-02-25)
2017-02-25

-- Yioop setup question and ask for help?
Do you see any messages in your php error log files or in your apache error.log?
Also, maybe under manage machines, turn off all the Fetchers and the Queue Server. Then try to run the scripts from the command line. cd to YIOOP_DIR/src/executables. Run
 php QueueServer.php terminal
and in a different terminal window run
 php Fetcher.php terminal
Then start the crawl under Manage Crawls. Check whether any error messages appear in the output, or whether one of the processes crashes. Sometimes Yioop needs a function that is not available in your build of PHP; this check might uncover that.
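If the output scrolls by too quickly to read, one option is to keep a copy of it in a file while still watching it live. The following is only a minimal sketch: the `run_logged` helper and the `/tmp/yioop-debug` log directory are hypothetical names chosen for illustration, not part of Yioop; only the two `php ... terminal` commands come from the steps above.

```shell
#!/bin/sh
# Hypothetical log directory (an assumption; any writable directory works).
LOG_DIR="${LOG_DIR:-/tmp/yioop-debug}"

# run_logged NAME CMD...: run CMD, show its output (stdout and stderr) on the
# terminal, and keep a copy in $LOG_DIR/NAME.log. Illustrative helper only.
run_logged() {
    name="$1"
    shift
    mkdir -p "$LOG_DIR"
    "$@" 2>&1 | tee "$LOG_DIR/$name.log"
}

# Terminal 1, from YIOOP_DIR/src/executables:
#   run_logged queue_server php QueueServer.php terminal
# Terminal 2, from the same directory:
#   run_logged fetcher php Fetcher.php terminal
```

The two saved files can then be searched or pasted into a reply after the crawl has run for a few minutes.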
Best, Chris

-- Yioop setup question and ask for help?
Hi Chris,

Thanks again for your help.
I have now checked the error log files, but they contain no entries. I have also run the queue server and fetcher from the command line, but from my point of view I cannot see any issue. I have made a video of the running output and a test run. I hope this is OK for you?

https://8solutions.cloud/s/up44T0FvUpx6Nxn

Regards
(Edited: 2017-02-26)

-- Yioop setup question and ask for help?
You wouldn't happen to have adjusted anything in the Config.php file (timeouts?) -- it seems to be rescheduling a lot of the files for download. Also, at 1:41-1:42 I see an error in the Fetcher's log, but it goes by too fast for me to see what is triggering the initial error. The video only shows you running for one Schedule -> Fetch -> return results cycle. If you leave it running, what happens? (At most five minutes, then cut and paste some of the Fetcher and QueueServer logs.)
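To find the first error without scrolling through everything, a small grep wrapper may help. This is just a sketch under assumptions: it presumes the Fetcher's terminal output has been saved to a file (hypothetically named fetcher.log here), and `show_errors` is an illustrative helper name, not a Yioop tool.

```shell
#!/bin/sh
# show_errors FILE: print every line mentioning "error" (any case), with its
# line number, plus 2 lines of context before and 5 after each hit.
show_errors() {
    grep -n -i -B 2 -A 5 'error' "$1"
}

# Example (hypothetical file name):
#   show_errors fetcher.log | head -n 40
```

The lines just before the first hit are usually the most useful ones to paste, since they show what was being fetched when the error was triggered.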
2017-02-28

-- Yioop setup question and ask for help?
Hi Chris, I have set up a second server based on CentOS 7 and it works fine now. Thank you very much for your support!