2016-09-18

-- Crawler Halting
Thanks Chris... when I said "working" I meant the manage crawls option etc ... all good there.
I'm back to the original problem somehow with the crawl starting out and "going to sleep" ... just messing around with a few things right now to see if I can find what's happening.... was wondering if it's because I'm using domain:xyz.com in allowed domains etc but doesn't appear to be that.
Trying to figure out why I'm having issues and I'm assuming nobody else is :)
2016-09-20

-- Crawler Halting
Hi Chris... I've tried several different things with the environment on the box and cannot get a crawl to function. I left a crawl of three websites sitting for several hours and it only indexed the robots.txt of each one and then seemed to "go to sleep". Any suggestions? Sorry for all the questions...
Thanks, Paul
2016-09-23

-- Crawler Halting
Hey Paul,
If it's not crawling anything within 5 to 10 minutes, it's probably not working. I have been doing some code clean-up and fixes to some of the web scraper code that was committed since 3.3. I haven't been doing much crawling recently, so hadn't realized some stuff was broken. Since my fixes I have started a crawl on an Ubuntu 16.04 LTS machine and it seems to be going well. If you have time, maybe get the most recent commit and see if it solves your problem.
Best, chris

-- Crawler Halting
Thanks Chris...
I have a rather silly question... what's the URL syntax to download latest via git clone?
Thanks
Paul

-- Crawler Halting
It's actually on the download page. Here it is:
git clone https://seekquarry.com/git/yioop.git

-- Crawler Halting
thanks again... working now!
2016-09-28

-- Crawler Halting
At the risk of being a complete PITA here... things are not working now with the crawler. Here's a quick update from previously... I had a test crawl running, and due to a lot of changes in that dev instance I decided to build a new production-ready Ubuntu 16.04 LTS instance and begin what I had hoped would be a lengthy crawl. Unfortunately I am back to the crawler starting up, indexing the robots.txt of the sites listed, and then going to sleep. I even tried to crawl the default Yioop list of sites, with the same result.
Chris - happy to provide access again to this instance if it helps or if you have any ideas.... appreciate your time
Paul
2016-09-30

-- Crawler Halting
Hey Paul,
I have been having a busy week at work. I can probably take a peek on Sunday, if you want to give me access.
Best, Chris
2016-10-15

-- Crawler Halting
I have read through this, and I also am running on Ubuntu 16.04 and it seems to be working... but the crawl only goes through about 20 to 200 sites and then halts. Any thoughts? Chris had suggested loading 3.8.1, which I did.
(Edited: 2016-10-15)
2016-10-17

-- Crawler Halting
So are you using 3.8.1? I currently have a crawl of about 10 million pages done on an Ubuntu 16.04 LTS system running 3.8.1 at findcan.ca. Maybe check the error logs to see if you are missing any needed PHP libraries?
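Following up on the suggestion above, a quick way to check for missing PHP libraries from the command line is to grep `php -m` for the extensions the crawler is likely to depend on. The module list below is an assumption (cURL, mbstring, and SQLite3 are commonly needed by Yioop-style crawlers); consult the Yioop documentation for the authoritative requirements on your version.

```shell
#!/bin/sh
# Sketch: report whether PHP extensions the crawler may need are loaded.
# Module names below are assumptions, not an official Yioop requirements list.
check_php_modules() {
    if ! command -v php >/dev/null 2>&1; then
        echo "php not found on PATH"
        return 1
    fi
    for mod in curl mbstring sqlite3; do
        # php -m lists loaded extensions, one per line
        if php -m | grep -qi "^${mod}$"; then
            echo "ok: $mod"
        else
            echo "MISSING: $mod"
        fi
    done
}
check_php_modules
```

On Ubuntu 16.04, a missing module reported here can typically be installed with `apt-get install php-<module>` followed by a web-server restart.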