2012-03-04

Problems With Setup

Originally Posted By: krazor
At first I was very excited to try out Yioop, especially after having tried Sphider and Sphider Plus recently. They're good software, but the original Sphider is old and limited. Sphider Plus has a lot of options that I like, but it's very slow. Yioop, with the API, media search capabilities, and large indexing capabilities, seemed like the answer when I found it.

However, I can't ever get to the point of trying it. I've tried both .801 and .822 and have problems with both. With .801 it will not save database information. Using sqlite or sqlite3 makes the db, but doesn't make any tables. If I try to use MySQL it makes the tables, but won't save mysql as the setting. It simply makes the tables and then says "Problem Updating Database!" and goes back to sqlite3. If I try to use .822, MySQL seems to work (it at least saved the information and makes the tables), but sqlite doesn't. Also, it refuses to save any Machine information. If I try to make a machine it simply doesn't do anything.

So, between those things, I can't even give it a try to see how it would work for me. Also, tbh, I have no idea how to run the queue server and fetchers manually. I'm not a Linux guy and I have no idea how to do it in Windows.

I'm using Win7 x64 with Wamp 2.1 32bit (64bit has issues).

I also wanted to let you know that on your search page I was testing the image search with "filetype:jpg weather", and it times out almost every time, so you might want to see what you can do to fix that at some point.

Either way, you're doing good work. It's new software with limited testing in environments other than yours, but it seems to run very well for being PHP driven instead of Java based.

-- Problems With Setup
I would judge 0.801 a little more stable than 0.822. The changes in 0.822 should allow Yioop! to scale to crawls of more than 100 million pages, but I am still noticing kinks as I work towards version 0.84. The current test crawl of 77 million pages should hit 100 million in slightly over a week, after which I will move portions of the index onto SSD drives. When I have done this previously, it has improved performance a lot. Right now, the default index on Yioop is served from a spinning disk on a single Mac Mini which is also involved in the current crawl, so performance is lacking and you are seeing timeouts.

I admit I haven't tested Yioop as much under Windows as under *nix. I know a few people besides me who have gotten it to work, so if you are willing to correspond a bit, we can see if we can get it to work for you. In particular, I have gotten it to work previously using WAMP. As you mentioned, it is easier to get a 32-bit version of PHP working on Windows (unfortunately, this means your crawl speed will take a performance hit). I just verified tonight that I was able to get a crawl to work using Xampp (I don't have Wamp handy in my VMware) and a fresh install of Yioop. I used the current version of Yioop from http://www.seekquarry.com/viewgit/?a=summary&p=yioop , which is getting pretty close to 0.84 and is probably a little better to work with than 0.822.

To keep it simple, leave the database as sqlite3 for the purposes of getting it to work. Confirm that your version of PHP supports sqlite3. On the wampserver site, I can see that version 2.2 of WAMP has version 5.3.10 of PHP. I am not sure what version 2.1 has, but just check that it is at least 5.3. If you are using sqlite3, then when Yioop makes its work directory it merely copies the sqlite file data/default.db that comes with Yioop over to the work directory, so it should not have to create tables -- the tables should already be made.

If you don't want to use the command prompt, then you need to install pstools as it says on the instructions page. Did you install pstools in a location that is in your default path? If you have, make sure the error box in the configure page is checked. Go to manage machines and create a machine. If it gives an error, tell me the machine you tried to create, give me the profile.php for your Yioop configuration, and give me any error messages it writes to the Yioop window.
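One way to confirm that the copied SQLite file really contains tables, independent of sqlbuddy or any other front end, is to read its schema directly. The following is a minimal Python sketch, not part of Yioop; the function name `list_tables` is made up for illustration, and the location of the copied default.db inside your work directory is an assumption you will need to fill in. It only reads SQLite 3 files.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of the tables stored in an SQLite 3 file."""
    path = Path(db_path)
    if not path.is_file():
        # sqlite3.connect() would quietly create a new, empty database for a
        # missing file, which looks just like "a database with no tables".
        raise FileNotFoundError(path)
    conn = sqlite3.connect(str(path))
    try:
        # sqlite_master is SQLite's built-in schema table.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables(r"path_to_work_directory\data\default.db")` (hypothetical path) and getting an empty list would suggest the tool is looking at a different, empty file rather than the one Yioop copied.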

Best,
Chris

-- Problems With Setup
Originally Posted By: krazor
Considering the index is on a Mac Mini, the performance is exceptional. I just wanted to let you know, since feedback seemed a little slow on the forums and it's hard to figure out how something is going to run with widespread use without getting any bug reports. I've actually been watching for PHP crawlers for about 6 months now and only just found your page. I have access to about 6 pages that I work on, so I'll add one-way backlinks to your site to try to help out with your PageRank.

Currently I'm backing up my files and db (MySQL, I have zero experience with sqlite) and I'm going to switch to the newest version of Wamp x64 and see if I can get it running without issues. I noticed that Windows firewall has a severe impact on performance with Wamp. Generally I use my private server for Joomla development and Joomla-based websites for my work, so the server configuration is based upon that. With this install I'll go with a more generalized configuration to support more software. Also, Wamp 2.1 used PHP 5.3.5, which tended to create problems in some cases, so hopefully 2.2 with 5.3.10 will fix some of them.
The reason I said the sqlite option didn't create any tables was that once it was set up, I used sqlbuddy, selected sqlite, and entered the db name I used in configuration, and it didn't show that any tables had been created. So if it was connecting to an empty copy, for example, I could be wrong. With Pstools it's just an extraction from an archive as opposed to an install, and then I added a path in the environment variables. It didn't seem to do anything, so possibly I did something wrong there also. Would doing a LAMP install in VMware be a better option, or would performance be so low that it isn't worthwhile?

Anyways, let me get things going again with the new Wamp install and see if I can get things working. Either way I'll post back here later on.
2012-03-05

-- Problems With Setup
Scattered thoughts ...

Another way to create the database is to manually run the script configs/createdb.php.

php 5.3.5 is fine for running Yioop.

In pstools, you need psexec.exe. If your php.exe is in your path then just put psexec.exe
in the same folder. I noticed tonight that if psexec is put in C:\windows\System32,
which is also in my path, the exec() call had some trouble finding the command.
Also, if you change your path, make sure to restart the web server so the
path is changed for it too.
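A quick way to see which php and psexec a given PATH resolves to is the standard-library call `shutil.which`. This is a small Python sketch, not anything shipped with Yioop; the helper name `find_on_path` is made up for illustration. Note that it inspects the PATH of the process that runs it, so it shows what a new command prompt sees, not necessarily what an already-running web server sees.

```python
import shutil


def find_on_path(names, path=None):
    """Map each program name to its full path, or None if it is not found.

    `path` defaults to the PATH of the current process; pass a string to
    check a different search path.
    """
    return {name: shutil.which(name, path=path) for name in names}


def report(names=("php", "psexec")):
    """Print where each program was found, or that it is missing."""
    for name, location in find_on_path(names).items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

Running `report()` from a freshly opened command prompt should print a location for both programs; if psexec shows as not found, the path change has not taken effect for that process.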

In Windows, if you click the Start button, and type cmd into the search field, you will
see cmd.exe. Click on that to get a command prompt. Use the command
cd path_to_yioop\bin
with path_to_yioop the actual path to Yioop on your machine. If php is in your path,
you should be able to type:
php fetcher.php start 0
For me, this outputs:
Starting 0-fetcher...

PsExec v1.98 - Execute processes remotely
Copyright (C) 2001-2010 Mark Russinovich
Sysinternals - http://www.sysinternals.com

php started with process ID 2008.

If you see something like this, then you know psexec is working. Type:
php fetcher.php stop 0
Set up a machine under Manage Machines and you should be able to crawl.

If LAMP on VMware is as slow as Windows on a Mac in VMware, I would stick with Windows. The only exception might be if getting a 64-bit version of PHP speeds things up.

-- Problems With Setup
Originally Posted By: krazor
Okay, got Wamp 2.2 x64 running. Looks like they did a lot better with the 64 bit version of wamp this time, as I haven't had a single issue with it. Also php and pstools added to the path just fine, and add machine seems to have worked.

However, I'm still having a few issues. I still can't use MySQL with this version; it just returns "Problem Updating Database!" the same as .822 did. Right now I'm trying yioop-v0.822-333-ged3f30f.
The queue server seems to run fine, but the fetcher (just running one right now for testing) seems to turn itself off. It'll show as on, but then it will turn itself off after 30-60 seconds. I'll paste the logs from both below:
-----------------------------------------------------------------------------------------------------
Queue Server Log (just a partial paste, as everything else is just a loop):
[Mon, 05 Mar 2012 23:42:38 -0800] Queue server peak memory
usage so far9190624!!
[Mon, 05 Mar 2012 23:42:38 -0800] Sleeping...
[Mon, 05 Mar 2012 23:42:38 -0800] done.
[Mon, 05 Mar 2012 23:42:38 -0800] Start checking for new
URLs data memory usage423024296
[Mon, 05 Mar 2012 23:42:38 -0800] done.
[Mon, 05 Mar 2012 23:42:38 -0800] Checking for robots.txt
files to process...
[Mon, 05 Mar 2012 23:42:38 -0800] Number of Crawl-Delayed
Hosts: 0

[Mon, 05 Mar 2012 23:42:38 -0800] ... less than max age
[Mon, 05 Mar 2012 23:42:38 -0800] Checking age of robot data
in queue server
[Mon, 05 Mar 2012 23:42:38 -0800] Entering Process Crawl
Data Method

-----------------------------------------------------------------------------------------------------
Fetcher Log (full log):
[Mon, 05 Mar 2012 23:37:10 -0800] In Fetch Loop
Initialize logger..

[Mon, 05 Mar 2012 23:37:10 -0800]
[Mon, 05 Mar 2012 15:19:31 -0800] In Fetch Loop
Initialize logger..

[Mon, 05 Mar 2012 15:19:31 -0800]
[Mon, 05 Mar 2012 14:48:58 -0800] In Fetch Loop
Initialize logger..

[Mon, 05 Mar 2012 14:48:58 -0800]
[Mon, 05 Mar 2012 14:45:20 -0800] In Fetch Loop
Initialize logger..

[Mon, 05 Mar 2012 14:45:20 -0800]
[Mon, 05 Mar 2012 14:41:58 -0800] In Fetch Loop
Initialize logger..

[Mon, 05 Mar 2012 14:41:58 -0800]

-- Problems With Setup
What URL did you give for the name server in configure? What is the URL for your single instance of Yioop? I see the behavior above from the fetcher when they don't match. I am using MySQL on my Mac but haven't tested on PC. I'll try to test that today to see if I notice any issues.
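To spot an accidental mismatch between the two URLs (different case, a missing trailing slash, an explicit default port), they can be reduced to a common form before comparing. This is only an illustrative Python sketch; how Yioop itself compares the name server URL with the instance URL is not stated here, and the function names are made up.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(url):
    """Reduce a URL to (scheme, host, port, path) for comparison."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Treat a missing port as the scheme's default port.
    port = parts.port or DEFAULT_PORTS.get(scheme)
    # Ignore the presence or absence of a trailing slash.
    path = parts.path.rstrip("/") + "/"
    return scheme, host, port, path


def same_instance(name_server_url, instance_url):
    """True if both URLs point at the same scheme, host, port, and path."""
    return normalize(name_server_url) == normalize(instance_url)
```

With this sketch, `http://localhost/yioop/` and `http://LOCALHOST:80/yioop` compare as equal, while `http://localhost/yioop/` and `http://localhost/` do not. It does no DNS lookup, so `localhost` and `127.0.0.1` are treated as different hosts.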

-- Problems With Setup
Originally Posted By: krazor
I have the name server set at the localhost url for yioop. So in this case http://localhost/yioop/ . I also tried it with http://localhost/ just in case, but with the same result.

-- Problems With Setup
Originally Posted By: krazor
Sorry, I forgot to add a suggestion I had: the ability to limit the URL depth during a crawl. For example, if the seed site were www.site1.com, the crawler would index only www.site1.com and any site directly linked from it, and then stop. Hope that makes sense; it should be easy to implement, I believe, if you're interested.
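The idea can be sketched as a breadth-first walk that records how many links away from a seed each site is and stops following links at a fixed depth. This is a toy Python illustration of the suggestion, not how Yioop schedules URLs; the `links` dictionary stands in for fetching and parsing real pages, and site2 through site4 are made-up names.

```python
from collections import deque


def crawl_order(seeds, links, max_depth):
    """Breadth-first walk over `links`, stopping `max_depth` hops from a seed.

    `links` maps a site to the list of sites it links to.
    Returns (site, depth) pairs in the order they would be indexed.
    """
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append((seed, 0))
    order = []
    while queue:
        site, depth = queue.popleft()
        order.append((site, depth))
        if depth == max_depth:
            continue  # index this site, but do not follow its links
        for target in links.get(site, ()):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


links = {
    "www.site1.com": ["www.site2.com", "www.site3.com"],
    "www.site2.com": ["www.site4.com"],
}
print(crawl_order(["www.site1.com"], links, max_depth=1))
```

With `max_depth=1` the walk covers www.site1.com plus the two sites it links to directly, and never reaches www.site4.com.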

-- Problems With Setup
Keep it as http://localhost/yioop/
Under Configure, can you check the Error checkbox? Then launch cmd like we did before:
click the Start button and type cmd into the search field, and you will
see cmd.exe. Click on that to get a command prompt. Use the command
cd path_to_yioop\bin
This time type:
php fetcher.php terminal
If the fetcher is crashing, I am hoping we'll see an error message in the cmd window. Also, as a long shot,
can you post the file profile.php from your work directory? It has all your configuration details, so I
can check whether things are configured correctly.

-- Problems With Setup
Just out of curiosity, was the name of the MySQL database you tried to create "default"? I think there is a name collision with a keyword in MySQL. When I name the database something else it works, but leaving it as "default" for MySQL seems to cause a problem. This is a bug which I just discovered. I will fix this for the next version.
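For background, DEFAULT is a reserved word in MySQL, so it can only be used as a database name when it is quoted with backticks. The Python sketch below shows that quoting rule; it is an illustration of the collision, not the fix planned for Yioop, and the helper names are made up. Choosing a different database name, as described above, avoids the issue without any quoting.

```python
def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling any inside it."""
    if not name:
        raise ValueError("identifier must not be empty")
    return "`" + name.replace("`", "``") + "`"


def create_database_sql(name):
    """Build a CREATE DATABASE statement with the name safely quoted."""
    return f"CREATE DATABASE IF NOT EXISTS {quote_identifier(name)}"


print(create_database_sql("default"))
```

The unquoted form `CREATE DATABASE default` is rejected by MySQL as a syntax error, while the quoted statement printed here is accepted.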