2020-07-01

Main Page Spiders / Image Keyword Crawl.

Hi Chris,
Is there a setting that makes it spider only the main page of a domain and not all the subpages?
I also tried searching for an image by keyword, but it returned no results; when I enter "jpg" or the word "image", it does show results.
Thanks
2020-07-04

-- Main Page Spiders / Image Keyword Crawl
I thought I had already answered this a couple days ago, but I must have forgotten to hit save...
If you know all the main pages you want to crawl, you can list them under seed sites, then set the max depth for the crawl to 0. A max depth of 1 would also let you get the images, CSS, and JS for those main pages.
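To make the effect of the max depth setting concrete, here is a minimal sketch of a depth-limited crawl over a toy link graph. It is not Yioop's code: the `LINKS` table, the `crawl` function, and the URLs in it are hypothetical, and the crawl is modeled as a plain breadth-first walk in which each seed page sits at depth 0.

```python
from collections import deque

# Hypothetical toy link graph: page URL -> URLs it links to or embeds.
LINKS = {
    "http://my.site.com/": [
        "http://my.site.com/logo.jpg",
        "http://my.site.com/style.css",
        "http://my.site.com/about.html",
    ],
    "http://my.site.com/about.html": ["http://my.site.com/team.html"],
}

def crawl(seeds, max_depth):
    """Return the URLs reachable from the seeds within max_depth hops, in BFS order."""
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep their order
    seen = set(seeds)
    order = []
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # at the depth limit: record the URL but do not follow its links
        for nxt in LINKS.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order

if __name__ == "__main__":
    print(crawl(["http://my.site.com/"], 0))
    print(crawl(["http://my.site.com/"], 1))
```

With max depth 0 only the seed page itself is returned. With max depth 1 the page's image and stylesheet are picked up as well; in this simplified model so is any page the seed links to directly (about.html here), while pages two hops away (team.html) are not.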
Another approach is to use a glob pattern in the allowed-to-crawl section: check the option that restricts sites by URL, then enter a pattern such as http://my.site.com/* for each site you want. This will cause Yioop to crawl only the top-level pages of that site.
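As a rough illustration of how a pattern like http://my.site.com/* can limit a crawl to top-level pages, the sketch below assumes that `*` matches any run of characters except "/". That assumption is inferred from the description above and may not be how Yioop actually matches its patterns; `glob_to_regex` and `allowed` are hypothetical helper names, not part of Yioop.

```python
import re

def glob_to_regex(pattern):
    """Compile a glob where each * matches any characters except '/' (assumed semantics)."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("[^/]*".join(parts))

def allowed(url, pattern):
    """Return True if the whole URL matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(url) is not None

if __name__ == "__main__":
    pattern = "http://my.site.com/*"
    for url in (
        "http://my.site.com/",
        "http://my.site.com/about.html",
        "http://my.site.com/blog/post.html",
        "http://other.site.com/",
    ):
        print(url, allowed(url, pattern))
```

Under this assumption the main page and pages directly under the root match, while anything in a subdirectory or on another host does not.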
For an image to appear in the results, the search term must have been in the text of a link to that image on some crawled page, or in the EXIF data of the image.
Best, Chris
(Edited: 2020-07-04)