PHP Search Engine

Switching from mnogosearch

We are running a vertical search engine on mnogosearch. The index contains the crawled web pages or shops of registered companies, so these are all outside domains.

We are on Ubuntu 16.04 / PHP 7.4 / MariaDB 10.5.

I installed Yioop 7 and made a small crawl. I do not have the 'Resume' option in the list of crawls. How can I restart or continue a crawl?

Is it possible to have a picture grabbed from the website and displayed in the search results? The product image of a webshop, for example?

Thank you!

-- Switching from mnogosearch

Depending on how you stopped your crawl, the Resume option might not be listed, as you describe. This indicates that the queue saved to disk contains no URLs to start the crawl from. If you really need to restart that crawl, open a shell, switch into the src/executables folder, and run:

 php ArcTool.php inject timestamp file

Here timestamp is the timestamp of the crawl whose queue you want to add URLs to, and file is a file with one URL per line.
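If it helps, here is a minimal Python sketch for preparing such a file. It is not part of Yioop: the helper names write_seed_file and inject_command, the file name seeds.txt, and the example URLs are all hypothetical, and the timestamp is left as a placeholder to be replaced with your crawl's own. It only writes one URL per line (dropping blanks and duplicates) and prints the command shown above.

```python
from pathlib import Path


def write_seed_file(urls, path):
    """Write unique, non-empty URLs to `path`, one per line, keeping order."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            lines.append(url)
    # One URL per line, each terminated by a newline.
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return lines


def inject_command(timestamp, path):
    """Build the ArcTool command line quoted in the post above."""
    return f"php ArcTool.php inject {timestamp} {path}"


if __name__ == "__main__":
    # Hypothetical seed URLs; replace with the sites you want re-queued.
    seeds = ["https://shop.example/", "https://company.example/products"]
    write_seed_file(seeds, "seeds.txt")
    # TIMESTAMP is a placeholder for the crawl's timestamp.
    print(inject_command("TIMESTAMP", "seeds.txt"))
```

Run the printed command from the src/executables folder, as described above.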

For your second question, I think what you want is what I do for video sites. Under Web Scrapers, make a new web scraper. As its signature, give an XPath that evaluates to non-empty on pages you know have an image that could serve as a thumbnail. Then under extract fields you need a line:

 THUMB_URL=xpath_for_thumb_url_on_page

You can look at the pre-built web scraper for videos that comes with Yioop for an example. It does this using Open Graph meta-info.
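Before creating the scraper, it can be useful to check that a given page actually carries an Open Graph image. The sketch below is only an offline check, not Yioop's scraper code, and it does not evaluate XPath: it uses Python's standard html.parser to find the content of the first meta tag whose property is og:image, which is the standard Open Graph image property. The names OgImageFinder and find_og_image and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class OgImageFinder(HTMLParser):
    """Remember the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.thumb_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.thumb_url is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:image":
            self.thumb_url = attributes.get("content")


def find_og_image(html):
    """Return the og:image URL found in `html`, or None if there is none."""
    finder = OgImageFinder()
    finder.feed(html)
    return finder.thumb_url


if __name__ == "__main__":
    # Hypothetical page source; in practice, paste in a saved product page.
    sample = (
        "<html><head>"
        '<meta property="og:title" content="Example product">'
        '<meta property="og:image" content="https://shop.example/img/product.jpg">'
        "</head><body></body></html>"
    )
    print(find_og_image(sample))
```

If this returns None for a typical product page, an Open Graph based THUMB_URL rule would have nothing to extract there, and a different XPath would be needed.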

(Edited: 2021-03-06)

-- Switching from mnogosearch

Yes, Open Graph with THUMB_URL=//meta[@property='og:image']/@content is exactly what I am looking for. However, the scraper is not saved (crawl_component_scraper_missing fields), and the other samples have disappeared from the list.

(Edited: 2021-03-11)
