Yioop can crawl the web through proxy servers. This fieldset configures which proxies are used.

The Tor Proxy field specifies the onion router proxy used to crawl Tor web pages (the Dark Web). The default is <code>127.0.0.1:9150</code>. This is the proxy that is active on your machine by default while the Tor Browser is running. If you instead install a tor relay service (on macOS you could do <code>brew install tor</code>) and start it in the default way, the proxy would be <code>127.0.0.1:9050</code>. If you do not intend to crawl Tor pages, you can safely ignore this field.
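
To decide which of the two addresses above to enter, it can help to check which port is actually accepting connections. The following is a minimal Python sketch, not part of Yioop; the helper names <code>is_listening</code> and <code>find_tor_proxy</code> are made up for illustration. It only tests that something accepts TCP connections on the port, not that the listener is really Tor.

```python
import socket

# Ports named in the text: Tor Browser (9150) and a default tor service (9050).
CANDIDATE_PORTS = (9150, 9050)


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def find_tor_proxy(host="127.0.0.1", ports=CANDIDATE_PORTS):
    """Return 'host:port' for the first port that accepts connections, else None."""
    for port in ports:
        if is_listening(host, port):
            return f"{host}:{port}"
    return None


if __name__ == "__main__":
    print(find_tor_proxy() or "No listener found on the usual Tor ports")
```

If this reports an address, that value is a reasonable candidate for the Tor Proxy field.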

Except for onion URLs, Yioop does not make use of the Tor Proxy for crawling. You can configure Yioop to use a proxy for crawling general web pages by checking the Crawl via Proxies checkbox. This reveals a textarea where you can enter one proxy per line. The format for a line is one of:
  • <code>address:port</code>
  • <code>address:port:type</code>
  • or <code>address:port:type:username:password</code>


As an example, one might have a line like:
<code>
 45.192.173.164:1080:socks5_hostname
</code>
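Other possibilities for the proxy type are: <code>http</code> (default), <code>socks4</code>, <code>socks4a</code>, <code>socks5</code>, or the number cURL uses for the desired protocol (the value of its <code>CURLPROXY_*</code> constant). For example, the number <code>5</code> corresponds to <code>socks5</code> and <code>7</code> to <code>socks5_hostname</code>.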
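
To make the line format concrete, here is a rough Python sketch of how such a line could be split into its parts and how a type name maps to cURL's number. This is an illustration only, not Yioop's own parsing code; the function name <code>parse_proxy_line</code> and the returned dictionary keys are hypothetical. The sketch assumes that none of the fields contain a colon themselves.

```python
# Illustrative only: not Yioop's actual parser.
# The numbers are the values of libcurl's CURLPROXY_* constants.
CURL_PROXY_TYPES = {
    "http": 0,
    "socks4": 4,
    "socks5": 5,
    "socks4a": 6,
    "socks5_hostname": 7,
}


def parse_proxy_line(line):
    """Split one 'address:port[:type[:username:password]]' line into fields."""
    parts = line.strip().split(":")
    if len(parts) not in (2, 3, 5):
        raise ValueError(f"unrecognized proxy line: {line!r}")

    address, port = parts[0], int(parts[1])

    # The type defaults to http when the line gives only address:port.
    type_field = parts[2].lower() if len(parts) >= 3 else "http"
    if type_field.isdigit():
        proxy_type = int(type_field)  # already a cURL number, e.g. 7
    elif type_field in CURL_PROXY_TYPES:
        proxy_type = CURL_PROXY_TYPES[type_field]
    else:
        raise ValueError(f"unknown proxy type: {type_field!r}")

    username, password = (parts[3], parts[4]) if len(parts) == 5 else (None, None)
    return {
        "address": address,
        "port": port,
        "type": proxy_type,
        "username": username,
        "password": password,
    }


if __name__ == "__main__":
    print(parse_proxy_line("45.192.173.164:1080:socks5_hostname"))
```

Under these assumptions, the example line above and the line <code>45.192.173.164:1080:7</code> would describe the same proxy.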