-- Very big crawl.ini
Yes, a billion page crawl was done. However, that crawl made use of only a few hundred seed sites. The way I've been using Yioop for web crawls, it usually discovers new URLs and crawls them as it goes. Page discovery order also plays a role in how pages are ranked in Yioop. If you want Yioop to crawl a large number of URLs (say, millions) in a fixed order, you could probably write a sequence of "At....txt" files in work_directory/schedules/ScheduleDataTIMESTAMP, where TIMESTAMP is the timestamp of the crawl the data should go to. These files are normally created from data the fetchers send back as they discover new URLs, so they would be tricky to create by hand, but it is doable with a short script.
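For illustration, here is a minimal sketch of what such a script might look like, assuming your URLs are already in crawl order in a plain text file (urls_in_crawl_order.txt here is a made-up name). The directory layout follows what I described above; the "At..." file naming, the batch size, and the one-URL-per-line payload are placeholder assumptions, since the exact serialized record format the fetchers produce isn't spelled out here. You would want to inspect the files in a real ScheduleData folder and copy their format before pointing a crawl at output like this.

#!/usr/bin/env python3
"""Hypothetical sketch: lay down a fixed URL ordering as schedule files
for an existing crawl. File names and payload format are placeholders;
match them to what real fetcher-written schedule files look like."""
import os

TIMESTAMP = "1500000000"              # timestamp of the target crawl (assumption)
WORK_DIR = "/path/to/work_directory"  # your Yioop work directory (assumption)
URLS_PER_FILE = 5000                  # arbitrary batch size for this sketch

schedule_dir = os.path.join(WORK_DIR, "schedules", "ScheduleData" + TIMESTAMP)
os.makedirs(schedule_dir, exist_ok=True)

# One URL per line, in the order you want them crawled (hypothetical input).
with open("urls_in_crawl_order.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for i in range(0, len(urls), URLS_PER_FILE):
    batch = urls[i : i + URLS_PER_FILE]
    # "At<sequence>.txt" is a placeholder for the real "At....txt" naming
    # scheme, which would need to encode ordering the way Yioop expects.
    name = os.path.join(schedule_dir, "At%06d.txt" % (i // URLS_PER_FILE))
    with open(name, "w") as out:
        # Placeholder payload: bare URLs, one per line. The real files
        # serialize more than this, so treat this as illustrative only.
        out.write("\n".join(batch) + "\n")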
(Edited: 2017-07-18)