2019-06-29

Version 6 of Yioop Software Released.

Today I am pleased to announce Version 6 of Yioop Software! It is available for download from Seekquarry Downloads (https://www.seekquarry.com/p/Downloads).

What's New in Version 6

  • Crawler and Search Engine
    • Trending keywords now available under More and Tools link.
    • Support for multiple simultaneous crawls by assigning machines to channels and then scheduling crawls to those channels.
    • Support for general repeating crawls. These crawls have a repeat frequency and two indexes, one for searching and one for crawling, and Yioop automatically switches between the two every repeat period.
    • Support for crawling to a fixed depth directly rather than using a regex in allowable sites to crawl.
    • Dropdown to allow admins to control how Yioop should follow robots.txt files.
    • Under Page Options, can now test how a page will be processed, supplying the page by URI, File Upload, or Direct Input.
    • Safe search check box added to Settings and enabled by default.
    • Fixes issues with HTTP/2 crawling on Linux.
    • Improves Mirror server handling.
    • Removes Memcache support as cache option for search results.
  • Indexing and Library Functionality
    • Width, Height, EXIF, and XMP metadata now indexed for images; media:image-small, media:image-medium, and media:image-large meta words added.
    • Improved language and safe website detection. Now also supports the mul locale tag.
    • Adds stopWordsRemover method to all supported locales' Tokenizer class.
    • New LinearAlgebra class added to make it easier to do term vector manipulations both for summarizers and in using Yioop as a Library under Composer.
    • All summarizers rewritten. Each sentence for each summarizer now gets a score before being added to summary. This score is also used in ranking search results.
    • A Test link for Search Sources added to allow easy testing of whether a source is being correctly downloaded.
    • Adds new Scrape Podcast search source to allow downloading of podcasts to wiki pages.
    • Web Scraper order of application now determined by a priority field.
    • Web Scrapers enhanced so they can now extract fields like THUMB_URL or other meta words, such as video duration. This replaces functionality that was previously only poorly served by video search sources.
    • Removes video search sources from search sources.
    • Adds Library class with init method to make it easier to initialize Yioop when used with Composer.
    • Page Options now has a toggle to control whether phrase extraction, rather than just term extraction, is always done. In most circumstances, not using phrase extraction gives faster and better indexing.
    • No longer keeps two copies of dictionary info (one in IndexShard and one in IndexDictionary), making for smaller indexes.
    • Cache pages now stored with summary in the same object, allowing more compression if keeping a cache of whole pages.
    • Removes materialized metas and largely unused thesaurus functionality.
  • Group and Wiki System
    • Adds a seen media indicator in media lists, which can be user reset.
    • Improved inter-group links.
    • If a wiki URL has 360 in its path, checks for 360 images and adds an enter VR button to view them.
    • Media updater now has a job that allows periodic downloading of podcasts to a wiki page.
    • Time zone, Cookie name, and Session token now set under Security rather than Appearance. Time before autologout now controllable by admin using a dropdown.
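
To make the repeating-crawl item in the Crawler and Search Engine list more concrete, here is a minimal Python sketch of the two-index idea: queries are answered from one index while the other is filled by the crawler, and the two switch roles every repeat period. This is not Yioop's code (Yioop is written in PHP). The class name RepeatingCrawl, its fields, and the tick method are all hypothetical, and the sketch assumes the crawl-side index starts out empty after each switch, which the announcement does not state.

```python
from dataclasses import dataclass, field


@dataclass
class RepeatingCrawl:
    """Hypothetical model of a repeating crawl backed by two indexes."""
    repeat_period: int                                 # seconds between switches
    last_swap: int = 0                                 # time of the latest switch
    search_index: list = field(default_factory=list)   # answers queries
    crawl_index: list = field(default_factory=list)    # filled by the crawler

    def add_page(self, url: str) -> None:
        # Newly crawled pages only ever go into the crawl-side index.
        self.crawl_index.append(url)

    def search(self, term: str) -> list:
        # Queries only ever read the search-side index.
        return [url for url in self.search_index if term in url]

    def tick(self, now: int) -> bool:
        # Switch roles once a full repeat period has elapsed.
        if now - self.last_swap < self.repeat_period:
            return False
        # Assumption: the next crawl begins with an empty index.
        self.search_index, self.crawl_index = self.crawl_index, []
        self.last_swap = now
        return True


if __name__ == "__main__":
    crawl = RepeatingCrawl(repeat_period=3600)
    crawl.add_page("https://example.com/news")
    print(crawl.search("news"))   # crawled page is not yet searchable
    crawl.tick(now=3600)          # repeat period elapsed, indexes switch
    print(crawl.search("news"))   # page is now served from the search index
```

The point of the design is that searchers never see a half-built index: results come from the last completed crawl until the next one is ready.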
(Edited: 2019-06-29)