2019-07-03

How to correct the BUG of accents and special characters in HTML that changes the search for the word

Hi Mr. Christ, and thank you for taking the time to answer these important questions.
You told us that you did not understand our previous question No. 12, so I will try to rephrase it and add a few new questions:
1 - Our previous question #12 was simply asking how to create a new Yioop plugin if we want to add new features to the search engine.
2 - We noticed that the search engine's spelling checker is not working well. When we search on yioop.com for the French phrase "Comet va-tu", the spell checker suggests "comte va-tu". It should instead suggest the correct French spelling, "Comment vas-tu", as Google does.
So we would like to know why the Yioop spelling corrector algorithm is not working well.
3 - In most PHP files we see the namespace "seekquarry\yioop". Is it possible to change "seekquarry\yioop" to "oursearch\engine" in every Yioop PHP file? If so, how can we make this change without breaking the search engine?
4 - We want to know whether Yioop's Archive Crawl works only for Wikipedia or for all web services based on MediaWiki. Are they crawled in all languages or only in English (for example, do Yioop archive crawls support Wikipedia in multiple languages or only in English)?
5 - We want to let users click on PDF or Doc to see only PDF or DOC results, exactly as they already click on Web, Image, Video or News. In which part of the administration area can we limit a search to a particular file type using the filetype: meta word? And in which part of the administration area can we use the filetype: meta word to add a Subsearch for PDF or Doc that presents only PDF or DOC files? Please guide us, because we cannot find our way around the administration area.
6 - Can we configure the Web Crawl => Seed Sites option like this: domain:.google.com instead of the default https://google.com?
7 - The documentation mentions domain:.website.tld, but in the admin area, under Web Crawl => Disallowed Sites / Sites with Quotas, the dot (.) before the website name is missing (domain:website.tld). So we would like to know which of the two spellings (domain:.website.tld or domain:website.tld) is correct.
8 - When we search for the word "Barça" on yioop.com and then click on one of the Web, Image, Video or News links at the top, the word "Barça" that we entered in the search field is automatically changed to "Bar&ccedil;a". As I understand it, this is a bug: Yioop does not handle accents and special characters in HTML, since "&ccedil;" is the HTML entity for the character "ç". The same problem occurs when we search for "bàrça" and click on Web, Image, Video or News at the top: the query becomes "b&agrave;r&ccedil;a". How can we fix this bug so that Yioop also supports accents and special characters in HTML?
9 - The new features list for Yioop version 6 says that Memcached has been removed. If so, which cache system does Yioop use now? Is it normal for Yioop to work without a cache system, given that most large projects use one to gain speed?
10 - Does Yioop crawl by category? We hope to create a plugin that suggests results from the same category of activity. For example, when you search for "amazon" on the Baidu search engine, you get the Amazon search results on one side and, on the other side, search suggestions for companies in the same industry (shopping) as Amazon, such as eBay, JD.com, ...
So, how can we create search suggestions with Yioop based on the same area of activity (same category) as the search keyword, as Google and Baidu do?
Thank you in advance for your answers.
(Edited: 2019-07-03)
2019-07-08

-- How to correct the BUG of accents and special characters in HTML that changes the search for the word
Hi,
My name is Chris, not Christ.
(1) Sections (8) and (9) of the Yioop Documentation (https://www.seekquarry.com/p/Documentation) describe how one can extend Yioop, add plugins, etc.
(2) Yioop's search suggestions are still very primitive at this time. For French, it does a spelling check at the individual word level, not at the grammar level. Notice that each of the words it output was a French word, but the phrase was nonsensical and there was no verb-noun agreement. Improving this is probably a fair bit of work, but I am thinking of revisiting it for the next version of Yioop.
(3) Section 5 of the GNU General Public License (https://www.gnu.org/licenses/gpl-3.0.html#section5) covers what I will agree you can change in the source code if you convey it to someone else. If you are just building a site using Yioop, you shouldn't need to change Yioop's namespace. Doing so would make it almost impossible to upgrade your site to the next version. The namespace doesn't appear in anything output by the web interface. If you are using Yioop's wiki system, the namespace doesn't show up. If you are extending Yioop using Composer, you can choose whatever namespace you want for your project that uses Yioop.
(4) Yioop can index the MediaWiki format that Wikipedia uses for its data dumps. This includes dumps for languages other than English.
(5) I am not sure I understand what you are asking, but you can make a Crawl Mix for the filetype search, then use that Crawl Mix in your Subsearch.
(6) No. You need to start crawls from fixed web pages, not from a whole domain. From your seed sites, Yioop will crawl out to the other pages they link to. By checking Restrict Sites By Url, you can then, under Allowed to Crawl Sites, restrict Yioop to crawling only a single domain, which is probably what you want.
(7) The two spellings mean slightly different things. For example, .google.com will match www.google.com and mail.google.com, but not plain google.com. On the other hand, google.com would match all of www.google.com, mail.google.com, and google.com.
(8) This is fixed in version 6.0.3.
(9) Yioop has its own FileCache system, which seemed to be comparably fast and, since it doesn't add an external dependency to Yioop, is what I am choosing to maintain. It is not currently turned on for the yioop.com site, but it can be toggled under Server Settings.
(10) Yioop doesn't crawl based on categories. What you want can probably be faked, though. You can try to make separate Yioop crawls using the Word Plugin so that each crawl corresponds to a category. You could then make a new locale corresponding to that category, for example fr-vêtements, and make a suggest trie for that category as described in the localization part of the Yioop documentation. Finally, one could make a crawl mix that uses that locale when searching, and set that crawl mix as the one used by a Subsearch for Vêtements under Search Sources to get something like this effect.
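To illustrate point (2): a corrector that looks at one word at a time can produce exactly the "comte va-tu" result. The following is only a minimal sketch, not Yioop's code. The tiny word list and the names osa_distance, correct_word and correct_phrase are made up for the example, and the distance used (edits plus swaps of adjacent letters) is an assumption about how such a checker might rank candidates.

```python
import re

# Hypothetical toy dictionary; a real checker would use a full French lexicon.
FRENCH_WORDS = ["comte", "comment", "va", "vas", "tu"]

SEPARATORS = re.compile(r"([\s-]+)")


def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insert, delete, substitute and adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete
                          d[i][j - 1] + 1,          # insert
                          d[i - 1][j - 1] + cost)   # substitute or keep
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def correct_word(word: str) -> str:
    """Pick the dictionary word closest to `word`, ignoring its neighbours."""
    word = word.lower()
    return min(FRENCH_WORDS, key=lambda w: osa_distance(word, w))


def correct_phrase(phrase: str) -> str:
    """Correct each word independently; spaces and hyphens are kept as-is."""
    parts = SEPARATORS.split(phrase)
    return "".join(
        p if (not p or SEPARATORS.fullmatch(p)) else correct_word(p)
        for p in parts
    )


print(correct_phrase("Comet va-tu"))
```

In this sketch "comet" is one letter swap away from "comte" but two insertions away from "comment", and "va" is already a valid word, so nothing pushes the phrase toward "Comment vas-tu". Getting that would need phrase-level or grammar-level information.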
Best,
Chris
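The rule described in (7) can be sketched as follows. This is a rough illustration under stated assumptions, not Yioop's implementation: the helper name domain_matches is hypothetical, and treating the no-dot form as "the host itself or any subdomain of it" (so that notgoogle.com does not match) is my assumption about where the boundary falls.

```python
def domain_matches(pattern: str, host: str) -> bool:
    """Check a host against a 'domain:' pattern as described in (7).

    Hypothetical helper for illustration only.
    """
    pattern = pattern.lower()
    if pattern.startswith("domain:"):
        pattern = pattern[len("domain:"):]
    host = host.lower()
    if pattern.startswith("."):
        # Leading dot: only hosts that end in ".google.com" match,
        # so plain google.com is excluded.
        return host.endswith(pattern)
    # No leading dot: the bare host matches, and so do its subdomains.
    # (Assumption: the match stops at a dot boundary.)
    return host == pattern or host.endswith("." + pattern)


for host in ("www.google.com", "mail.google.com", "google.com"):
    print(host,
          domain_matches("domain:.google.com", host),
          domain_matches("domain:google.com", host))
```

So for Disallowed Sites or Sites with Quotas, the form without the dot is the one to use if plain website.tld should be covered as well.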
(Edited: 2019-07-08)
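For readers wondering how a symptom like the one in question 8 can arise: one plausible cause (an assumption, since the thread does not say what was changed in 6.0.3) is that a query which has already been converted to HTML entities for display gets reused as the query itself, or gets escaped a second time. The sketch below reproduces that effect in Python; to_named_entities is a hypothetical stand-in for an entity-encoding step and the "?q=" link format is made up.

```python
import html
from html.entities import codepoint2name
from urllib.parse import quote


def to_named_entities(text: str) -> str:
    """Replace each character that has a named HTML entity (including
    & < > and the double quote) with that entity, e.g. 'ç' -> '&ccedil;'."""
    return "".join(
        f"&{codepoint2name[ord(ch)]};" if ord(ch) in codepoint2name else ch
        for ch in text
    )


query = "Barça"

# Escaping once is fine for printing the query inside an HTML page.
escaped_once = to_named_entities(query)

# Escaping the already-escaped text also escapes the '&', so a browser
# would now display the literal text "Bar&ccedil;a" to the user.
escaped_twice = to_named_entities(escaped_once)

# Decoding before reuse restores the original query.
restored = html.unescape(escaped_once)

# Hypothetical link builder: percent-encode the raw query for the URL
# instead of putting the HTML-escaped form into it.
link = "?q=" + quote(restored)

print(escaped_once, escaped_twice, restored, link)
```

The general remedy is to keep the raw query internally, percent-encode it for links, and entity-encode it exactly once at the moment it is written into the page.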

-- How to correct the BUG of accents and special characters in HTML that changes the search for the word
Great, thank you, Sir Chris.