-- Sep. 26 In-Class Exercise Thread
You might have noticed that I changed the string to segment from 新年快樂 (Happy New Year) to 你们好吗? (How are you?). Yioop segments Chinese text using a Bloom filter built from Chinese Wikipedia page titles. Chinese Wikipedia has a page titled 新年快樂, a disambiguation page for the ABBA song Happy New Year and several movies of the same name, so Yioop was treating 新年快樂 as a single term, which didn't really illustrate segmentation. There was also a potential issue: I had stored only the simplified (not traditional) Chinese characters in the Bloom filter. This turned out not to be a problem. Wikipedia has a page for each form and redirects the traditional one to the simplified one, so both forms had ended up in my Bloom filter.
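To make the idea concrete, here is a minimal sketch of dictionary-based segmentation backed by a Bloom filter. It assumes a greedy forward maximum-matching rule (take the longest prefix the filter recognizes, else a single character); Yioop's actual matching strategy, hash functions, and filter size may differ. The names `BloomFilter`, `segment`, and `max_len` are hypothetical, and the short title list stands in for the real Wikipedia title dump.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings using k salted SHA-256 hashes (illustrative only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, term):
        # Derive k bit positions by salting the term with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, term):
        # True if all k bits are set: may give false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(term))


def segment(text, dictionary, max_len=6):
    """Greedy forward maximum matching: at each position take the longest
    substring (up to max_len chars) that the dictionary reports as a term,
    falling back to a single character."""
    terms = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                terms.append(candidate)
                i += length
                break
    return terms


if __name__ == "__main__":
    # Hypothetical stand-in for Wikipedia page titles (both script forms of Happy New Year).
    titles = BloomFilter()
    for title in ["你们", "新年", "快乐", "新年快乐", "新年快樂"]:
        titles.add(title)
    print(segment("你们好吗", titles))
    print(segment("新年快樂", titles))
```

With these titles, 新年快樂 comes back as one term because the whole string is a "title", which is exactly why it made a poor segmentation demo. A Bloom filter keeps the title set compact at the cost of occasional false positives, which in this scheme could merge characters into a spurious term.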
Best,
Chris
(Edited: 2018-09-28)