2018-09-25

Sep. 26 In-Class Exercise Thread.

Post your solutions to the Sep 26 In-Class Exercise to this thread.
Best,
Chris
(Edited: 2018-09-25)
2018-09-26

-- Sep. 26 In-Class Exercise Thread
Array (
    [0] => 新年快樂
)

-- Sep. 26 In-Class Exercise Thread
Array (
    [0] => 新年快樂
)

-- Sep. 26 In-Class Exercise Thread
Array (
    [0] => 新年快樂
)

-- Sep. 26 In-Class Exercise Thread
Array (
    [0] => 你们
    [1] => 好
    [2] => 吗
)

-- Sep. 26 In-Class Exercise Thread
Array (
    [0] => 你们
    [1] => 好
    [2] => 吗
)
2018-09-28

-- Sep. 26 In-Class Exercise Thread
Array (
    [0] => 你们
    [1] => 好
    [2] => 吗
)

-- Sep. 26 In-Class Exercise Thread
You might have noticed that I changed the string to segment from 新年快樂 ("Happy New Year") to 你们好吗 ("How are you?"). Yioop segments Chinese using a Bloom filter built from Chinese Wikipedia page titles. There is a Wikipedia page titled 新年快樂, which disambiguates the ABBA song Happy New Year and several movies named Happy New Year. So Yioop was treating 新年快樂 as a single term, which didn't really illustrate segmentation. There was also a potential issue that I had stored only the simplified (not traditional) Chinese characters in the Bloom filter. This turned out not to be a problem: Wikipedia redirects the latter to the former and has a page for each, so I'd stored both in my Bloom filter.
Best,
Chris
(Edited: 2018-09-28)
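For anyone curious how a Bloom-filter lookup can drive segmentation, here is a minimal Python sketch. It is not Yioop's actual code. The BloomFilter class, the segment function, the greedy longest-match strategy, the max_len limit, and the two toy "titles" are all assumptions made for illustration. It shows why a string whose whole form is a known title comes back as one term, while a string with no such title gets split.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical, not Yioop's implementation)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def segment(text, known_terms, max_len=4):
    """Greedy forward longest match (an assumed strategy for this sketch).

    At each position, take the longest substring found in known_terms;
    fall back to a single character when nothing longer matches.
    """
    terms = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in known_terms:
                terms.append(candidate)
                i += length
                break
    return terms


# Toy stand-ins for the Wikipedia page titles stored in the filter.
titles = BloomFilter()
for title in ["你们", "新年快樂"]:
    titles.add(title)

print(segment("你们好吗", titles))
print(segment("新年快樂", titles))
```

With these toy titles, the first string is split into several terms because only 你们 is in the filter, while the second matches a stored title in full and so stays as one term, which is the behavior described above.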
2018-09-29

-- Sep. 26 In-Class Exercise Thread
Array (
    [0] => 你们
    [1] => 好
    [2] => 吗
)

-- Sep. 26 In-Class Exercise Thread
[0] => 你们 [1] => 好 [2] => 吗