Question 1.
Generation of the dataset.
During creation, we start with a randomly generated sequence 'v' of 20 characters, then generate the stick-equivalent mirror of v, creating 'w'.
The concatenation vw is then a stick palindrome of size 40 such that v sticks with wR (the reverse of w).
Example using n = 5:
v  = AABDC
wR = CCDBA
w  = ABDCC
vw = AABDCABDCC
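As a sanity check, the generation step could be sketched like this. The stick pairs A<->C and B<->D are an assumption inferred from the example above, not something stated in the assignment:

```python
# Sketch of the dataset generation step; the pairing A<->C, B<->D is
# inferred from the worked example and is an assumption.
import random

STICK = {'A': 'C', 'C': 'A', 'B': 'D', 'D': 'B'}

def make_sticky_pair(n):
    """Generate v of length n and its stick-mirror w, so vw is a stick palindrome."""
    v = ''.join(random.choice('ABCD') for _ in range(n))
    w_reversed = ''.join(STICK[c] for c in v)  # wR: character-wise stick complement of v
    return v, w_reversed[::-1]                 # reverse wR to get w

v, w = make_sticky_pair(20)  # vw = v + w is the 40-character input
```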
Using this definition, is it true that we can check stickiness by traversing the strings?
Given len(v) = len(w) = 5:
k = 0
for i in range(0, len(v)):
    if v[i] sticks with w[len(w) - 1 - i]:
        k += 1
Then we can output our sticky class based on this result.
If this is the case, why is a neural net needed to solve this problem? We could create a simple rule based on the 6 classes and the value of k.
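A runnable version of this traversal, assuming the same inferred A<->C / B<->D pairing and writing the mirror index as len(w) - 1 - i so it stays in range:

```python
# Count sticky positions between v and the mirrored positions of w.
# STICK encodes the assumed pairing A<->C, B<->D.
STICK = {'A': 'C', 'C': 'A', 'B': 'D', 'D': 'B'}

def count_sticky(v, w):
    """Return k, the number of positions where v[i] sticks with w's mirror position."""
    k = 0
    for i in range(len(v)):
        if STICK[v[i]] == w[len(w) - 1 - i]:  # note len(w)-1-i, not len(w)-i
            k += 1
    return k

count_sticky('AABDC', 'ABDCC')  # the fully sticky example above gives k = 5
```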
Question 2.
Neural net training.
a) Labeling data.
After generating the gene snippets, do we apply labels to the data after determining which class each belongs to?
b) Converting to numbers.
Given an input x, we have a string of 40 characters.
- How do we convert the input string to a numerical array, along with a weight vector W used for gradient descent?
- What is our loss function in this case?
- How do we turn these inputs into a usable feature set?
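One common encoding (a sketch of an assumption on my part, not necessarily what the assignment intends) is to one-hot encode each character over the alphabet {A, B, C, D}, giving 4 features per position, i.e. 160 features for a 40-character input; W then has one weight per feature (or a column per class), and softmax cross-entropy is the usual loss for a 6-way classification:

```python
# One-hot encoding sketch: 4 features per character over alphabet ABCD.
ALPHABET = 'ABCD'

def one_hot(x):
    """Flat 0/1 feature vector with 4 entries per character of x."""
    features = []
    for c in x:
        features.extend(1.0 if c == a else 0.0 for a in ALPHABET)
    return features

feats = one_hot('AABDCABDCC')  # 10 characters -> 40 features
```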
c) A possible solution:
- Using some of the preprocessing above, we can turn the input string into a boolean array of 1s and 0s.
Given x = vw = AABDC ABDCC,
our x = [11111 11111],
since each element is sticky with its mirror position.
Example for a 12-sticky string with length = 10:
x = AABBB CCBBB
x = [11000 00011]
W is a column vector of size 40.
Our loss function could simply be (sum of 1s)/2 - label (the k value).
We then apply a softmax to group these sums (which should equal the k value) into the 6 buckets above.
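The proposed preprocessing could be sketched as below (again assuming the inferred A<->C / B<->D pairing): each position i of x is marked 1 when it sticks with its mirror position len(x) - 1 - i, and sum(mask)/2 recovers k directly, which is the crux of the question in Question 1 about whether a learned model is needed at all once this feature is computed by hand:

```python
# Boolean "sticky mask" over the full string x, under the assumed
# pairing A<->C, B<->D: 1 where x[i] sticks with its mirror position.
STICK = {'A': 'C', 'C': 'A', 'B': 'D', 'D': 'B'}

def sticky_mask(x):
    n = len(x)
    return [1 if STICK[x[i]] == x[n - 1 - i] else 0 for i in range(n)]

sticky_mask('AABDCABDCC')  # the fully sticky example gives all 1s
```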
Any answers to the above questions are appreciated; please let me know if my understanding is on the right track or if there are any misunderstandings.
Thanks.
(Edited: 2017-10-22)