Question 1.
Generation of the dataset.
During creation, we start with a randomly generated sequence 'v' of 20 characters, then generate the stick-equivalent mirror of v, creating 'w'.
The concatenation vw is then a stick palindrome of size 40 such that v sticks with wR (the reverse of w).
Example using n = 5:
v  = AABDC
wR = CCDBA
w  = ABDCC
vw = AABDCABDCC
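As a sanity check, the generation step could be sketched like this. The stick pairs A<->C and B<->D are an assumption inferred from the example above, not something stated in the assignment:

```python
# Sketch of the dataset generation step; the pairing A<->C, B<->D is
# inferred from the worked example and is an assumption.
import random

STICK = {'A': 'C', 'C': 'A', 'B': 'D', 'D': 'B'}

def make_sticky_pair(n):
    """Generate v of length n and its stick-mirror w, so vw is a stick palindrome."""
    v = ''.join(random.choice('ABCD') for _ in range(n))
    w_reversed = ''.join(STICK[c] for c in v)  # wR: character-wise stick complement of v
    return v, w_reversed[::-1]                 # reverse wR to get w

v, w = make_sticky_pair(20)  # vw = v + w is the 40-character input
```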
Using this definition, is it true that we can check stickiness by traversing the strings?
Given len(v) = len(w) = 5:
k = 0
for i in range(0, len(v)):
    if v[i] sticks with w[len(w) - 1 - i]:
        k += 1
Then we can output our sticky class based on this result.
If this is the case, why is a neural net needed to solve this problem? We could create a simple rule based on the 6 classes and the value of k.
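A runnable version of this traversal, assuming the same inferred A<->C / B<->D pairing and writing the mirror index as len(w) - 1 - i so it stays in range:

```python
# Count sticky positions between v and the mirrored positions of w.
# STICK encodes the assumed pairing A<->C, B<->D.
STICK = {'A': 'C', 'C': 'A', 'B': 'D', 'D': 'B'}

def count_sticky(v, w):
    """Return k, the number of positions where v[i] sticks with w's mirror position."""
    k = 0
    for i in range(len(v)):
        if STICK[v[i]] == w[len(w) - 1 - i]:  # note len(w)-1-i, not len(w)-i
            k += 1
    return k

count_sticky('AABDC', 'ABDCC')  # the fully sticky example above gives k = 5
```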
Question 2.
Neural net training.
a) Labeling data.
After generating the gene snippets, do we apply labels to the data after determining which class each belongs to?
b) Converting to numbers.
Given an input x, we have a string of 40 characters.
- How do we convert the input string to a numerical array, along with a weight vector W used for gradient descent?
- What is our loss function in this case?
- How do we turn these inputs into a usable feature set?
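One common encoding (a sketch of an assumption on my part, not necessarily what the assignment intends) is to one-hot encode each character over the alphabet {A, B, C, D}, giving 4 features per position, i.e. 160 features for a 40-character input; W then has one weight per feature (or a column per class), and softmax cross-entropy is the usual loss for a 6-way classification:

```python
# One-hot encoding sketch: 4 features per character over alphabet ABCD.
ALPHABET = 'ABCD'

def one_hot(x):
    """Flat 0/1 feature vector with 4 entries per character of x."""
    features = []
    for c in x:
        features.extend(1.0 if c == a else 0.0 for a in ALPHABET)
    return features

feats = one_hot('AABDCABDCC')  # 10 characters -> 40 features
```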
c) A possible solution:
- Using some of the preprocessing above, we can turn the input string into a boolean array of 1s and 0s.
Given x = vw = AABDC ABDCC,
our x = [11111 11111],
since each element is sticky with its mirror position.
Example for a 12-sticky string with length = 10:
x = AABBB CCBBB
x = [11000 00011]
W is a column vector of size 40.
Our loss function could simply be (sum of 1s)/2 - label (the k value).
We then apply a softmax to group these sums (which should equal the k value) into the 6 buckets above.
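The proposed preprocessing could be sketched as below (again assuming the inferred A<->C / B<->D pairing): each position i of x is marked 1 when it sticks with its mirror position len(x) - 1 - i, and sum(mask)/2 recovers k directly, which is the crux of the question in Question 1 about whether a learned model is needed at all once this feature is computed by hand:

```python
# Boolean "sticky mask" over the full string x, under the assumed
# pairing A<->C, B<->D: 1 where x[i] sticks with its mirror position.
STICK = {'A': 'C', 'C': 'A', 'B': 'D', 'D': 'B'}

def sticky_mask(x):
    n = len(x)
    return [1 if STICK[x[i]] == x[n - 1 - i] else 0 for i in range(n)]

sticky_mask('AABDCABDCC')  # the fully sticky example gives all 1s
```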
Any answers to the above questions are appreciated; please let me know if my understanding is on the right track or if there are any misunderstandings.
Thanks.
(Edited: 2017-10-22)