2017-09-22

Perceptron can or cannot PAC learn linear threshold functions?

Hi,
In slide 6 of the Sept 11th class it is mentioned that "The Perceptron rule cannot PAC-Learn linear threshold functions under uniform boolean distribution."
However, slide 10 of the Sept 20th class mentions this:
So let's try to get a handle on the kinds of things computable by simple networks of perceptrons. To start let k>0 be an integer, define `T_k^n(x_1,...,x_n)=|~sum_i x_i ge k~|` (at least k of the inputs must be on) and define `bar{T}_k^n(x_1,...,x_n)=|~sum_i -x_i ge -k~|` (at most k inputs 1).
This is an actual linear threshold function with all weights equal to 1 and boolean input values. It was proved in class that the simple perceptron can indeed compute this function, and we subsequently saw that the perceptron can compute majority functions, which could be used for an electoral college voting system.
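For concreteness, here is a minimal sketch (my own code, not from the slides) of the kind of perceptron I mean: all weights set to 1 and threshold `k`, computing `T_k^n`:
 # Sketch (not from the slides): a single perceptron computing T_k^n.
 # All weights are 1, so the unit fires exactly when at least k of the
 # boolean inputs are on.
 def threshold_perceptron(x, k):
     # Return 1 if at least k of the boolean inputs in x are 1, else 0.
     weights = [1] * len(x)                      # all weights equal to 1
     activation = sum(w * xi for w, xi in zip(weights, x))
     return 1 if activation >= k else 0          # fires when sum_i x_i >= k
 
 # T_2^5 on a couple of inputs
 print(threshold_perceptron([0, 1, 1, 0, 0], k=2))  # 1: two inputs are on
 print(threshold_perceptron([0, 0, 1, 0, 0], k=2))  # 0: only one input is on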
Please help me understand the reason for this apparent contradiction.
(Edited: 2017-09-24)

-- Perceptron can or cannot PAC learn linear threshold functions?
Hi Vasudha,
There is a difference between being able to compute a function and being able to learn a function. To compute a function, I have to come up with weights for the perceptron so that it outputs the same thing as the function I want. To learn a function, I don't have complete access to the function; I only have a black box that tells me what it computes on a given set of inputs. From a certain number of input-output pairs, I need to be able to figure out the function. Notice that for an n-bit function we usually only allow some polynomial number `n^k` of examples, so that the training finishes in our lifetime. A given n-bit boolean function is completely specified by its truth table, so `2^n` examples will always fix the function (whether or not the perceptron can compute it). On the other hand, for `n > 120`, processing `2^n` examples might take more processing power than the universe has.
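As a rough back-of-the-envelope sketch of the scale involved (the examples-per-second rate below is just an assumed figure for illustration, not a claim from the lecture):
 # Rough sketch of the scale for n = 120; the processing rate is an
 # assumed figure for illustration only.
 n = 120
 truth_table_size = 2 ** n      # examples that pin down an arbitrary n-bit function
 poly_budget = n ** 3           # a typical polynomial example budget, e.g. n^3
 
 examples_per_second = 10 ** 12           # assume a trillion examples per second
 seconds_per_year = 60 * 60 * 24 * 365
 years = truth_table_size / (examples_per_second * seconds_per_year)
 
 print(f"2^{n} examples: {truth_table_size:.3e}")
 print(f"n^3 examples:   {poly_budget}")
 print(f"years to process 2^{n} examples at 10^12/s: {years:.3e}")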
Best,
Chris
(Edited: 2017-09-22)

-- Perceptron can or cannot PAC learn linear threshold functions?
Hi,
Thanks for the explanation, Professor. Please let me know if my understanding of the following real-world scenario is right.
We have a country with 120 states and each state elects an elector through popular vote. For a candidate C to win, they must satisfy the majority function
`MAJ_n(x_1,...,x_n)=T_{|__n/2__|+1}^n(x_1,...,x_n)`, where n = 120 in this case, and at least 61 electors must have voted for candidate C (the bits are set for these electors) for the candidate to be considered the winner.
The perceptron that we design should be able to compute the majority function (by setting all weights to 1), BUT it can never be used in the real world for this scenario because it cannot PAC-learn the function in a reasonable amount of time. However, if the number of electors were, say, 50 (like in the U.S.), could we effectively design a perceptron with 2 layers?
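To make sure I have the setup right, here is a tiny sketch (my own code) of `MAJ_120` as a single threshold unit with all weights 1 and threshold 61:
 # Sketch: MAJ_n as the threshold function T_{floor(n/2)+1}^n, for n = 120 electors.
 # Each vote is 1 if that elector voted for candidate C, else 0.
 import random
 
 def majority(votes):
     n = len(votes)
     k = n // 2 + 1                      # floor(n/2) + 1 = 61 for n = 120
     return 1 if sum(votes) >= k else 0  # all weights are 1
 
 votes = [random.randint(0, 1) for _ in range(120)]
 print(sum(votes), "electors for C ->", "C wins" if majority(votes) else "C loses")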

-- Perceptron can or cannot PAC learn linear threshold functions?
I am not basing this on the real U.S. electors scenario, where every state gets a varying number of electors. The scenario I mentioned is slightly simplified, with each state getting only one elector.
2017-09-24

-- Perceptron can or cannot PAC learn linear threshold functions?
Hi Vasudha,
Let's fix a particular instance of `T_k^n(x_1,...,x_n)=|~sum_i x_i ge k~|`, say `T_2^5(x_1,x_2,x_3,x_4,x_5)`. This is a well-defined function that is computed by a perceptron. To say that the perceptron rule could PAC-learn this function would mean that I could start with the perceptron that has all-zero weights and present it examples chosen uniformly at random from `{0,1}^5`, for example,
 
 (0,1,1,1,0)
 (1,1,0,1,1)
 (0,0,0,1,0)
 ....
and use only the perceptron rule, and after a number of steps that is some fixed polynomial in `n`, `1/epsilon`, and `1/delta` (for this instance, `n=5`) it would halt with weights that `epsilon`-approximate `T_2^5` at least `1-delta` of the time. Now this might actually work for some threshold functions such as `T_2^5`; however, Hastad showed that there exist linear threshold functions of high weight complexity (`2^{Omega(n log n)}`). Given the speed of convergence of the perceptron rule, this means that for such a function the number of steps before the perceptron rule would get to the correct weights is super-polynomial (grows faster than any polynomial). Since PAC-learning requires that we learn the function in time bounded by a polynomial, the perceptron rule cannot PAC-learn Hastad's function.
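As a rough illustration (a sketch only; it does not reproduce the exact analysis from lecture), here is the perceptron rule run on uniformly random examples labeled by `T_2^5`:
 # Sketch: the perceptron rule trained on uniformly random examples labeled by
 # the target T_2^5 (fires iff at least 2 of the 5 inputs are 1). An easy,
 # low-weight target like this is typically learned after a handful of mistakes;
 # Hastad's high-weight-complexity functions would need super-polynomially many.
 import random
 
 n = 5
 def target(x):                           # T_2^5
     return 1 if sum(x) >= 2 else 0
 
 w = [0.0] * n                            # start with all-zero weights
 b = 0.0
 def predict(x):
     return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
 
 for step in range(10000):                # examples drawn uniformly from {0,1}^5
     x = [random.randint(0, 1) for _ in range(n)]
     y, y_hat = target(x), predict(x)
     if y != y_hat:                       # perceptron rule: update only on mistakes
         for i in range(n):
             w[i] += (y - y_hat) * x[i]
         b += (y - y_hat)
 
 all_inputs = [[(j >> i) & 1 for i in range(n)] for j in range(2 ** n)]
 errors = sum(predict(x) != target(x) for x in all_inputs)
 print("disagreements with T_2^5 on all 32 inputs:", errors)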
Hope this is a little clearer.
Best,
Chris
(Edited: 2017-09-24)

-- Perceptron can or cannot PAC learn linear threshold functions?
It is very clear now, Professor. I think my understanding of the electors scenario is almost in line with your explanation. Thanks a lot.