-- Can the perceptron rule PAC learn linear threshold functions?
Hi Vasudha,
Let's fix a particular instance of `T_k^n(x_1,...,x_n) = [sum_i x_i >= k]` (outputting 1 exactly when at least `k` of the inputs are 1), say
`T_2^5(x_1,x_2,x_3,x_4,x_5)`. This is a well-defined function that is computed by a perceptron. To say that the perceptron rule could PAC learn this function would mean that I could start with the perceptron that has all-zero weights, present it with examples chosen uniformly at random from `{0,1}^5`, for example,
(0,1,1,1,0)
(1,1,0,1,1)
(0,0,0,1,0)
....
and, using the perceptron rule only, after a number of steps bounded by a fixed polynomial in `n`, `1/epsilon`, and `1/delta` (for this instance, `n=5`), it would halt with weights that `epsilon`-approximate `T_2^5` with probability at least `1-delta`. Now this might actually work for some threshold functions such as `T_2^5`. However, Hastad showed that there exist linear threshold functions of high weight complexity: their smallest integer-weight representation has weights of size `2^{Omega(n log n)}`. Given the convergence rate of the perceptron rule, for such a function the number of steps before the rule reaches correct weights is super-polynomial (it grows faster than any polynomial). Since PAC learning requires that we learn the function in time bounded by a polynomial, Hastad's function is not PAC-learnable by the perceptron rule.
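To make the setup concrete, here is a minimal Python sketch (not from the original post; the names `T` and `perceptron_learn` are my own) of the perceptron rule trained on uniformly random examples from `{0,1}^5` labeled by `T_2^5`:

```python
import random
from itertools import product

def T(x, k=2):
    """Target threshold function T_k^n: 1 iff at least k inputs are 1."""
    return 1 if sum(x) >= k else 0

def perceptron_learn(n=5, steps=10000, seed=0):
    """Run the perceptron rule on uniformly random examples from {0,1}^n."""
    rng = random.Random(seed)
    w = [0.0] * n  # start with all-zero weights, as in the post
    b = 0.0        # bias term (learned threshold)
    for _ in range(steps):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = T(x)  # label supplied by the target function
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        if pred != y:
            # Perceptron update: add the example to the weights if we
            # under-shot (delta = +1), subtract it if we over-shot (-1).
            delta = y - pred
            w = [wi + delta * xi for wi, xi in zip(w, x)]
            b += delta
    return w, b

# Fraction of the 2^5 = 32 inputs the learned weights misclassify.
w, b = perceptron_learn()
err = sum(
    (1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0) != T(x)
    for x in product([0, 1], repeat=5)
) / 32
```

Because `T_2^5` is linearly separable with a comfortable margin, the perceptron convergence theorem bounds the number of mistakes, so `err` comes out small here; for Hastad's high-weight functions the analogous mistake bound is super-polynomial in `n`.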
Hope this is a little clearer.
Best,
Chris
(Edited: 2017-09-24)