-- Aug 25 In-Class Exercise
In-Class Exercise Solution
answer-
Using the distance between two points, I can determine where a point lies on the half-plane, and using k-nearest neighbors I can estimate the probability of finding a point in an area. This approach works well in high-density regions but fails for low-density outliers. Let's look at one example where the result is correct and one where it is incorrect.
Correct result situation -
The training dataset is: ((7,5),1),((7,4),1),((7,3),1),((7,2),1),((7,1),1), ((0,4),0),((1,4),0),((2,4),0),((3,4),0),((4,4),0),((8,4),0),((9,4),0)
Here the (x,y) coordinates represent a position on the plane, and 1 or 0 is the target: whether a point exists at that position (1 is true, 0 is false).
In this training dataset, a point appears on the plane whenever the x coordinate is 7, and in no other case. So it is reasonable to treat x = 7 as a high-density region of nearest neighbors, and the trained model returns 1 for an input whose x coordinate is 7. An example of a correct result is the new input (7,8), for which the model returns 1. Now let's discuss where an incorrect result would be returned.
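The correct case can be checked with a minimal k-nearest-neighbors sketch. Euclidean distance and k = 3 are assumptions made here, since the post does not fix either, and `knn_predict` is a hypothetical helper name.

```python
from collections import Counter
from math import dist

# Training data copied from the post: ((x, y), label)
TRAIN = [((7, 5), 1), ((7, 4), 1), ((7, 3), 1), ((7, 2), 1), ((7, 1), 1),
         ((0, 4), 0), ((1, 4), 0), ((2, 4), 0), ((3, 4), 0), ((4, 4), 0),
         ((8, 4), 0), ((9, 4), 0)]

def knn_predict(query, train=TRAIN, k=3):
    """Return (majority label, fraction of the k neighbors with that label)."""
    # Sort training items by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k

label, share = knn_predict((7, 8))
print(label, share)
```

With these assumptions, the three nearest neighbors of (7,8) are (7,5), (7,4) and (8,4), so the vote is two to one for label 1. The vote fraction is one simple way to read "the probability of finding a point in an area".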
Incorrect result situation -
Because the trained model treats x = 7 as a high-density nearest-neighbor region, it is bound to return 1 for any such input. But the model was never trained on inputs with a negative y coordinate, which lie below the boundary line of the half-plane. So if the input (7,-5) is passed in, the model incorrectly returns 1 for a region that never had any points.
In other words, points exist at x coordinate 7 above the boundary line but not below it, so the output for (7,-5) is wrong.
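The failure can be sketched the same way. The code below assumes Euclidean distance and k = 3 (neither is stated in the post) and uses a hypothetical helper, `nearest_neighbors`, to show which training points decide the vote and how far away they are.

```python
from math import dist

# Training data copied from the post: ((x, y), label)
TRAIN = [((7, 5), 1), ((7, 4), 1), ((7, 3), 1), ((7, 2), 1), ((7, 1), 1),
         ((0, 4), 0), ((1, 4), 0), ((2, 4), 0), ((3, 4), 0), ((4, 4), 0),
         ((8, 4), 0), ((9, 4), 0)]

def nearest_neighbors(query, train=TRAIN, k=3):
    """Return the k closest training items as (distance, point, label)."""
    ranked = sorted((dist(query, point), point, label) for point, label in train)
    return ranked[:k]

for query in [(7, 8), (7, -5)]:
    neighbors = nearest_neighbors(query)
    labels = [label for _, _, label in neighbors]
    # Majority vote over the k labels (k is odd, so there is no tie).
    prediction = int(sum(labels) * 2 > len(labels))
    print(query, "->", prediction, "nearest training point at distance", neighbors[0][0])
```

For (7,-5) the three closest training points are (7,1), (7,2) and (7,3), all labeled 1, so the vote is unanimous even though the closest of them is 6 units away. Nothing in the vote itself signals that the query lies far outside the region the training data covers.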
If we learn this dataset parametrically, the smallest model would be affected by how often outliers occur in the dataset. For example, ((7,1000),0) is an outlier, and the frequency of such points changes how the model treats the high-density area, weakening its predictions even on easy inputs.
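A rough illustration of the outlier effect, under a strong assumption: the "smallest model" is taken here to be a one-parameter rule "predict 1 iff x == c", fitted by maximizing training accuracy. This rule and the name `fit_column_rule` are hypothetical stand-ins, not a definition from the exercise.

```python
# Training data copied from the post: ((x, y), label)
TRAIN = [((7, 5), 1), ((7, 4), 1), ((7, 3), 1), ((7, 2), 1), ((7, 1), 1),
         ((0, 4), 0), ((1, 4), 0), ((2, 4), 0), ((3, 4), 0), ((4, 4), 0),
         ((8, 4), 0), ((9, 4), 0)]

def fit_column_rule(data):
    """Pick c maximizing training accuracy of the rule 'predict 1 iff x == c'.

    c = None stands for 'always predict 0'. Hypothetical one-parameter model.
    """
    candidates = sorted({x for (x, _), _ in data}) + [None]

    def accuracy(c):
        hits = sum(int(x == c) == label for (x, _), label in data)
        return hits / len(data)

    best = max(candidates, key=accuracy)
    return best, accuracy(best)

# Add more and more copies of the outlier ((7, 1000), 0) and refit.
for n_outliers in (0, 2, 10):
    data = TRAIN + [((7, 1000), 0)] * n_outliers
    print(n_outliers, fit_column_rule(data))
```

With no outliers, or only a few, the fitted rule keeps c = 7. Once the outliers at x = 7 outnumber the five positive points, "always predict 0" scores higher on the training set, so the model stops predicting 1 even for an easy input such as (7,3).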
(Edited: 2021-08-27)