Study this example in depth. It uses the same dataset as the previous example, but the approach to building the data sets differs; it is worth seeing different perspectives on solving the same problem.
Nearest Neighbors is a supervised learning algorithm: it predicts the class of a new point from the classes of the training points closest to it.
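As a minimal illustration of the idea (not part of the original example, using made-up points rather than the iris data), a k-nearest-neighbors prediction can be sketched from scratch with a majority vote over the k closest training points:

```python
import numpy as np
from collections import Counter

# Hypothetical labeled training data: two features per point, three classes.
train_points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                         [5.0, 5.0], [5.2, 4.8], [4.9, 5.1],
                         [9.0, 1.0], [8.8, 1.2], [9.1, 0.9]])
train_classes = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

def predict_knn(point, k=3):
    """Predict the majority class among the k nearest training points."""
    distances = np.linalg.norm(train_points - point, axis=1)
    nearest_classes = train_classes[np.argsort(distances)[:k]]
    return Counter(nearest_classes).most_common(1)[0][0]

print(predict_knn(np.array([5.1, 5.1])))  # -> 1
```

With k = 1 the prediction is simply the class of the single closest training point; the scikit-learn example below uses k = 3, which the vote above mirrors.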
In the diagram below, generated by the Python nearest neighbors example that follows, the light red, yellow, and blue background shapes represent the three trained predictive regions. The darker red, yellow, and blue dots represent the data used to train the model. As can be observed in the diagram, most of the colored dots fall within the boundaries of the similarly colored regions, with a small number of yellow and blue exceptions.
The mathematical representation of the function to determine the class predictive regions looks like this:

C(x) = argmin over c in {1, ..., k} of ( min over i with y_i = c of d(x, x_i) )

where:

C(x) = predicted class for point x, chosen by aggregating minimum point distances
d(x, x_i) = distance between two points in a defined multidimensional space
x = point within the multidimensional space region
n = number of training data points (i in {1, ..., n})
k = number of prediction class regions
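The decision rule above can be sketched numerically: a point is assigned to the class whose training points contain its nearest neighbor. A small example with made-up points (not the iris data):

```python
import numpy as np

# Made-up training points x_i and their classes y_i (n = 4 points, k = 2 classes).
x_train = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 4.0], [5.0, 4.0]])
y_train = np.array([0, 0, 1, 1])

def region_of(x, x_train, y_train):
    """Assign x to the class c that minimizes min over {i : y_i = c} of d(x, x_i)."""
    classes = np.unique(y_train)
    min_dist_per_class = [np.linalg.norm(x_train[y_train == c] - x, axis=1).min()
                          for c in classes]
    return classes[int(np.argmin(min_dist_per_class))]

print(region_of(np.array([0.5, 1.0]), x_train, y_train))  # -> 0
```

Evaluating this rule over a grid of points is exactly what produces the colored background regions in the plot.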
Python Example
""" nearest_neighbors_classifier.py trains and tests a nearest neighbors classifier ... the plot output shows the class predictions in background colors and the input data as dots """
# Import needed libraries.
import numpy as np
import matplotlib.pyplot as plotlib
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets
# Define parameters.
n_neighbors = 3
weights = 'uniform'
step_size = .02
number_of_features = 2
number_of_classes = 3
plot_graph_margin = 1
feature_1_index = 0
feature_2_index = 1
class_1_background_color = 'violet'
class_2_background_color = 'moccasin'
class_3_background_color = 'skyblue'
class_1_data_point_color = 'deeppink'
class_2_data_point_color = 'orange'
class_3_data_point_color = 'dodgerblue'
plot_edge_color = 'k'
data_point_plot_size = 30
plot_title = "K-Nearest Neighbors Plot for 3 Classes"
# Import test data.
iris = datasets.load_iris()
# Create an array of feature data.
feature_data = iris.data[:, :number_of_features]
print("feature data:")
print(feature_data)
# Create an array of target classes.
# There are 3 different classes in the iris data.
target_classes = iris.target
print("target classes:")
print(target_classes)
# Assign minimum and maximum values for the plot.
plot_x_min = feature_data[:, feature_1_index].min() - plot_graph_margin
plot_x_max = feature_data[:, feature_1_index].max() + plot_graph_margin
plot_y_min = feature_data[:, feature_2_index].min() - plot_graph_margin
plot_y_max = feature_data[:, feature_2_index].max() + plot_graph_margin
# Get grid values for plotting classification prediction regions.
grid_x_values, grid_y_values = np.meshgrid(
    np.arange(plot_x_min, plot_x_max, step_size),
    np.arange(plot_y_min, plot_y_max, step_size))
# Flatten the arrays.
grid_x_flattened = grid_x_values.ravel()
grid_y_flattened = grid_y_values.ravel()
# Concatenate values to create classification prediction inputs.
prediction_input_values = np.c_[grid_x_flattened, grid_y_flattened]
# Instantiate a k-nearest neighbors model.
model = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
# Train the model.
model.fit(feature_data, target_classes)
# Predict the classification for the data points.
predicted_values = model.predict(prediction_input_values)
# Create plot color maps.
class_background_colors = ListedColormap([class_1_background_color,
                                          class_2_background_color,
                                          class_3_background_color])
data_point_colors = ListedColormap([class_1_data_point_color,
                                    class_2_data_point_color,
                                    class_3_data_point_color])
# Plot the predicted class background color shapes.
predicted_values = predicted_values.reshape(grid_x_values.shape)
plotlib.figure()
plotlib.pcolormesh(grid_x_values,
                   grid_y_values,
                   predicted_values,
                   cmap=class_background_colors,
                   shading='auto')
# Plot the input data points.
plotlib.scatter(feature_data[:, feature_1_index],
                feature_data[:, feature_2_index],
                c=target_classes,
                cmap=data_point_colors,
                edgecolor=plot_edge_color,
                s=data_point_plot_size)
plotlib.xlim(grid_x_values.min(), grid_x_values.max())
plotlib.ylim(grid_y_values.min(), grid_y_values.max())
plotlib.title(plot_title)
# Display the plot.
plotlib.show()
Output is below:
feature data:
[[5.1 3.5]
[4.9 3. ]
[4.7 3.2]
[4.6 3.1]
[5. 3.6]
[5.4 3.9]
[4.6 3.4]
. . .
[6.8 3.2]
[6.7 3.3]
[6.7 3. ]
[6.3 2.5]
[6.5 3. ]
[6.2 3.4]
[5.9 3. ]]
target classes:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
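Beyond producing the plot, the trained model can classify new measurements. A self-contained sketch that repeats the training step and predicts the class of a made-up flower (the sepal values here are illustrative, not from the dataset):

```python
from sklearn import neighbors, datasets

# Load the iris data and keep the first two features (sepal length and width).
iris = datasets.load_iris()
feature_data = iris.data[:, :2]
target_classes = iris.target

# Train a 3-nearest-neighbors classifier, as in the example above.
model = neighbors.KNeighborsClassifier(n_neighbors=3, weights='uniform')
model.fit(feature_data, target_classes)

# Predict the class of a new, made-up measurement.
new_flower = [[5.0, 3.4]]  # sepal length 5.0 cm, sepal width 3.4 cm
print(model.predict(new_flower))  # -> [0], i.e. iris setosa
```

The prediction is class 0 because the nearest training points to (5.0, 3.4) are setosa samples.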
Source: Don Cowan, https://www.ml-science.com/nearest-neighbors
This work is licensed under a Creative Commons Attribution 4.0 License.