k Nearest Houses

1. Load data

1.1 Libraries

1.2 Data

Filter required variables.

Use loc

Or specify the columns

Or use the index
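
A minimal sketch of the three selection styles named above. The data frame and its column names (name, bravery, cunning, house) are made up for illustration; the real file's columns are not shown in this section.

```python
import pandas as pd

# Hypothetical stand-in for the real data set
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "bravery": [7, 3, 5],
    "cunning": [2, 9, 6],
    "house": ["Gryffindor", "Slytherin", "Ravenclaw"],
})

# 1. Label-based selection with loc
by_loc = df.loc[:, ["bravery", "cunning", "house"]]

# 2. Specify the columns directly
by_cols = df[["bravery", "cunning", "house"]]

# 3. Position-based selection with iloc (columns 1 to 3)
by_index = df.iloc[:, 1:4]
```

All three return the same data frame here; loc and a column list survive column reordering, while iloc does not.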

2. Data partitioning

2.1 Training-Validation split
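
One possible way to do the split, assuming scikit-learn's train_test_split. The 60/40 ratio, the random_state and the toy columns are assumptions, not values taken from the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the filtered data set
df = pd.DataFrame({
    "bravery": [7, 3, 5, 8, 2, 6, 9, 1, 4, 5],
    "cunning": [2, 9, 6, 1, 8, 5, 3, 9, 7, 4],
    "house": ["Gryffindor", "Slytherin"] * 5,
})

# 60% training, 40% validation; random_state makes the split repeatable
train, valid = train_test_split(df, test_size=0.4, random_state=1)
```

Both parts keep the original row index, which is what lets the normalised data be split the same way later.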

2.2 Normalisation

Fit the normalisation model (the scaler) on the training set only.

Transform the whole data set.
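
A sketch of these two steps, assuming z-score normalisation with scikit-learn's StandardScaler (the section only says "normalisation", so the choice of scaler is an assumption). Column names and values are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

predictors = ["bravery", "cunning"]  # hypothetical predictor columns
df = pd.DataFrame({
    "bravery": [7, 3, 5, 8, 2, 6, 9, 1, 4, 5],
    "cunning": [2, 9, 6, 1, 8, 5, 3, 9, 7, 4],
    "house": ["Gryffindor", "Slytherin"] * 5,
})
train, valid = train_test_split(df, test_size=0.4, random_state=1)

# Learn the means and standard deviations from the training rows only
scaler = StandardScaler()
scaler.fit(train[predictors])

# Apply the same transformation to every row, keeping the original index
df_norm = pd.DataFrame(
    scaler.transform(df[predictors]),
    columns=predictors,
    index=df.index,
)
```

Fitting on the training set alone keeps information about the validation rows out of the scaling parameters.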

2.3 Generate normalised training and validation sets

Merge with original set to get target variable

Split the normalised set using the indices of the original split
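
A sketch of the merge and the index-based split. It assumes the normalised data frame kept the original row index, and uses a join on that index as one way to merge; the toy data and names are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

predictors = ["bravery", "cunning"]  # hypothetical predictor columns
df = pd.DataFrame({
    "bravery": [7, 3, 5, 8, 2, 6, 9, 1, 4, 5],
    "cunning": [2, 9, 6, 1, 8, 5, 3, 9, 7, 4],
    "house": ["Gryffindor", "Slytherin"] * 5,
})
train, valid = train_test_split(df, test_size=0.4, random_state=1)
scaler = StandardScaler().fit(train[predictors])
df_norm = pd.DataFrame(
    scaler.transform(df[predictors]), columns=predictors, index=df.index
)

# Merge on the row index to bring the target variable back
df_norm = df_norm.join(df["house"])

# Reuse the indices of the original split
train_norm = df_norm.loc[train.index]
valid_norm = df_norm.loc[valid.index]
```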

3. kNN

3.1 k = 3
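
A minimal sketch of fitting a 3-nearest-neighbour classifier, assuming scikit-learn's KNeighborsClassifier. The small training set below stands in for the normalised training set; its columns and values are invented.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

predictors = ["bravery", "cunning"]  # hypothetical predictor columns

# Stand-in for the normalised training set
train_norm = pd.DataFrame({
    "bravery": [0.0, 0.0, 1.0, 1.0, 5.0, 5.0, 6.0, 6.0],
    "cunning": [0.0, 1.0, 0.0, 1.0, 5.0, 6.0, 5.0, 6.0],
    "house": ["Gryffindor"] * 4 + ["Slytherin"] * 4,
})

# Each prediction is a majority vote among the 3 closest training rows
knn3 = KNeighborsClassifier(n_neighbors=3)
knn3.fit(train_norm[predictors], train_norm["house"])
```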

Alternatively, use a loop to find the best k.

If it's easier to read...
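
One way such a loop could look, assuming validation accuracy is the criterion for the "best" k (the section does not say which metric was used). The data, the range of k and the variable names are hypothetical.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

predictors = ["bravery", "cunning"]  # hypothetical predictor columns
train_norm = pd.DataFrame({
    "bravery": [0.0, 0.0, 1.0, 1.0, 5.0, 5.0, 6.0, 6.0],
    "cunning": [0.0, 1.0, 0.0, 1.0, 5.0, 6.0, 5.0, 6.0],
    "house": ["Gryffindor"] * 4 + ["Slytherin"] * 4,
})
valid_norm = pd.DataFrame({
    "bravery": [0.5, 1.0, 5.5, 6.0],
    "cunning": [0.5, 0.5, 5.5, 5.5],
    "house": ["Gryffindor", "Gryffindor", "Slytherin", "Slytherin"],
})

rows = []
for k in range(1, 8, 2):  # odd k only, to reduce tied votes
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(train_norm[predictors], train_norm["house"])
    acc = accuracy_score(valid_norm["house"], knn.predict(valid_norm[predictors]))
    rows.append({"k": k, "accuracy": acc})

# Collect the results in a table, then pick the k with the highest accuracy
results = pd.DataFrame(rows)
best_k = int(results.loc[results["accuracy"].idxmax(), "k"])
```

When several values of k tie, idxmax returns the first one, so the smallest tied k wins in this sketch.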

Get training set predictions.

Get validation set predictions.

3.2 Training set prediction

Training set. Confusion matrix.

A confusion matrix that's easier to read
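
A sketch of both versions of the matrix, assuming scikit-learn's confusion_matrix and a pandas wrapper for the labelled version. The training data are invented; the same pattern applies to the validation set.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

predictors = ["bravery", "cunning"]  # hypothetical predictor columns
train_norm = pd.DataFrame({
    "bravery": [0.0, 0.0, 1.0, 1.0, 5.0, 5.0, 6.0, 6.0],
    "cunning": [0.0, 1.0, 0.0, 1.0, 5.0, 6.0, 5.0, 6.0],
    "house": ["Gryffindor"] * 4 + ["Slytherin"] * 4,
})
knn3 = KNeighborsClassifier(n_neighbors=3)
knn3.fit(train_norm[predictors], train_norm["house"])

# Predictions for the training set
train_pred = knn3.predict(train_norm[predictors])

# Raw matrix: rows are actual classes, columns are predicted classes
labels = list(knn3.classes_)
cm = confusion_matrix(train_norm["house"], train_pred, labels=labels)

# Same numbers with row and column labels attached
cm_df = pd.DataFrame(
    cm,
    index=[f"actual {label}" for label in labels],
    columns=[f"predicted {label}" for label in labels],
)
```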

3.3 Validation set prediction

Validation set. Confusion matrix.

A confusion matrix that's easier to read

4. New padawan

Create a data frame from lists.

Normalise.
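
A sketch of these two steps, assuming the new record must pass through the scaler that was fitted on the training set. The predictor names and all values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

predictors = ["bravery", "cunning"]  # hypothetical predictor columns

# Stand-in training set and the scaler fitted on it
train = pd.DataFrame({
    "bravery": [7, 3, 5, 8, 2, 6],
    "cunning": [2, 9, 6, 1, 8, 5],
})
scaler = StandardScaler().fit(train[predictors])

# One new record built from a list of lists, with matching column names
new_padawan = pd.DataFrame([[4, 9]], columns=predictors)

# Normalise with the training-set parameters; do not refit the scaler
new_norm = pd.DataFrame(scaler.transform(new_padawan), columns=predictors)
```

The normalised record can then be passed to the fitted classifier's predict method.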

5. Easier set up

5.1 Recode into 2 classes
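
A minimal sketch of the recode, assuming the two classes are "Slytherin" versus everything else. The section does not state which classes are kept, so the class names and the new column name house_2 are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical target column with the four original houses
df = pd.DataFrame({
    "house": ["Gryffindor", "Slytherin", "Ravenclaw", "Hufflepuff", "Slytherin"],
})

# Keep Slytherin as is and collapse the other houses into "Other"
df["house_2"] = np.where(df["house"] == "Slytherin", "Slytherin", "Other")
```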

5.2 kNN

5.2.1 kNN k = 3

Get training set prediction.

Get validation set prediction.

5.2.2 kNN k = 5

Get training set prediction.

Get validation set prediction.

5.3 Confusion matrix

5.3.1 Confusion matrix k = 3

Training set.

Validation set.

5.3.2 Confusion matrix k = 5

Validation set.

Change cutoff to 0.7
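
A sketch of how a 0.7 cutoff could be applied, assuming the classifier's predict_proba output is thresholded by hand. The single predictor, the class names and the data are invented so that one validation row changes class under the stricter cutoff.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical two-class data with one predictor
train = pd.DataFrame({
    "cunning": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "house_2": ["Other"] * 4 + ["Slytherin"] * 4,
})
valid = pd.DataFrame({
    "cunning": [0.5, 4.4, 6.5],
    "house_2": ["Other", "Slytherin", "Slytherin"],
})

knn5 = KNeighborsClassifier(n_neighbors=5)
knn5.fit(train[["cunning"]], train["house_2"])

# Share of the 5 neighbours that are Slytherin, per validation row
pos = list(knn5.classes_).index("Slytherin")
proba = knn5.predict_proba(valid[["cunning"]])[:, pos]

# Default prediction (majority vote) versus the stricter 0.7 cutoff
pred_default = knn5.predict(valid[["cunning"]])
pred_07 = np.where(proba >= 0.7, "Slytherin", "Other")

cm_07 = confusion_matrix(valid["house_2"], pred_07, labels=["Other", "Slytherin"])
```

With k = 5 the probabilities move in steps of 0.2, so a 0.7 cutoff means at least 4 of the 5 neighbours must be Slytherin.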

5.4 ROC

5.4.1 k = 3

Set positive class
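
A sketch of computing the ROC curve with an explicit positive class, assuming scikit-learn's roc_curve and auc. "Slytherin" as the positive class, the single predictor and the data are assumptions.

```python
import pandas as pd
from sklearn.metrics import auc, roc_curve
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical two-class data with one predictor
train = pd.DataFrame({
    "cunning": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "house_2": ["Other"] * 4 + ["Slytherin"] * 4,
})
valid = pd.DataFrame({
    "cunning": [0.5, 2.6, 4.4, 6.5],
    "house_2": ["Other", "Other", "Slytherin", "Slytherin"],
})

knn3 = KNeighborsClassifier(n_neighbors=3)
knn3.fit(train[["cunning"]], train["house_2"])

# Score each row by its predicted probability of the positive class
pos = list(knn3.classes_).index("Slytherin")
scores = knn3.predict_proba(valid[["cunning"]])[:, pos]

# pos_label tells roc_curve which string label counts as positive
fpr, tpr, thresholds = roc_curve(valid["house_2"], scores, pos_label="Slytherin")
roc_auc = auc(fpr, tpr)
```

With string labels, roc_curve needs pos_label set explicitly; the same code with n_neighbors=5 gives the second curve.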

5.4.2 k = 5

Set positive class

5.4.3 Combined plot

5.5 New padawan

5.5.1 Predict using k = 3

5.5.2 Predict using k = 5

Change cutoff to 0.7
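
A sketch of scoring one new record with both models and applying the 0.7 cutoff to the predicted probability. The training data, the new record and the class names are hypothetical and only illustrate the mechanics.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical two-class training data with one predictor
train = pd.DataFrame({
    "cunning": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "house_2": ["Other"] * 4 + ["Slytherin"] * 4,
})
new_padawan = pd.DataFrame({"cunning": [6.5]})  # already normalised stand-in

results = {}
for k in (3, 5):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(train[["cunning"]], train["house_2"])
    pos = list(knn.classes_).index("Slytherin")
    p = knn.predict_proba(new_padawan[["cunning"]])[0, pos]
    results[k] = {
        "proba": p,                                         # share of Slytherin neighbours
        "default": knn.predict(new_padawan[["cunning"]])[0],  # majority vote
        "cutoff_0.7": "Slytherin" if p >= 0.7 else "Other",  # stricter rule
    }
```

In this toy case the record stays in the same class under the stricter cutoff for both values of k.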

Still a Slytherin, you are :-)
