Perform XGBoost, KNN Modeling With Dimension Reduction Technique

Modeling machine learning algorithms with MNIST data set

Amit Chauhan

Published in

Better Programming

5 min readJun 1, 2021

several charts and diagrams illustrating analysis of the MNIST data set — Image by Author

In this article, we will model a classification machine learning algorithm on the MNIST data set, which is handwritten digit images. The data set contains images from zero to nine in grey-scale format. The size of each image is 28 x 28 to a total of 784 pixels. The data set is already separated into train-and-test CSV files.

You can download the data set from Kaggle.

Brief Definitions

XGBoost: It is used to implement ML algorithms under the gradient boosting technique. The gradient boosting decision tree (GBDT) is an accurate and effective parallel tree boosting that can be used in classification and regression problems.
KNN: It is a type of supervised and unsupervised algorithm. In supervised learning, it can be used in both classification and regression for discrete and continuous labels. It is very often used where the decision boundary is irregular in classification problems. Read more about KNN in this article.
SVM: It is a machine learning classification as well as regression algorithm based on support vectors and hyper-line that distinguish the classes. It very useful when we have noise or outliers in the data. It is very effective when the dimension of the data set is quite well defined. You can learn more about SVM in this article.
PCA: It is used to reduce the dimensions or column features in the data set so that the multi-collinearity problem should not occur. LDA and t-SNE are also used for dimension reduction techniques. You can find out more about PCA in this article.

Implement XGBoost With Python

First, we need to import the libraries.

Now we will read the train-and-test CSV file with the help of the pandas read_csv function.

d0 = pd.read_csv('train.csv')

Better Programming

Perform XGBoost, KNN Modeling With Dimension Reduction Technique

Modeling machine learning algorithms with MNIST data set

Brief Definitions

Implement XGBoost With Python

Create an account to read the full story.

Published in Better Programming

Written by Amit Chauhan

No responses yet