Better Programming

Advice for programmers.

Perform XGBoost, KNN Modeling With Dimension Reduction Technique

Amit Chauhan
Published in Better Programming
5 min read · Jun 1, 2021
[Image: several charts and diagrams illustrating analysis of the MNIST data set. Image by Author.]

In this article, we will build classification models on the MNIST data set of handwritten digit images. The data set contains grayscale images of the digits zero to nine. Each image is 28 x 28 pixels, for a total of 784 pixels. The data set is already split into train and test CSV files.
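
To make the 28 x 28 = 784 relationship concrete, here is a minimal sketch with NumPy. The array `flat_row` is a made-up stand-in for one row of pixel values, not data taken from the actual CSV files.

```python
import numpy as np

# Hypothetical stand-in for one image stored as a flat row of 784 pixel values
# (grayscale intensities in the range 0-255).
flat_row = np.arange(784, dtype=np.uint16) % 256

# Reshape the flat row back into the 28 x 28 grid it was flattened from.
image = flat_row.reshape(28, 28)

print(image.shape)  # (28, 28)
print(image.size)   # 784
```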

You can download the data set from Kaggle.

Brief Definitions

  • XGBoost: A library that implements machine learning algorithms under the gradient boosting framework. Its gradient boosted decision trees (GBDT) use parallel tree boosting and are accurate and efficient on both classification and regression problems.
  • KNN: A supervised algorithm (the underlying nearest-neighbor search is also used in unsupervised settings). It can be used for both classification, with discrete labels, and regression, with continuous labels. It is often used in classification problems where the decision boundary is irregular. Read more about KNN in this article.
  • SVM: A machine learning algorithm for classification as well as regression, based on support vectors and a hyperplane that separates the classes. With a soft margin, it can tolerate some noise or outliers in the data. It is also effective when the data set has many dimensions. You can learn more about SVM in this article.
  • PCA: It reduces the number of dimensions (feature columns) in the data set. Because the resulting components are uncorrelated, it also avoids the multicollinearity problem. LDA and t-SNE are other dimension reduction techniques. You can find out more about PCA in this article.
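
As a rough illustration of how PCA and KNN fit together, here is a minimal sketch using scikit-learn. It is not the article's own code: it uses scikit-learn's small bundled digits data set (8 x 8 images, 64 features) as a stand-in for MNIST, and the values `n_components=20` and `n_neighbors=5` are illustrative assumptions rather than tuned settings.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: scikit-learn's bundled 8 x 8 digits (64 features), not MNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the pixels, project onto 20 principal components, then classify with KNN.
# n_components=20 and n_neighbors=5 are illustrative, not tuned values.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=20, random_state=0),
    KNeighborsClassifier(n_neighbors=5),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
n_kept = model.named_steps["pca"].n_components_
print(f"features after PCA: {n_kept}, test accuracy: {accuracy:.3f}")
```

Wrapping the steps in a pipeline means the scaler and PCA are fitted on the training split only, so no information from the test split leaks into the projection.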

Implement XGBoost With Python

First, we need to import the libraries.

Now we will read the train and test CSV files with the pandas read_csv function.

d0 = pd.read_csv('train.csv')
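
A natural next step after loading is to separate the target column from the pixel features. The sketch below assumes the Kaggle file layout of a `label` column followed by `pixel0` through `pixel783`. That layout is an assumption, since the column names are not shown in this text. So that the example runs without the download, it builds a tiny random stand-in for `train.csv` in memory.

```python
import io

import numpy as np
import pandas as pd

# Build a tiny stand-in for train.csv so the sketch runs without the download.
# Assumed layout: a 'label' column followed by pixel0 ... pixel783.
rng = np.random.default_rng(0)
n_rows = 5
pixel_cols = [f"pixel{i}" for i in range(784)]
fake = pd.DataFrame(rng.integers(0, 256, size=(n_rows, 784)), columns=pixel_cols)
fake.insert(0, "label", rng.integers(0, 10, size=n_rows))
csv_text = fake.to_csv(index=False)

# With the real file this would be: d0 = pd.read_csv('train.csv')
d0 = pd.read_csv(io.StringIO(csv_text))

# Separate the target column from the 784 pixel features.
labels = d0["label"]
pixels = d0.drop(columns="label")

print(d0.shape)      # (5, 785)
print(pixels.shape)  # (5, 784)
```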

Written by Amit Chauhan

Data Scientist, AI/ML/DL, Azure Cloud
