Perform XGBoost, KNN Modeling With Dimension Reduction Technique
Modeling machine learning algorithms with MNIST data set

In this article, we will model classification machine learning algorithms on the MNIST data set, a collection of handwritten digit images. The data set contains grey-scale images of the digits zero through nine. Each image is 28 x 28 pixels, for a total of 784 pixels. The data set is already split into train and test CSV files.
You can download the data set from Kaggle.
Brief Definitions
- XGBoost: A library that implements machine learning algorithms under the gradient boosting framework. Its gradient-boosted decision trees (GBDT) provide an accurate and efficient parallel tree-boosting method that can be used for both classification and regression problems.
- KNN: A supervised algorithm that can also be adapted to unsupervised settings such as nearest-neighbour search. In supervised learning, it can be used for both classification (discrete labels) and regression (continuous labels). It is often chosen for classification problems where the decision boundary is irregular. Read more about KNN in this article.
- SVM: A machine learning algorithm for both classification and regression, based on support vectors and a separating hyperplane that distinguishes the classes. It is useful when the data contain noise or outliers, and it remains effective on high-dimensional data sets. You can learn more about SVM in this article.
- PCA: It is used to reduce the dimensions, or column features, of a data set so that multicollinearity problems do not occur. LDA and t-SNE are other dimension reduction techniques. You can find out more about PCA in this article, and a minimal sketch follows this list.
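To make the PCA idea concrete before we touch MNIST, here is a minimal sketch using scikit-learn. The random array stands in for 784-pixel image rows, and the choice of 50 components is an illustrative assumption, not a tuned value.

import numpy as np
from sklearn.decomposition import PCA

# 1,000 fake "images" with 784 pixel columns, standing in for MNIST rows
X = np.random.rand(1000, 784)

# Project the 784 pixel features down to 50 principal components
pca = PCA(n_components=50)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (1000, 50)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained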
Implement XGBoost With Python
First, we need to import the libraries.
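The original import cell is not shown here, so the block below is a minimal sketch of the imports the rest of the walkthrough assumes: pandas and NumPy for the data, Matplotlib for plots, XGBoost and scikit-learn for the models and PCA.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from xgboost import XGBClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score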
Now we will read the train and test CSV files with the help of the pandas read_csv function.
d0 = pd.read_csv('train.csv')
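Since the data comes as separate train and test files, a natural continuation is to load the test CSV as well and separate the label column from the 784 pixel columns. The variable names below are assumptions; the file names and shapes follow the Kaggle layout mentioned above.

test = pd.read_csv('test.csv')  # test images; the Kaggle test file has no label column

# Separate the 'label' column from the 784 pixel features
y = d0['label']
X = d0.drop(columns=['label'])

print(X.shape, y.shape)  # for the Kaggle train split: (42000, 784) (42000,)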