Better Programming

Advice for programmers.



Build Your Own Plagiarism Checker With Python and Machine Learning

Samrat Dutta
Published in Better Programming
3 min read · Aug 12, 2021

Source: Pexels

Introduction

TensorFlow is a powerful library for building neural networks with a wide range of configurable parameters. A neural network consists of an input layer, one or more hidden layers, and an output layer. Here is a diagram generated with TensorFlow Playground (https://playground.tensorflow.org) to illustrate the structure.

Neural Network. Source: TensorFlow Playground.
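To make the input → hidden → output structure concrete, here is a minimal, dependency-free sketch of a forward pass through such a network. It is not the author's model: the layer sizes (2 inputs, 3 hidden neurons, 1 output), the weights, and the choice of ReLU and sigmoid activations are all hypothetical, picked only to show how each layer transforms the previous one.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def relu(z: float) -> float:
    """Rectified linear unit: negative values become 0."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs: Vector, weights: Matrix, biases: Vector,
          activation: Callable[[float], float]) -> Vector:
    """One fully connected layer: activation(W . x + b).

    Each row of `weights` holds the incoming weights of one neuron.
    """
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x: Vector) -> Vector:
    # Hypothetical hidden layer: 2 inputs -> 3 neurons with ReLU.
    hidden = dense(x, [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, 0.0], relu)
    # Hypothetical output layer: 3 hidden values -> 1 neuron with sigmoid.
    return dense(hidden, [[1.0, -1.0, 0.5]], [0.0], sigmoid)


if __name__ == "__main__":
    print(forward([1.0, 2.0]))  # a single probability-like value between 0 and 1
```

In TensorFlow the same shape of network would be declared rather than hand-coded, for example with `tf.keras.Sequential([tf.keras.layers.Dense(3, activation="relu"), tf.keras.layers.Dense(1, activation="sigmoid")])`, and the weights would be learned during training instead of fixed by hand.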

We will also need the Natural Language Toolkit (NLTK) to turn our own texts into a dataset for training the machine learning model. A model cannot work with raw words directly, so we tokenize each text and reduce the tokens to their root words before training.
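The preparation step described above can be sketched as: tokenize each text, reduce tokens to root forms, build a vocabulary, and encode each text as a numeric bag-of-words vector the model can consume. In a real pipeline NLTK would handle tokenizing and stemming (for example `nltk.word_tokenize` and `nltk.stem.PorterStemmer`); to keep this sketch dependency-free, it uses a regular-expression tokenizer and a deliberately crude, hypothetical `simple_stem` in their place. The bag-of-words encoding is an assumption about how the texts are fed to the model, not something the article confirms.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def simple_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strip one common English suffix,
    keeping at least three characters of the word."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_vocabulary(texts: List[str]) -> List[str]:
    """Sorted list of the unique stems found across all texts."""
    return sorted({simple_stem(tok) for text in texts for tok in tokenize(text)})


def bag_of_words(text: str, vocabulary: List[str]) -> List[int]:
    """1 if a vocabulary stem occurs in the text, else 0, in vocabulary order."""
    stems = {simple_stem(tok) for tok in tokenize(text)}
    return [1 if stem in stems else 0 for stem in vocabulary]


if __name__ == "__main__":
    vocab = build_vocabulary(["The cat jumped", "Cats are jumping"])
    print(vocab)                               # ['are', 'cat', 'jump', 'the']
    print(bag_of_words("A cat jumps", vocab))  # [0, 1, 1, 0]
```

Stemming is what lets "jumped", "jumping", and "jumps" map to the same vocabulary entry, so the model sees them as one feature rather than three unrelated words.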

Input:

The input will be a CSV file with a single column named ‘Text’. I named the file plagcheckfile.csv. Each row of the ‘Text’ column holds a different text. A text can be as long as you like, but it should be cleaned so that it contains no commas or special symbols. Longer texts need more training epochs before the model reaches high accuracy.
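Loading and cleaning that file could look roughly like the sketch below, which uses only Python's standard `csv` and `re` modules. The function names and the exact cleaning rule (replace anything that is not a letter, digit, or whitespace with a space) are assumptions for illustration; the article only states that texts should contain no commas or special symbols.

```python
import csv
import re
from typing import Iterable, List


def clean_text(text: str) -> str:
    """Replace every character that is not a letter, digit, or whitespace with a
    space, then collapse runs of whitespace into single spaces."""
    no_symbols = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return " ".join(no_symbols.split())


def read_texts(lines: Iterable[str]) -> List[str]:
    """Return the cleaned, non-empty values of the 'Text' column.

    `lines` can be an open file object or any iterable of CSV lines.
    """
    texts = []
    for row in csv.DictReader(lines):
        cleaned = clean_text(row["Text"])
        if cleaned:
            texts.append(cleaned)
    return texts


if __name__ == "__main__":
    with open("plagcheckfile.csv", newline="", encoding="utf-8") as f:
        for text in read_texts(f):
            print(text)
```

Cleaning in code, rather than by hand, guarantees that a stray comma or symbol in the source texts cannot leak into the training data.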

Creating JSON to…



Written by Samrat Dutta

Developer, designer, writer, data analyst, AI enthusiast. https://www.linkedin.com/in/samratduttaofficial [follow/hire me]
