Better Programming

Build Your Own Plagiarism Checker With Python and Machine Learning

Train a model with your dataset to detect plagiarism in texts

Published in Better Programming

3 min read · Aug 12, 2021

Source: Pexels

Introduction

TensorFlow is a powerful library for building neural networks with a wide range of configurable parameters. A neural network is made up of an input layer, one or more hidden layers, and an output layer. Here’s a diagram generated with https://playground.tensorflow.org to help you understand it better.

Neural Network. Source: TensorFlow Playground.
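
To make the layer structure concrete, here is a minimal sketch of a forward pass through one hidden layer and one output layer. It is written in plain Python rather than TensorFlow so it runs without any installation. The layer sizes (3 inputs, 4 hidden neurons, 1 output), the random weights, and the helper names `dense`, `init_layer`, and `forward` are all assumptions chosen for illustration, not the model this article trains.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer (illustrative values only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)  # fixed seed so repeated runs give the same weights
hidden_w, hidden_b = init_layer(3, 4, rng)  # input layer (3) -> hidden layer (4)
out_w, out_b = init_layer(4, 1, rng)        # hidden layer (4) -> output layer (1)


def forward(features):
    """Pass the input features through the hidden layer, then the output layer."""
    hidden = dense(features, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)


prediction = forward([0.5, -1.2, 3.0])
print(prediction)  # a single value between 0 and 1
```

Training means adjusting those weights and biases until the output matches the labels, which is the part a library such as TensorFlow automates.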

Another thing we will need is the Natural Language Toolkit (NLTK), which we will use to prepare a dataset from our own texts for training the machine learning model. The model can’t understand raw words, so we have to tokenize the texts and reduce the words to their root forms before training.
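
As a rough illustration of that preprocessing idea, the sketch below tokenizes texts, reduces each token to a crude root form, and encodes a text as a vector of 0s and 1s over the known roots. It deliberately uses only the standard library: the suffix-stripping `naive_stem` function is a toy stand-in for a real stemmer such as NLTK's `PorterStemmer`, and the sample sentences and helper names are assumptions made up for this example.

```python
import re

# Toy rule set: far cruder than a real stemmer, used here only for illustration.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_vocabulary(texts):
    """Map every distinct root word to a fixed index."""
    stems = {naive_stem(tok) for text in texts for tok in tokenize(text)}
    return {stem: index for index, stem in enumerate(sorted(stems))}


def bag_of_words(text, vocabulary):
    """Encode a text as 0/1 flags: 1 where a known root word occurs in it."""
    vector = [0] * len(vocabulary)
    for tok in tokenize(text):
        index = vocabulary.get(naive_stem(tok))
        if index is not None:
            vector[index] = 1
    return vector


texts = ["The model learns patterns", "Models are learning a pattern"]
vocab = build_vocabulary(texts)
print(vocab)
print(bag_of_words("learning models", vocab))
```

Numeric vectors like these are the kind of input a neural network can work with, which is why the words are converted before training.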

Input:

The input will be a CSV file with a single ‘Text’ column. I named it plagcheckfile.csv. Each row of the ‘Text’ column holds a different text. These texts can be as long as you want, but they should be cleaned up so they contain no commas or special symbols. Longer texts will need more training epochs for the model to reach high accuracy.
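
A minimal sketch of that input format, using Python's built-in `csv` module, might look like the following. The sample rows and the helper names `write_dataset` and `read_texts` are hypothetical, and an in-memory buffer stands in for the real file so the example leaves nothing on disk.

```python
import csv
import io

# Hypothetical sample rows; real rows would hold your own texts.
sample_rows = [
    {"Text": "Neural networks learn patterns from examples"},
    {"Text": "A plagiarism checker compares a text against known sources"},
]


def write_dataset(handle, rows):
    """Write rows to a CSV with a single 'Text' header column."""
    writer = csv.DictWriter(handle, fieldnames=["Text"])
    writer.writeheader()
    writer.writerows(rows)


def read_texts(handle):
    """Read the 'Text' column back, skipping empty rows."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != ["Text"]:
        raise ValueError("expected exactly one column named 'Text'")
    return [row["Text"].strip() for row in reader if (row["Text"] or "").strip()]


# An in-memory buffer stands in for plagcheckfile.csv in this sketch.
buffer = io.StringIO(newline="")
write_dataset(buffer, sample_rows)
buffer.seek(0)
texts = read_texts(buffer)
print(texts)
```

To work with the actual file, the same functions could be given a handle from `open("plagcheckfile.csv", newline="", encoding="utf-8")` instead of the buffer.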

Creating JSON to…

Written by Samrat Dutta

Developer, designer, writer, data analyst, AI enthusiast. https://www.linkedin.com/in/samratduttaofficial [follow/hire me]
