An Overview of Large Language Models
An introduction to large language models, including vectorization, transformers, common NLP models, and considerations for designing apps with OpenAI tokens and PEFT.
Introduction
What is a Language Model?
A language model is a probability distribution over sequences of words. [1]
For example, knowing that the phrase "apple juice" is more likely than "apple juice concentrate" is one of the simplest language models. Nowadays, artificial neural networks have largely replaced these purely probabilistic methods. To compare two words, we can take the dot product of their representations; if words are represented only by their surface form (syntax), for instance as one-hot vectors, this measures character overlap at best. But syntax cannot reveal the semantic (meaningful) relationship between two words: when the character combinations are unrelated, the dot product is simply zero. To solve this problem, we use dense vector representations (embeddings) of words. We can then train a model on these vectors and learn the contextual relations between words. [1]
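A minimal sketch of the problem, assuming a toy three-word vocabulary (the words and all vector values below are invented for illustration):

```python
import numpy as np

# One-hot vectors: each word gets its own axis, so any two
# distinct words have a dot product of exactly zero.
apple = np.array([1, 0, 0])
pear = np.array([0, 1, 0])
car = np.array([0, 0, 1])

print(apple @ pear)  # 0 -- says nothing about meaning
print(apple @ car)   # 0 -- same result even for unrelated words

# Dense embeddings (values invented for illustration): related
# words point in similar directions, so the dot product is large
# for related words and small for unrelated ones.
apple_emb = np.array([0.9, 0.8, 0.1])
pear_emb = np.array([0.8, 0.9, 0.2])
car_emb = np.array([0.1, 0.0, 0.9])

print(apple_emb @ pear_emb)  # ~1.46 -- related fruits
print(apple_emb @ car_emb)   # ~0.18 -- unrelated words
```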
Word2Vec

The Word2Vec ("word to vector") method learns a vector representation for a word from its relationships to the words that appear on its right and left. The vector values are adjusted so that words occurring in similar contexts across different texts converge toward each other, which lets us estimate a word's meaning from its neighbors. [1]
Thanks to the Word2Vec method, we can perform linear-algebra operations on words [1]:
king - man + woman ≈ queen
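As a sketch, the analogy can be reproduced with gensim's pretrained vectors (the model name below is assumed to be available through the gensim-data downloader; the first run downloads the vectors):

```python
import gensim.downloader as api

# Load 50-dimensional GloVe vectors trained on Wikipedia + Gigaword.
vectors = api.load("glove-wiki-gigaword-50")

# king - man + woman ≈ queen, expressed as vector arithmetic:
result = vectors.most_similar(positive=["king", "woman"],
                              negative=["man"], topn=1)
print(result)  # expected: [('queen', ...)] or a close neighbor
```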

Artificial neural networks are the best-performing methods today. A neural network is a set of equations that map an input to an output, and these equations have parameters. During training, we first measure the model's error, and then adjust the parameters to reduce it…
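To make the idea of parameter adjustment concrete, here is a minimal sketch (not from the article) that fits the single weight of a linear model by repeatedly measuring the error and nudging the parameter against the gradient; the data and learning rate are invented for illustration:

```python
import numpy as np

# Toy data: the true mapping is y = 3x (the constant 3 is
# invented for this example).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0    # the model's single parameter
lr = 0.01  # learning rate

for step in range(200):
    y_pred = w * x                 # the equation mapping input to output
    error = y_pred - y             # how wrong the model currently is
    grad = 2 * np.mean(error * x)  # gradient of mean squared error w.r.t. w
    w -= lr * grad                 # adjust the parameter to reduce the error

print(w)  # converges toward 3.0
```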