Large Language Models
- Large Language Models (LLMs), such as GPT-3 or BERT, are deep neural networks.
- They contain many neurons connected by billions of weighted links.
- “Given an input text ‘prompt’, at essence what these systems do is compute a probability distribution over a ‘vocabulary’—the list of all words (or actually parts of words, or tokens) that the system knows about. The vocabulary is given to the system by the human designers. GPT-3, for example, has a vocabulary of about 50,000 tokens.” (source)
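- To make the notion of tokens and a fixed vocabulary concrete, here is a minimal sketch using OpenAI's tiktoken library (an assumption on tooling; any BPE tokenizer would illustrate the same point). The "r50k_base" encoding is the one used by GPT-3:

```python
# A minimal tokenization sketch, assuming the `tiktoken` package is installed.
# "r50k_base" is the byte-pair encoding used by GPT-3.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")
print(enc.n_vocab)                      # 50257 -- the "about 50,000 tokens" quoted above

token_ids = enc.encode("unbelievable")  # a single word may be split into sub-word tokens
print([enc.decode([t]) for t in token_ids])
```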
How do LLMs work?
- LLMs are typically given a task, such as completing a sentence or predicting the next token in a sequence.
- They start by taking the prompt they receive, splitting it into tokens, and converting each token into a vector (an embedding).
- They then perform layer-by-layer computations, which result in assigning a number, or logit, to each token in the vocabulary.
- Finally, the logits are converted, typically via a softmax function, into a probability distribution over the vocabulary that determines which token should come next in the text, as shown in the sketch below.
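- As a concrete illustration of the pipeline above, here is a minimal sketch using the Hugging Face transformers library with the small, publicly available "gpt2" checkpoint (an assumption; GPT-3's weights are not public, but the mechanics are the same):

```python
# A minimal sketch of the prompt -> tokens -> logits -> probabilities pipeline,
# assuming the `transformers` and `torch` packages and the public "gpt2" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")     # step 1: prompt -> token IDs

with torch.no_grad():
    logits = model(**inputs).logits                 # step 2: layer-by-layer computation

next_token_logits = logits[0, -1]                   # step 3: one logit per vocabulary token
probs = torch.softmax(next_token_logits, dim=-1)    # step 4: logits -> probability distribution

top = torch.topk(probs, k=5)                        # the five most likely next tokens
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
```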
Similarity Computation
- A natural next step is to determine whether two sentences are similar to or different from each other.
- Sentence similarity measures the degree to which two sentences are semantically equivalent.
- Below are the two most common measures of sentence similarity; a worked example follows the definitions:
Dot Product
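- The dot product scores similarity by summing the element-wise products of the two vectors; unlike cosine similarity below, it is not normalized by the vectors' magnitudes:

\[\text{dot\_product}(\mathbf{u},\mathbf{v}) = \mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i\]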
Cosine Similarity
\[\text{cosine\_similarity}(\mathbf{u},\mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2}\,\sqrt{\sum_{i=1}^{n} v_i^2}}\]
- where,
- \(\mathbf{u}\) and \(\mathbf{v}\) are the two vectors being compared,
- \(\cdot\) represents the dot product,
- \(\|\mathbf{u}\|\) and \(\|\mathbf{v}\|\) represent the magnitudes (or norms) of the vectors, and
- \(n\) is the number of dimensions in the vectors.
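- As a worked example, here is a minimal NumPy sketch computing both measures; the two vectors are hypothetical stand-ins for sentence embeddings produced by a model:

```python
# A minimal sketch of both similarity measures. The vectors u and v are
# hypothetical 4-dimensional "sentence embeddings"; real embeddings from an
# LLM typically have hundreds or thousands of dimensions.
import numpy as np

u = np.array([0.2, 0.9, -0.4, 0.1])  # hypothetical embedding of sentence 1
v = np.array([0.3, 0.8, -0.5, 0.0])  # hypothetical embedding of sentence 2

dot = np.dot(u, v)                                    # dot product: sum of element-wise products
cos = dot / (np.linalg.norm(u) * np.linalg.norm(v))   # cosine similarity: normalized dot product

print(f"dot product:       {dot:.3f}")                # unbounded; grows with vector magnitude
print(f"cosine similarity: {cos:.3f}")                # always in [-1, 1]
```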