## Large Language Models

• Large Language Models (LLMs) such as GPT-3 and BERT are deep neural networks.
• They contain many neurons connected by billions of weighted links.
• “Given an input text ‘prompt’, at essence what these systems do is compute a probability distribution over a ‘vocabulary’—the list of all words (or actually parts of words, or tokens) that the system knows about. The vocabulary is given to the system by the human designers. GPT-3, for example, has a vocabulary of about 50,000 tokens.” (Source)
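The idea of a fixed vocabulary of subword tokens can be sketched with a toy example. The vocabulary and the greedy longest-match rule below are illustrative assumptions, not GPT-3's actual tokenizer (which uses byte-pair encoding over ~50,000 tokens):

```python
# Toy subword vocabulary (hypothetical pieces, not GPT-3's real tokens).
vocab = {"un": 0, "believ": 1, "able": 2, "cat": 3, "s": 4}

def tokenize(word, vocab):
    """Greedy longest-match subword tokenization over the toy vocabulary."""
    tokens = []
    while word:
        for end in range(len(word), 0, -1):
            piece = word[:end]
            if piece in vocab:
                tokens.append(piece)
                word = word[end:]
                break
        else:
            raise ValueError(f"no vocabulary piece matches {word!r}")
    return tokens

print(tokenize("unbelievable", vocab))  # ['un', 'believ', 'able']
```

Even a word the system has never seen whole can be represented, as long as its pieces are in the vocabulary.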

## How do LLMs work?

• LLMs are trained for a particular task, such as next-token prediction or sentence completion.
• They start by splitting the prompt they receive into tokens and converting each token into a vector (an embedding).
• They then perform layer-by-layer computations, which result in assigning a number, called a logit, to each token in the vocabulary.
• Finally, the logits are converted (typically via the softmax function) into a probability distribution over which token should come next in the text.
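The final step above can be sketched in a few lines. The vocabulary and logit values are made-up assumptions for illustration; a real model produces one logit per token in its full vocabulary:

```python
import math

def softmax(logits):
    """Convert logits into a probability distribution (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits over a 4-token vocabulary after processing a prompt.
vocab = ["cat", "dog", "mat", "sat"]
logits = [2.0, 0.5, 3.1, -1.0]
probs = softmax(logits)

# Greedy decoding simply picks the highest-probability token.
next_token = vocab[probs.index(max(probs))]
print(next_token)  # mat
```

Note that softmax acts on all the logits jointly: the probabilities sum to 1, and raising one logit lowers every other token's probability.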

## Similarity Computation

• A natural next step is to determine whether two sentences are similar to or different from each other.
• Sentence similarity measures the degree to which two sentences are semantically equivalent in meaning.
• The most common such measure is cosine similarity:

### Cosine Similarity

$$\text{cosine\_similarity}(\mathbf{u},\mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert} = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2}\,\sqrt{\sum_{i=1}^{n} v_i^2}}$$

• where:
• $$\mathbf{u}$$ and $$\mathbf{v}$$ are the two vectors being compared,
• $$\cdot$$ denotes the dot product, $$\lVert\mathbf{u}\rVert$$ and $$\lVert\mathbf{v}\rVert$$ are the magnitudes (or norms) of the vectors, and $$n$$ is the number of dimensions in the vectors.
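The formula translates directly into code. This is a minimal sketch in plain Python, mirroring the dot product and the two norms term by term:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity: dot product of u and v over the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))        # sum of u_i * v_i
    norm_u = math.sqrt(sum(a * a for a in u))     # sqrt of sum of u_i^2
    norm_v = math.sqrt(sum(b * b for b in v))     # sqrt of sum of v_i^2
    return dot / (norm_u * norm_v)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Because the dot product is divided by both magnitudes, the result depends only on the angle between the vectors, not their lengths, which is why it suits comparing sentence embeddings of varying scale.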