## Overview

• Add Knowledge to Language Models:
• Standard language models: trained to predict the next word in a sequence of text, and can compute the probability of a sequence
• Masked language models (e.g., BERT): instead predict a masked token in a sequence of text using bidirectional context (both objectives are sketched after this list)
• Language models are not always able to predict facts correctly:
• Unseen facts: some facts may not have occurred in the training corpus at all
• The LM can’t make up facts about the world that it has never seen
• Rare facts: the LM hasn’t seen enough examples during training to memorize the fact
• Model sensitivity: the LM may have seen the fact during training, but is sensitive to the phrasing of the prompt
• Inability to reliably recall knowledge is a key challenge facing LMs today
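
For reference, a minimal statement of the two objectives (standard notation, with $x^{(1)}, \dots, x^{(T)}$ the token sequence):

```latex
% Standard (autoregressive) LM: probability of a sequence via the chain rule
P(x^{(1)}, \dots, x^{(T)}) = \prod_{t=1}^{T} P\big(x^{(t)} \mid x^{(1)}, \dots, x^{(t-1)}\big)

% Masked LM (BERT-style): predict a masked position i from bidirectional context
P\big(x^{(i)} \mid x^{(1)}, \dots, x^{(i-1)}, [\mathrm{MASK}], x^{(i+1)}, \dots, x^{(T)}\big)
```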

## Knowledge Graphs

• The hope is to query knowledge with natural-language questions instead of SQL-style queries over a traditional knowledge base
• LMs are pre-trained over large amounts of unstructured and unlabeled text
• Can support more flexible natural language queries
• Cons:
• Hard to interpret (why did the LM return that answer?)
• Knowledge is encoded in the parameters of the model, so it is hard to understand
• Hard to trust (the LM may produce realistic but incorrect answers)
• Hard to modify (not easy to remove or update knowledge in the LM)
• Techniques researchers are using to add knowledge to LMs:
• Facts about the world are usually expressed in terms of entities:

• Entity linking: link mentions in text to entities in a knowledge base
• This tells us which entity embeddings are relevant to the text
• Entity embeddings are like word embeddings, but for entities in a knowledge base
• Knowledge graph embedding methods, e.g., TransE (a scoring sketch follows this list)
• Wikipedia2Vec
• How do we incorporate entity embeddings when they come from a different embedding space than the word embeddings?
• Learn a fusion layer to combine context and entity information
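
As a concrete example of a knowledge graph embedding method, TransE learns entity and relation vectors such that head + relation ≈ tail for true triples, and scores a candidate triple by the negative distance between the two sides. A minimal sketch with made-up, untrained vectors:

```python
import numpy as np

# Minimal TransE-style scoring sketch. The embeddings below are random stand-ins;
# in practice they are learned so that head + relation ≈ tail for true triples.
dim = 50
rng = np.random.default_rng(0)
entity_emb = {"Washington_D.C.": rng.normal(size=dim), "USA": rng.normal(size=dim)}
relation_emb = {"capital_of": rng.normal(size=dim)}

def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail: higher = more plausible."""
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    return -np.linalg.norm(h + r - t)

print(transe_score("Washington_D.C.", "capital_of", "USA"))
```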

### ERNIE: Enhanced Language Representation with Informative Entities

• Pretrained entity embeddings
• Fusion layer
• Text encoder: multi-layer bidirectional Transformer encoder (i.e., BERT) over the words in the sentence
• Knowledge encoder: stacked blocks composed of:
• Two multi-headed attention (MHA) blocks: one over entity embeddings and one over token embeddings
• A fusion layer to combine the outputs of the two MHAs
• The fusion layer outputs new word and entity embeddings
• Knowledge pretraining task: randomly mask token-entity alignments and predict the corresponding entity for a token from the entities in the sequence
• The point of the fusion layer is to capture the correlation between word embeddings and entity embeddings so the model can return the correct answer (a rough sketch of such a layer follows this list)
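
A rough PyTorch-style sketch of what such a fusion layer could look like (layer names, dimensions, and the activation are illustrative assumptions, not ERNIE’s exact implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionLayer(nn.Module):
    """Illustrative fusion layer: mixes a token embedding with its aligned entity
    embedding, then emits updated token and entity embeddings."""
    def __init__(self, d_token: int, d_entity: int, d_hidden: int):
        super().__init__()
        self.w_token = nn.Linear(d_token, d_hidden)
        self.w_entity = nn.Linear(d_entity, d_hidden)
        self.out_token = nn.Linear(d_hidden, d_token)
        self.out_entity = nn.Linear(d_hidden, d_entity)

    def forward(self, token_emb: torch.Tensor, entity_emb: torch.Tensor):
        # Combine both sources of information into a shared hidden state ...
        h = F.gelu(self.w_token(token_emb) + self.w_entity(entity_emb))
        # ... then project back to new token and entity embeddings.
        return self.out_token(h), self.out_entity(h)

fusion = FusionLayer(d_token=768, d_entity=100, d_hidden=768)
new_token_emb, new_entity_emb = fusion(torch.randn(1, 768), torch.randn(1, 100))
```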

### KnowBert

• Key idea: pretrain an integrated entity linker as an extension to BERT
• Learning entity linking may better encode knowledge
• Uses a fusion layer to combine entity and context information and adds a knowledge pretraining task (a toy entity-linking sketch follows this list)
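
To make the entity-linking step concrete, here is a toy sketch of candidate generation plus disambiguation (the candidate lists, priors, and scoring are invented for illustration; KnowBert’s actual linker is learned jointly with BERT):

```python
# Toy entity linker: generate candidate entities for a mention, then pick the one
# whose description best overlaps with the sentence context (plus a popularity prior).
candidates = {
    "Washington": ["Washington,_D.C.", "George_Washington", "Washington_(state)"],
}
descriptions = {
    "Washington,_D.C.": "capital city of the united states",
    "George_Washington": "first president of the united states",
    "Washington_(state)": "state in the pacific northwest",
}
priors = {"Washington,_D.C.": 0.5, "George_Washington": 0.3, "Washington_(state)": 0.2}

def link(mention: str, context: str) -> str:
    ctx = set(context.lower().split())
    def score(entity: str) -> float:
        overlap = len(ctx & set(descriptions[entity].split()))
        return priors[entity] + overlap
    return max(candidates[mention], key=score)

print(link("Washington", "Washington is the capital of the United States"))
```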

### KGLM

• Key idea: condition a language model (an LSTM, in KGLM’s case) on a knowledge graph
• A standard LM predicts the next word by computing $P(x^{(t+1)} \mid x^{(t)}, \dots, x^{(1)})$
• KGLM instead predicts the next word and entity jointly, using entity information, by computing $P(x^{(t+1)}, \mathcal{E}^{(t+1)} \mid x^{(t)}, \dots, x^{(1)}, \mathcal{E}^{(t)}, \dots, \mathcal{E}^{(1)})$, where $\mathcal{E}^{(t)}$ is the set of entities mentioned up to step $t$
• Builds a “local” knowledge graph as you iterate over the sequence
• Local KG: subset of the full KG with only entities relevant to the sequence
• At each step, the LM must decide whether the next token is a related entity, a new entity, or not an entity at all (i.e., predict the next word as usual); see the toy sketch after this list
• Related entity (already in the local KG):
• Find the top-scoring parent and relation in the local KG using the LSTM hidden state and pretrained entity and relation embeddings
• New entity (not in the local KG):
• Find the top-scoring entity in the full KG using the LSTM hidden state and pretrained entity embeddings
• KGLM outperforms GPT-2
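
A toy sketch of the per-step decision described above (the vectors, scores, and local KG are made up purely to illustrate the control flow; this is not the KGLM implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
hidden = rng.normal(size=d)  # LSTM hidden state at the current step

# Local KG: triples whose parent entities have already been mentioned in the sequence.
local_kg = {("Super_Mario_Land", "publisher"): "Nintendo"}
entity_emb = {e: rng.normal(size=d) for e in ["Super_Mario_Land", "Nintendo", "Game_Boy"]}
relation_emb = {"publisher": rng.normal(size=d)}

def score(a, b):
    return float(a @ b)

# Step 1: decide the token type (in KGLM this comes from a softmax over the hidden state).
token_type = "related_entity"  # one of: related_entity, new_entity, not_an_entity

if token_type == "related_entity":
    # Step 2a: pick the top-scoring (parent, relation) pair from the local KG;
    # the tail entity of that triple becomes the next entity mention.
    parent, rel = max(local_kg, key=lambda pr: score(hidden, entity_emb[pr[0]] + relation_emb[pr[1]]))
    next_entity = local_kg[(parent, rel)]
elif token_type == "new_entity":
    # Step 2b: pick the top-scoring entity from the full KG and add it to the local KG.
    next_entity = max(entity_emb, key=lambda e: score(hidden, entity_emb[e]))

print(next_entity)
```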

### Nearest Neighbor Language Models (kNN-LM)

• Key idea: learning similarities between text sequences is easier than predicting the next word
• Store representations of all training text prefixes, together with the word that follows each, in a nearest-neighbor datastore
• At inference, retrieve the k most similar stored prefixes and interpolate their next-word distribution with the base LM’s distribution (a minimal sketch follows)
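
A minimal sketch of that inference step, with a made-up datastore and base-LM distribution (the interpolation weight `lam` and the distance weighting are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
vocab = ["Hawaii", "Illinois", "the"]

# Datastore: (representation of a training prefix, the word that actually followed it).
datastore = [
    (rng.normal(size=d), "Hawaii"),
    (rng.normal(size=d), "Hawaii"),
    (rng.normal(size=d), "Illinois"),
]

def knn_lm_probs(query, p_lm, k=2, lam=0.5):
    """Interpolate the base LM distribution with a distance-weighted kNN distribution."""
    dists = np.array([np.linalg.norm(query - key) for key, _ in datastore])
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest])
    weights /= weights.sum()
    p_knn = {w: 0.0 for w in vocab}
    for weight, idx in zip(weights, nearest):
        p_knn[datastore[idx][1]] += weight
    return {w: lam * p_knn[w] + (1 - lam) * p_lm[w] for w in vocab}

query = rng.normal(size=d)                           # representation of the test prefix
p_lm = {"Hawaii": 0.2, "Illinois": 0.3, "the": 0.5}  # base LM distribution (made up)
print(knn_lm_probs(query, p_lm))
```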

### Evaluating knowledge in LMs

• LAnguage Model Analysis (LAMA) probe
• How much relational (commonsense and factual) knowledge is already present in off-the-shelf language models?
• Measured without any additional training or fine-tuning, by asking the model to fill in the blank of cloze statements (a minimal example follows this list)
• Limitations of the LAMA probe:
• Hard to understand why models perform well when they do
• BERT-large may simply be memorizing co-occurrence patterns rather than “understanding” the cloze statements
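
A minimal LAMA-style cloze probe, sketched here with the Hugging Face transformers fill-mask pipeline (this assumes the library and a pretrained BERT checkpoint are available; the query sentence is just an example):

```python
from transformers import pipeline

# Off-the-shelf BERT, no additional training or fine-tuning.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The probe simply asks the model to fill in the factual blank of a cloze statement.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```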

## Citation

If you found our work useful, please cite it as:

@article{Chadha2021Distilled,
title   = {Knowledge Graphs},
author  = {Jain, Vinija and Chadha, Aman},
journal = {Distilled Notes for Stanford CS224n: Natural Language Processing with Deep Learning},
year    = {2021},
note    = {\url{https://aman.ai}}
}