## Pattern: Trie/Prefix Tree

### [208/Medium] Implement Trie (Prefix Tree)

#### Problem

• A trie (pronounced as “try”) or prefix tree is a tree data structure used to efficiently store and retrieve keys in a dataset of strings. There are various applications of this data structure, such as autocomplete and spellchecker.

• Implement the Trie class:

• Trie() Initializes the trie object.
• void insert(String word) Inserts the string word into the trie.
• boolean search(String word) Returns true if the string word is in the trie (i.e., was inserted before), and false otherwise.
• boolean startsWith(String prefix) Returns true if there is a previously inserted string word that has the prefix prefix, and false otherwise.
• Example 1:

Input
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
Output
[null, null, true, false, true, null, true]

Explanation
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple");   // return True
trie.search("app");     // return False
trie.startsWith("app"); // return True
trie.insert("app");
trie.search("app");     // return True

• Constraints:
• 1 <= word.length, prefix.length <= 2000
• word and prefix consist only of lowercase English letters.
• At most 3 * 104 calls in total will be made to insert, search, and startsWith.
• See problem on LeetCode.

#### Solution: Dictionary without a TrieNode class

• There is no need to create a TrieNode class. Using a dictionary can give much better performance. Nodes are replaced by dictionary. is_word is replaced by adding a - key to the dictionary.
class Trie(object):
def __init__(self):
self.trie = {}

def insert(self, word):
t = self.trie
for c in word:
if c not in t: t[c] = {}
t = t[c]
t["-"] = True

def search(self, word):
t = self.trie
for c in word:
if c not in t: return False
t = t[c]
return "-" in t

def startsWith(self, prefix):
t = self.trie
for c in prefix:
if c not in t: return False
t = t[c]
return True

##### Complexity
• Time: $$O(n)$$
• Space: $$O(1)$$

#### Solution: Using a TrieNode class

• Using a dictionary to implement a TrieNode class:
class TrieNode:
def __init__(self):
self.childNodes = {}
self.isWordEnd = False

class Trie:
def __init__(self):
self.root = TrieNode()

def insert(self, word: str) -> None:
"""
we will start with root node and check for the first character
if there is already a node present for the character we will fetch that node, else we will create a new node
then we will iterate to next character
"""
currNode = self.root
for ch in word:
node = currNode.childNodes.get(ch, TrieNode())
currNode.childNodes[ch] = node
currNode = node

# after all the characters are traversed, mark the last node as end of word
currNode.isWordEnd = True

def search(self, word: str) -> bool:
"""
we will start from root node and check for all the characters
if we could not find a node for a character we will return False there
once we iterate through all character we will check if that is a word end
"""

currNode = self.root
for ch in word:
node = currNode.childNodes.get(ch) # or if ch not in currNode.childNodes:
if not node:                       #        return False
return False
currNode = node

return currNode.isWordEnd

def startsWith(self, prefix: str) -> bool:
"""
starts with is similar to search except here we don't have to check whether the last character is end of word
"""
currNode = self.root
for ch in prefix:
node = currNode.childNodes.get(ch)
if not node:
return False
currNode = node

return True

• Using a defaultdict to implement a TrieNode class:
from collections import defaultdict

class TrieNode(object):
def __init__(self):
"""
"""
self.nodes = defaultdict(TrieNode) # Easy to insert new node.
self.isword = False  # True for the end of the trie.

class Trie(object):
def __init__(self):
self.root = TrieNode()

def insert(self, word):
"""
Inserts a word into the trie by creating nodes with individual chars.
:type word: str
:rtype: void
"""
curr = self.root
for char in word:
curr = curr.nodes[char]
curr.isword = True

def search(self, word):
"""
Returns True if the word is in the trie.
:type word: str
:rtype: bool
"""
curr = self.root
for char in word:
if char not in curr.nodes:
return False
curr = curr.nodes[char]
return curr.isword

def startsWith(self, prefix):
"""
Returns if there is any word in the trie
that starts with the given prefix.
:type prefix: str
:rtype: bool
"""
curr = self.root
for char in prefix:
if char not in curr.nodes:
return False
curr = curr.nodes[char]
return True

##### Complexity
• Time: $$O(n)$$
• Space: $$O(1)$$

### [211/Medium] Design Add and Search Words Data Structure

#### Problem

• Design a data structure that supports adding new words and finding if a string matches any previously added string.

• Implement the WordDictionary class:

• WordDictionary() Initializes the object.
• void addWord(word) Adds word to the data structure, it can be matched later.
• bool search(word) Returns true if there is any string in the data structure that matches word or false otherwise. word may contain dots ‘.’ where dots can be matched with any letter.
• Example:

Input
Output
[null,null,null,null,false,true,true,true]

Explanation
WordDictionary wordDictionary = new WordDictionary();
wordDictionary.search("b.."); // return True

• Constraints:
• 1 <= word.length <= 25
• word in addWord consists of lowercase English letters.
• word in search consist of '.' or lowercase English letters.
• There will be at most 3 dots in word for search queries.
• At most 104 calls will be made to addWord and search.
• See problem on LeetCode.

#### Solution: RegEx

• With dots representing any letter, this problem is basically begging for a regular expression solution :)
class WordDictionary:
def __init__(self):
# use "#" as a delimiter for the beginning and end of the word
self.words = '#'

self.words += word + '#'

def search(self, word):
return bool(re.search('#' + word + '#', self.words))

• A different approach using defaultdict:
class WordDictionary:
def __init__(self):
self.words = collections.defaultdict(set)

# index by the length of the word

def search(self, word):
return any(map(re.compile(word).match,
self.words[len(word)]))


#### Solution: Trie

• Using TrieNode class implemented using a defaultdict:
class TrieNode():
def __init__(self):
self.children = collections.defaultdict(TrieNode)
self.isWord = False

class WordDictionary(object):
def __init__(self):
self.root = TrieNode()

node = self.root
for w in word:
node = node.children[w]
node.isWord = True

def search(self, word):
node = self.root
self.res = False
self.dfs(node, word)
return self.res

def dfs(self, node, word):
if not word:
if node.isWord:
self.res = True
return

if word[0] == ".":
for n in node.children.values():
self.dfs(n, word[1:])
else:
node = node.children.get(word[0])
if not node:
return
self.dfs(node, word[1:])

• Using TrieNode class implemented using a dictionary. Also, saving index in a stack instead of string slicing:
class WordNode:
def __init__(self):
self.children = {}
self.isEnd = False

class WordDictionary:
def __init__(self):
self.root = WordNode()

node = self.root
for w in word:
if w in node.children:
node = node.children[w]
else:
node.children[w] = WordNode()
node = node.children[w]
node.isEnd = True

def search(self, word):
stack = [(self.root, word)]
while stack:
node, w = stack.pop()
if not w:
if node.isEnd:
return True
elif w[0] == '.':
for n in node.children.values():
stack.append((n, w[1:]))
else:
if w[0] in node.children:
n = node.children[w[0]]
stack.append((n, w[1:]))

return False

• Same approach; rephrased:
class WordDictionary:
def __init__(self):
self.trie = {}

def addWord(self, word: str) -> None:
node = self.trie
for c in word:
node = node.setdefault(c, {})
node['$'] = True def search(self, word: str) -> bool: return self.searchNode(self.trie, word) def searchNode(self, node, word: str) -> bool: for i, c in enumerate(word): if c == '.': return any(self.searchNode(node[w], word[i+1:]) for w in node if w != '$')
if c not in node: return False
node = node[c]

return '\$' in node

• The question is expecting us to design a data structure that supports adding new words and finding if a string matches any previously added string. So, to design a data structure we only need to support three function’s for this data structure:
• One is the constructor that’s gonna initialize the object.
• One is adding the word’s, but all of them are lower case from a to z.
• One is searching a word, and the word can contains any character from a to z. But there is one additional character it contains, “.” character (the “.” character is a wild card, can match any character easily in the string).
• The brute force way to solve this problem is pretty simple, “having a list of words and then just for every search we would just see, if this search match any of the word in input list.” But it’s not an efficient way, especially in terms of the space complexity involved (storing whole strings and not exploiting the face that common prefixes across strings do not need to be stored again). An efficient way to solve this problem. And for that we require Trie data structure a.k.a. Prefix tree.
• Let’s understand Trie first. Trie is basically a tree where each node holds a character and can have up to 26 children in the word add-search context, because we have 26 lower case characters in the alphabet (from a to z). So, basically each node represents a single character. Graphically speaking:

• If we insert the word abc in our trie, a will have b as a child, which in turn will have c as a child. Graphically, this is how it looks like:

• Note that in the above example, c denotes the end of the word for the word abc. Now, if we added another word example, say ab, we don’t add a and b back again since they are already available to us. We can simply re-use these characters. So, we have two words along this path abc and ab. Basically, all words that start with a and b would live in this space, that’s what makes it efficient. Hence, a trie is also called a prefix tree.

• Now let’s take an example and build our tree.

Input

Output
[null,null,null,null,false,true,true,true]

• The first word we add is bad, i.e., b -> a -> d.
• Next, let’s add another word dad. Given that we don’t share the same prefix as the earlier word bad (bad starts with b, while dad starts with d), we have to start with a different path. So, let’s add dad as d -> a -> d.

• We have one last word before we start searching, say mad. So, we don’t have m, then let’s add it: m -> a -> d

• So far we have three word’s and all of them end with a different d. But all three of them have different prefixes that’s why they are along different path’s:

• Now let’s look at the searching path:

• The first word we gonna search for is pad. We start at the root and and begin looking for the first character in the word, i.e., p. We immediately return False as p doesn’t even exist in our input.
• Now, let’s search for another word bad. We start at the root and see there are any nodes with b. Yes, we have b. Now let’s check does b have a child a, which it does. For the last character d, let’s check if a has a child d, and surely it does! Lastly we have to say the last character d is in our Trie, which designated as the end of the word (marked red in the diagram below). Therefore, we return True for the input bad.
• Let’s search for .ad next. The dot . character matches any character. We start at the root and go through all possible paths using a DFS or backtracking approach. So, let’s say we start with going down b as the first node. Check if b has a child a, which it does. Now, check if a has a child d, which it also does. Lastly, note that the last character d in our Trie is designated as the end of the word. Therefore, we return True for this input .ad.
• Let’s try one last search, looking for b... So, we start at the root and see there are any nodes with b. Yes, we have b. Now, check if we have a character under b to match ., which we have as a. Now, we’re looking for a character for our last dot .. Yes we have d and it is the end of the word. Therefore, we return True for this input b...
• Here’s a visual demonstration for the overall process:

#### Solution: Trie using TrieNode class

class WordDictionary:

def __init__(self):

# Initialize your data structure here.
self.children = [None]*26
self.isEndOfWord = False

def addWord(self, word: str) -> None:

# Adds a word into the data structure.
curr = self
for c in word:
if curr.children[ord(c) - ord('a')] == None:
curr.children[ord(c) - ord('a')] = WordDictionary()
curr = curr.children[ord(c) - ord('a')]

curr.isEndOfWord = True;

def search(self, word: str) -> bool:

# Returns if the word is in the data structure. A word could contain the dot character '.' to represent any one letter.
curr = self
for i in range(len(word)):
c = word[i]
if c == '.':
for ch in curr.children:
if ch != None and ch.search(word[i+1:]): return True
return False

if curr.children[ord(c) - ord('a')] == None: return False
curr = curr.children[ord(c) - ord('a')]

return curr != None and curr.isEndOfWord

##### Complexity
• Time: $$O(m)$$ for well defined words, but in the worse case $$O(m \cdot 26^n)$$ where $$m$$ is the total queries and $$n$$ being the length of the input string
• To understand the worst case time complexity, consider the input ....... and the trie is almost complete till a deep enough level. In this case, chances are you have to iterate over 26 character for each node. That boils down the time complexity to be $$O(m \cdot 26^n)$$
• Space: $$O(1)$$ for well defined words, but for worst case $$O(m)$$

### [616/758/Medium] Add Bold Tag in String

#### Problem

• You are given a string s and an array of strings words. You should add a closed pair of bold tag <b> and </b> to wrap the substrings in s that exist in words. If two such substrings overlap, you should wrap them together with only one pair of closed bold-tag. If two substrings wrapped by bold tags are consecutive, you should combine them.

• Return s after adding the bold tags.

• Example 1:

Input: s = "abcxyz123", words = ["abc","123"]
Output: "<b>abc</b>xyz<b>123</b>"

• Example 2:
Input: s = "aaabbcc", words = ["aaa","aab","bc"]
Output: "<b>aaabbc</b>c"

• Constraints:
• 1 <= s.length <= 1000
• 0 <= words.length <= 100
• 1 <= words[i].length <= 1000
• s and words[i] consist of English letters and digits.
• All the values of words are unique.
• See problem on LeetCode.

#### Solution: Window-based method to find all open/close tags, sort by (index, open tag), merge tags with deque

• First find all open and close tags, and sort by index and then open tag first.
• Similar to Meeting Rooms II, merging tags and keep only valid, non-nested tags.
• Apply valid tags on original string.
class Solution:
def addBoldTag(self, s: str, words: List[str]) -> str:
windows = set([len(word) for word in words])  # get unique window length (reduce iteration), O(m)
words = set(words)                            # convert words to hash set for faster search, O(m)
n = len(s)

tag_list = []
for window in windows:                        # for each window size, O(m)
for i in range(n-window+1):               # for each word at this size, O(n)
if s[i:i+window] in words:            # see if each word at current window size is in the set
tag_list.append((i, 1))           # append open tag index, 1 meaning open tag
tag_list.append((i+window, -1))   # append close tag index, -1 meaning close tag

tag_list.sort(key=lambda x: (x[0], -x[1]))    # sort by index & open tag first, O(n*logn)

dq = collections.deque()
cnt = 0
if not cnt:                               # if no open tags, add open tag
cnt += 1
dq.append(idx)
elif cnt == 1 and sign == -1:             # append close tag only when previous tag is open tag (no nested tags)
cnt -= 1
dq.append(idx)
else:                                     # any other cases, only count tags, not append any
cnt += 1 if sign == 1 else -1

ans = ''
open_tag = True
for i, c in enumerate(s):                     # apply tags on original string s, O(n)
if dq and i == dq[0]:                     # add tags
ans += '<b>' if open_tag else '</b>'
open_tag = not open_tag
dq.popleft()                          # remove used tag
ans += c

return ans + '</b>' if dq else ans            # add ending tag if there is one


#### Complexity

• Time: $$O(m*n), m = len(words), n = len(s)$$
• Space: $$O(n)$$

#### Solution: Using RegEx

• Similar idea but using regex and sorted based on len to avoid prefix match first.
from re import escape, finditer

class Solution:
def merge_spans(self, spans: list[list[int]]) -> list[list[int]]:
overlap = lambda a, b: a[0] <= b[1] and a[1] >= b[0]

output = []
for span in spans:
if output and overlap(output[-1], span):
output[-1] = [min(output[-1][0], span[0]), max(output[-1][1], span[1])]
else:
output.append(span)

return output

def addBoldTag(self, s: str, dict: list[str]) -> str:
pattern = r"(?=({}))".format("|".join(map(escape, sorted(dict, key=len, reverse=True))))
spans = self.merge_spans([m.span(1) for m in finditer(pattern, s) if m.group(1)])

for i, span in enumerate(spans):
span = [span[0] + i * 7, span[1] + i * 7]
s = s[: span[0]] + "<b>" + s[span[0] : span[1]] + "</b>" + s[span[1] :]

return s


#### Solution: Trie Tree and Merge Intervals

• Trie tree is used to speed up string match (faster than find or startwith in large query request).
• Using Merge Intervals instead of mask to reduce Time and Space Complexity, both from O(n) to O(m), where m represents interval numbers after merged.
class Solution:
def addBoldTag(self, s: str, dict: 'List[str]') -> str:
trie, n, intervals, res = {}, len(s), [], ""

# create trie tree
for w in dict:
cur = trie
for c in w:
cur = cur.setdefault(c, {})
cur["#"] = 1

# interval merge
if intervals and intervals[-1][1] >= interval[0]:
if intervals[-1][1] < interval[1]:
intervals[-1][1] = interval[1]
else:
intervals.append(interval)

# make max match and add to interval
for i in range(n):
cur, max_end = trie, None
for j in range(i, n):
if s[j] not in cur:
break
cur = cur[s[j]]
if "#" in cur:
max_end = j + 1
# just need to add max-match interval
if max_end: