Distilled • LeetCode • Trie
- Pattern: Trie/Prefix Tree
Pattern: Trie/Prefix Tree
[208/Medium] Implement Trie (Prefix Tree)
Problem
-
A trie (pronounced as “try”) or prefix tree is a tree data structure used to efficiently store and retrieve keys in a dataset of strings. There are various applications of this data structure, such as autocomplete and spellchecker.
-
Implement the Trie class:
Trie()
Initializes the trie object.void insert(String word)
Inserts the string word into the trie.boolean search(String word)
Returns true if the string word is in the trie (i.e., was inserted before), and false otherwise.boolean startsWith(String prefix)
Returns true if there is a previously inserted string word that has the prefix prefix, and false otherwise.
-
Example 1:
Input
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
Output
[null, null, true, false, true, null, true]
Explanation
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple"); // return True
trie.search("app"); // return False
trie.startsWith("app"); // return True
trie.insert("app");
trie.search("app"); // return True
- Constraints:
1 <= word.length, prefix.length <= 2000
word and prefix consist only of lowercase English letters.
At most 3 * 104 calls in total will be made to insert, search, and startsWith.
- See problem on LeetCode.
Solution: Dictionary without a TrieNode
class
- There is no need to create a
TrieNode
class. Using a dictionary can give much better performance. Nodes are replaced by dictionary.is_word
is replaced by adding a-
key to the dictionary.
class Trie(object):
def __init__(self):
self.trie = {}
def insert(self, word):
t = self.trie
for c in word:
if c not in t: t[c] = {}
t = t[c]
t["-"] = True
def search(self, word):
t = self.trie
for c in word:
if c not in t: return False
t = t[c]
return "-" in t
def startsWith(self, prefix):
t = self.trie
for c in prefix:
if c not in t: return False
t = t[c]
return True
Complexity
- Time: \(O(n)\)
- Space: \(O(1)\)
Solution: Using a TrieNode
class
- Using a dictionary to implement a
TrieNode
class:
class TrieNode:
def __init__(self):
self.childNodes = {}
self.isWordEnd = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
"""
we will start with root node and check for the first character
if there is already a node present for the character we will fetch that node, else we will create a new node
then we will iterate to next character
"""
currNode = self.root
for ch in word:
node = currNode.childNodes.get(ch, TrieNode())
currNode.childNodes[ch] = node
currNode = node
# after all the characters are traversed, mark the last node as end of word
currNode.isWordEnd = True
def search(self, word: str) -> bool:
"""
we will start from root node and check for all the characters
if we could not find a node for a character we will return False there
once we iterate through all character we will check if that is a word end
"""
currNode = self.root
for ch in word:
node = currNode.childNodes.get(ch) # or if ch not in currNode.childNodes:
if not node: # return False
return False
currNode = node
return currNode.isWordEnd
def startsWith(self, prefix: str) -> bool:
"""
starts with is similar to search except here we don't have to check whether the last character is end of word
"""
currNode = self.root
for ch in prefix:
node = currNode.childNodes.get(ch)
if not node:
return False
currNode = node
return True
- Using a
defaultdict
to implement aTrieNode
class:
from collections import defaultdict
class TrieNode(object):
def __init__(self):
"""
Initialize your data structure here.
"""
self.nodes = defaultdict(TrieNode) # Easy to insert new node.
self.isword = False # True for the end of the trie.
class Trie(object):
def __init__(self):
self.root = TrieNode()
def insert(self, word):
"""
Inserts a word into the trie by creating nodes with individual chars.
:type word: str
:rtype: void
"""
curr = self.root
for char in word:
curr = curr.nodes[char]
curr.isword = True
def search(self, word):
"""
Returns True if the word is in the trie.
:type word: str
:rtype: bool
"""
curr = self.root
for char in word:
if char not in curr.nodes:
return False
curr = curr.nodes[char]
return curr.isword
def startsWith(self, prefix):
"""
Returns if there is any word in the trie
that starts with the given prefix.
:type prefix: str
:rtype: bool
"""
curr = self.root
for char in prefix:
if char not in curr.nodes:
return False
curr = curr.nodes[char]
return True
Complexity
- Time: \(O(n)\)
- Space: \(O(1)\)
[211/Medium] Design Add and Search Words Data Structure
Problem
-
Design a data structure that supports adding new words and finding if a string matches any previously added string.
-
Implement the WordDictionary class:
WordDictionary()
Initializes the object.void addWord(word)
Adds word to the data structure, it can be matched later.bool search(word)
Returns true if there is any string in the data structure that matches word or false otherwise. word may contain dots ‘.’ where dots can be matched with any letter.
-
Example:
Input
["WordDictionary","addWord","addWord","addWord","search","search","search","search"]
[[],["bad"],["dad"],["mad"],["pad"],["bad"],[".ad"],["b.."]]
Output
[null,null,null,null,false,true,true,true]
Explanation
WordDictionary wordDictionary = new WordDictionary();
wordDictionary.addWord("bad");
wordDictionary.addWord("dad");
wordDictionary.addWord("mad");
wordDictionary.search("pad"); // return False
wordDictionary.search("bad"); // return True
wordDictionary.search(".ad"); // return True
wordDictionary.search("b.."); // return True
- Constraints:
1 <= word.length <= 25
word in addWord consists of lowercase English letters.
word in search consist of '.' or lowercase English letters.
There will be at most 3 dots in word for search queries.
At most 104 calls will be made to addWord and search.
- See problem on LeetCode.
Solution: RegEx
- With dots representing any letter, this problem is basically begging for a regular expression solution :)
class WordDictionary:
def __init__(self):
# use "#" as a delimiter for the beginning and end of the word
self.words = '#'
def addWord(self, word):
self.words += word + '#'
def search(self, word):
return bool(re.search('#' + word + '#', self.words))
- A different approach using
defaultdict
:
class WordDictionary:
def __init__(self):
self.words = collections.defaultdict(set)
def addWord(self, word):
# index by the length of the word
self.words[len(word)].add(word)
def search(self, word):
return any(map(re.compile(word).match,
self.words[len(word)]))
Solution: Trie
- Using
TrieNode
class implemented using adefaultdict
:
class TrieNode():
def __init__(self):
self.children = collections.defaultdict(TrieNode)
self.isWord = False
class WordDictionary(object):
def __init__(self):
self.root = TrieNode()
def addWord(self, word):
node = self.root
for w in word:
node = node.children[w]
node.isWord = True
def search(self, word):
node = self.root
self.res = False
self.dfs(node, word)
return self.res
def dfs(self, node, word):
if not word:
if node.isWord:
self.res = True
return
if word[0] == ".":
for n in node.children.values():
self.dfs(n, word[1:])
else:
node = node.children.get(word[0])
if not node:
return
self.dfs(node, word[1:])
- Using
TrieNode
class implemented using a dictionary. Also, saving index in a stack instead of string slicing:
class WordNode:
def __init__(self):
self.children = {}
self.isEnd = False
class WordDictionary:
def __init__(self):
self.root = WordNode()
def addWord(self, word):
node = self.root
for w in word:
if w in node.children:
node = node.children[w]
else:
node.children[w] = WordNode()
node = node.children[w]
node.isEnd = True
def search(self, word):
stack = [(self.root, word)]
while stack:
node, w = stack.pop()
if not w:
if node.isEnd:
return True
elif w[0] == '.':
for n in node.children.values():
stack.append((n, w[1:]))
else:
if w[0] in node.children:
n = node.children[w[0]]
stack.append((n, w[1:]))
return False
- Same approach; rephrased:
class WordDictionary:
def __init__(self):
self.trie = {}
def addWord(self, word: str) -> None:
node = self.trie
for c in word:
node = node.setdefault(c, {})
node['$'] = True
def search(self, word: str) -> bool:
return self.searchNode(self.trie, word)
def searchNode(self, node, word: str) -> bool:
for i, c in enumerate(word):
if c == '.':
return any(self.searchNode(node[w], word[i+1:]) for w in node if w != '$')
if c not in node: return False
node = node[c]
return '$' in node
- The question is expecting us to design a data structure that supports adding new words and finding if a string matches any previously added string. So, to design a data structure we only need to support three function’s for this data structure:
- One is the constructor that’s gonna initialize the object.
- One is adding the word’s, but all of them are lower case from
a
toz
. - One is searching a word, and the word can contains any character from
a
toz
. But there is one additional character it contains, “.” character (the “.” character is a wild card, can match any character easily in the string).
- The brute force way to solve this problem is pretty simple, “having a list of words and then just for every search we would just see, if this search match any of the word in input list.” But it’s not an efficient way, especially in terms of the space complexity involved (storing whole strings and not exploiting the face that common prefixes across strings do not need to be stored again). An efficient way to solve this problem. And for that we require Trie data structure a.k.a. Prefix tree.
- Let’s understand Trie first. Trie is basically a tree where each node holds a character and can have up to
26
children in the word add-search context, because we have 26 lower case characters in the alphabet (froma
toz
). So, basically each node represents a single character. Graphically speaking:
- If we insert the word
abc
in our trie,a
will haveb
as a child, which in turn will havec
as a child. Graphically, this is how it looks like:
-
Note that in the above example,
c
denotes the end of the word for the wordabc
. Now, if we added another word example, sayab
, we don’t adda
andb
back again since they are already available to us. We can simply re-use these characters. So, we have two words along this pathabc
andab
. Basically, all words that start witha
andb
would live in this space, that’s what makes it efficient. Hence, a trie is also called a prefix tree. -
Now let’s take an example and build our tree.
Input
["WordDictionary","addWord","addWord","addWord","search","search","search","search"]
[[],["bad"],["dad"],["mad"],["pad"],["bad"],[".ad"],["b.."]]
Output
[null,null,null,null,false,true,true,true]
- The first word we add is
bad
, i.e.,b
->a
->d
. -
Next, let’s add another word
dad
. Given that we don’t share the same prefix as the earlier wordbad
(bad
starts withb
, whiledad
starts withd
), we have to start with a different path. So, let’s adddad
asd
->a
->d
. - We have one last word before we start searching, say
mad
. So, we don’t havem
, then let’s add it:m
->a
->d
- So far we have three word’s and all of them end with a different
d
. But all three of them have different prefixes that’s why they are along different path’s:
-
Now let’s look at the searching path:
- The first word we gonna search for is
pad
. We start at the root and and begin looking for the first character in the word, i.e.,p
. We immediately returnFalse
asp
doesn’t even exist in our input. - Now, let’s search for another word
bad
. We start at the root and see there are any nodes withb
. Yes, we haveb
. Now let’s check doesb
have a childa
, which it does. For the last characterd
, let’s check ifa
has a childd
, and surely it does! Lastly we have to say the last characterd
is in our Trie, which designated as the end of the word (marked red in the diagram below). Therefore, we returnTrue
for the inputbad
. - Let’s search for
.ad
next. The dot.
character matches any character. We start at the root and go through all possible paths using a DFS or backtracking approach. So, let’s say we start with going downb
as the first node. Check ifb
has a childa
, which it does. Now, check ifa
has a childd
, which it also does. Lastly, note that the last characterd
in our Trie is designated as the end of the word. Therefore, we returnTrue
for this input.ad
. - Let’s try one last search, looking for
b..
. So, we start at the root and see there are any nodes withb
. Yes, we haveb
. Now, check if we have a character underb
to match.
, which we have asa
. Now, we’re looking for a character for our last dot.
. Yes we haved
and it is the end of the word. Therefore, we returnTrue
for this inputb..
.
- The first word we gonna search for is
-
Here’s a visual demonstration for the overall process:
Solution: Trie using TrieNode
class
class WordDictionary:
def __init__(self):
# Initialize your data structure here.
self.children = [None]*26
self.isEndOfWord = False
def addWord(self, word: str) -> None:
# Adds a word into the data structure.
curr = self
for c in word:
if curr.children[ord(c) - ord('a')] == None:
curr.children[ord(c) - ord('a')] = WordDictionary()
curr = curr.children[ord(c) - ord('a')]
curr.isEndOfWord = True;
def search(self, word: str) -> bool:
# Returns if the word is in the data structure. A word could contain the dot character '.' to represent any one letter.
curr = self
for i in range(len(word)):
c = word[i]
if c == '.':
for ch in curr.children:
if ch != None and ch.search(word[i+1:]): return True
return False
if curr.children[ord(c) - ord('a')] == None: return False
curr = curr.children[ord(c) - ord('a')]
return curr != None and curr.isEndOfWord
Complexity
- Time: \(O(m)\) for well defined words, but in the worse case \(O(m \cdot 26^n)\) where \(m\) is the total queries and \(n\) being the length of the input string
- To understand the worst case time complexity, consider the input
.......
and the trie is almost complete till a deep enough level. In this case, chances are you have to iterate over 26 character for each node. That boils down the time complexity to be \(O(m \cdot 26^n)\)
- To understand the worst case time complexity, consider the input
- Space: \(O(1)\) for well defined words, but for worst case \(O(m)\)
[616/758/Medium] Add Bold Tag in String
Problem
-
You are given a string s and an array of strings words. You should add a closed pair of bold tag
<b>
and</b>
to wrap the substrings ins
that exist in words. If two such substrings overlap, you should wrap them together with only one pair of closed bold-tag. If two substrings wrapped by bold tags are consecutive, you should combine them. -
Return
s
after adding the bold tags. -
Example 1:
Input: s = "abcxyz123", words = ["abc","123"]
Output: "<b>abc</b>xyz<b>123</b>"
- Example 2:
Input: s = "aaabbcc", words = ["aaa","aab","bc"]
Output: "<b>aaabbc</b>c"
- Constraints:
1 <= s.length <= 1000
0 <= words.length <= 100
1 <= words[i].length <= 1000
s and words[i] consist of English letters and digits.
All the values of words are unique.
- See problem on LeetCode.
Solution: Window-based method to find all open/close tags, sort by (index, open tag), merge tags with deque
- First find all open and close tags, and sort by index and then open tag first.
- Similar to Meeting Rooms II, merging tags and keep only valid, non-nested tags.
- Apply valid tags on original string.
class Solution:
def addBoldTag(self, s: str, words: List[str]) -> str:
windows = set([len(word) for word in words]) # get unique window length (reduce iteration), O(m)
words = set(words) # convert `words` to hash set for faster search, O(m)
n = len(s)
tag_list = []
for window in windows: # for each window size, O(m)
for i in range(n-window+1): # for each word at this size, O(n)
if s[i:i+window] in words: # see if each word at current window size is in the set
tag_list.append((i, 1)) # append open tag index, 1 meaning open tag
tag_list.append((i+window, -1)) # append close tag index, -1 meaning close tag
tag_list.sort(key=lambda x: (x[0], -x[1])) # sort by index & open tag first, O(n*logn)
dq = collections.deque()
cnt = 0
for idx, sign in tag_list: # merging tags, O(n)
if not cnt: # if no open tags, add open tag
cnt += 1
dq.append(idx)
elif cnt == 1 and sign == -1: # append close tag only when previous tag is open tag (no nested tags)
cnt -= 1
dq.append(idx)
else: # any other cases, only count tags, not append any
cnt += 1 if sign == 1 else -1
ans = ''
open_tag = True
for i, c in enumerate(s): # apply tags on original string `s`, O(n)
if dq and i == dq[0]: # add tags
ans += '<b>' if open_tag else '</b>'
open_tag = not open_tag
dq.popleft() # remove used tag
ans += c
return ans + '</b>' if dq else ans # add ending tag if there is one
Complexity
- Time: \(O(m*n), m = len(words), n = len(s)\)
- Space: \(O(n)\)
Solution: Using RegEx
- Similar idea but using regex and sorted based on
len
to avoid prefix match first.
from re import escape, finditer
class Solution:
def merge_spans(self, spans: list[list[int]]) -> list[list[int]]:
overlap = lambda a, b: a[0] <= b[1] and a[1] >= b[0]
output = []
for span in spans:
if output and overlap(output[-1], span):
output[-1] = [min(output[-1][0], span[0]), max(output[-1][1], span[1])]
else:
output.append(span)
return output
def addBoldTag(self, s: str, dict: list[str]) -> str:
pattern = r"(?=({}))".format("|".join(map(escape, sorted(dict, key=len, reverse=True))))
spans = self.merge_spans([m.span(1) for m in finditer(pattern, s) if m.group(1)])
for i, span in enumerate(spans):
span = [span[0] + i * 7, span[1] + i * 7]
s = s[: span[0]] + "<b>" + s[span[0] : span[1]] + "</b>" + s[span[1] :]
return s
Solution: Trie Tree and Merge Intervals
- Trie tree is used to speed up string match (faster than
find
orstartwith
in large query request). - Using Merge Intervals instead of mask to reduce Time and Space Complexity, both from
O(n)
toO(m)
, wherem
represents interval numbers after merged.
class Solution:
def addBoldTag(self, s: str, dict: 'List[str]') -> str:
trie, n, intervals, res = {}, len(s), [], ""
# create trie tree
for w in dict:
cur = trie
for c in w:
cur = cur.setdefault(c, {})
cur["#"] = 1
# interval merge
def add_interval(interval):
if intervals and intervals[-1][1] >= interval[0]:
if intervals[-1][1] < interval[1]:
intervals[-1][1] = interval[1]
else:
intervals.append(interval)
# make max match and add to interval
for i in range(n):
cur, max_end = trie, None
for j in range(i, n):
if s[j] not in cur:
break
cur = cur[s[j]]
if "#" in cur:
max_end = j + 1
# just need to add max-match interval
if max_end:
add_interval([i, max_end])
# concat result
res, prev_end = "", 0
for start, end in intervals:
res += s[prev_end:start] + '<b>' + s[start:end] + "</b>"
prev_end = end
return res + s[prev_end:]