Pattern: Topological Sort

  • Overview:
    • Often, in life and work, the order in which we do things is important. Every task requires completing a few prerequisite tasks first. For example, to eat food, we need to buy groceries, prep the ingredients, cook the food, clean the plates, and finally serve the food on the plates before we can eat it. We can’t prep the ingredients before we buy them. But we can clean the plates first or buy groceries first as they have no tasks that must be done before they can be attempted.
    • This is the essence of the topological sort. We have a list of tasks, where some tasks have a condition that some other tasks must occur before them. Such tasks can be visualized as a graph with directed edges.

    • Here, the nodes in the directed graph represent the tasks, and the directed edges between nodes tell us which task has to be done before the other. With the tasks and dependencies represented as a directed graph, we can use topological sort and find all the possible valid ways to complete a task.
    • Similarly, when we are scheduling jobs or tasks, they may have dependencies. For example, before we finish task a, we have to finish b first. In this case, given a set of tasks and their dependencies, how shall we arrange our schedules? There comes an interesting graph algorithm: Topological Sort.
    • According to Introduction to Algorithms, given a directed acyclic graph (DAG), for every directed edge (u, v), vertex u comes before vertex v in the topologically sorted order. Another way to describe it is that when you put all vertices horizontally on a line, all of the edges are pointing from left to right. Topological sorting is thus a linear ordering defined for vertices of a directed acyclic graph (DAG). This means that topological sorting for a cyclic graph is not possible. Also, there may be multiple valid topological orderings possible for the same DAG as mentioned above.

    • Note that topological sort, or top sort, is defined only for directed acyclic graphs (DAGs).
  • Applications of Topological Sort:
    • Topological sorting is used mainly when tasks or items have to be ordered, where some tasks or items have to occur before others can. For example:
      • Scheduling jobs
      • Course arrangement in educational institutions
      • Compile-time build dependencies
  • Implementations:
    • There are two ways to implement it: Breadth-First-Search (BFS) and Depth-First-Search (DFS).
      • BFS
        • For BFS, we need an array indegree to keep the track of indegrees. Then we will try to output all nodes with 0 indegree, and remove the edges coming out of them at the same time. Besides, remember to put the nodes that become 0 indegree in the queue.
        • Then, we can keep doing this until all nodes are visited. To implement it, we can store the graph in an adjacent list (a hashmap or a dictionary in Python) and a queue to loop. - Pseudocode is demonstrated here: ``` indegree = an array indicating indegrees for each node neighbours = a HashMap recording neighbours of each node queue = [] for i in indegree: if indegree[i] == 0: queue.append(i)

        while !queue.empty(): node = queue.dequeue() for neighbour in neighbours[node]: indegree[neighbour] -= 1 if indegree[neighbour] == 0: queue.append(neighbour) ```

        • The following figure illustrates topological Sort with BFS:
      • DFS
        • The key observation is that, leaf nodes should always come after their parents and ancestors. Following this intuition we can apply DFS and output nodes from leaves to the root.
        • We need to implement a boolean array visited so that visited[i] indicates if we have visited vertex i. For each unvisited node, we would first mark it as visited and call DFS() to start searching its neighbors. After finishing this, we can insert it to the front of a list. After visiting all nodes, we can return that list.
          def topological_sort():
              for each node:
                  if visited[node] is False:
                      dfs(node)
        
          def dfs(node):
              visited[node] = True
              for nei in neighbours[node]:
                  dfs(node)
              if visited(node) = false:
                  ret.insert_at_the _front(node)
        
        • The following figure illustrates topological Sort with DFS:
  • Kahn’s Algorithm for Topological Sorting:
    • Kahn’s algorithm takes a directed acyclic graph and sorts its nodes in topological order. It works by keeping track of the number of incoming edges into each node, also called in-degree. It repeatedly:
      1. Finds nodes with no incoming edge, that is, nodes with zero in-degree (no dependency)
      2. Stores the nodes with zero in-degree
      3. Removes all outgoing edges from the nodes with zero in-degree
      4. Decrements the in-degree for each node after a connected edge removal
        • (Steps 3 and 4 can be performed simultaneously.)
    • This process is done until no element with zero in-degree can be found. This can happen when the topological sorting is complete and also when a cycle is encountered. Therefore, the algorithm checks if the result of topological sorting contains the same number of elements that were supposed to be sorted:
      • If the numbers match, then no cycle was encountered, and we print the sorted order.
      • If a cycle was encountered, the topological sorting was not possible. This is conveyed by returning null or printing a message.
    • Example:
      • Here, “In” = indegree
      • Order of removal: First 0, then 1 and 2, then 3, and finally, 4 and 5.
      • Topological order: 0,1,2,3,4,5
        • Note: This is not the only possible topological order. For example, 0,1,3,4,5,2 is also a valid topological order.
  • Why Topological Sort Does Not Work for Cyclic Graphs:
    • If we had no clothes to go out, and we needed to go out to get clothes, we would be in a cycle where one task depends on the other, and no task independent of a prerequisite exists. Therefore, we can’t begin!
    • This is why topological sorting doesn’t work for cyclic graphs. In any graph where there’s a cycle, there comes a time where there’s no node left that is devoid of dependencies.

  • Topological Sort Example Using Kahn’s Algorithm:
    • To understand the process more concretely, let’s take an example:
        Edges: { 0, 1 }, { 0, 2 }, { 1, 3}, {1, 4), { 3, 4}, { 3, 5 }
      

    • We now calculate the indegree of each node. The indegree of a node pictorially refers to the number of edges pointing towards that node. The direction of the arrow is incoming, hence the term indegree.
    • In terms of edges, each edge \({U_i, V}\) where \(U_i\) is the source and \(V\) is the destination node contributes to the indegree of its destination node V as the edge points towards it. We’re going to update the indegree value for any node \(V\) every time an edge associated with it as its destination node is deleted.

      • Node 0 has zero incoming edges. In = 0
      • Nodes 1,2,3, and 5 have one incoming edge. In = 1
      • Node 4 has two incoming edges. In = 2
    • We can now begin applying Kahn’s algorithm for topological sorting.
      • Overall the process is as follows:
        • In this example, we start with the node with in-degree zero, remove that node and its edges, add it to our sorted list.
        • Then, we decrement the in-degrees of the nodes it was connected to and repeat all these steps until all elements are sorted. Note that there may exist more than one valid topological ordering in an acyclic graph, as is the case in this example as well.
      • Step-by-step process:
        • Step 1: The indegree of node 0 is zero. This is the only node with indegree zero at the beginning. We remove this node and its outward edges: {0, 1}, {0, 2}
        • Step 2: We update the indegree of the deleted edges’ destination nodes. Directed edges contribute to the indegree of their destination nodes, as they are pointing towards them. Now the indegree of nodes 1 and 2 both are decremented by one. Since the indegree before this decrement for both nodes was 1, this also makes node 1 and node 2 the new nodes with indegree zero. We now remove these two nodes. Node 2 has no outgoing edges. Node 1 has two outgoing edges, so we also remove those: {1, 3}, {1, 4}
        • Step 3: We now decrement the indegrees of destination nodes of these edges 3 and 4. Hence the indegree of node 4 goes from 2 to 1, and the indegree of node 3 goes from 1 to 0. This makes node 3 the new node to remove. We also remove its outgoing edges: {3, 4},{3, 5}
        • Step 4: We update the indegrees of the destination nodes of the deleted edges. So the indegrees of node 4 and node 5, which were both 1 before the decrement, now become zero. We are now left with only these two nodes: node 4 and node 5, both with indegree zero and no outgoing edges from either node. When multiple nodes have indegree zero at the same time, we can write either one before the other.
        • Step 5: There may be multiple topological orderings possible in an acyclic graph, as is the case in this example as well. We repeat this process till all nodes are sorted and output one of these valid topological orderings. A valid topological ordering we end up with here is 0,1,2,3,4,5.
  • Topological Sort Algorithm:
    • Now that we’ve looked at Kahn’s Algorithm for topological sorting in detail with an example, we can condense it as the topological sort algorithm:
      • Step 0: Find indegree for all nodes.
      • Step 1: Identify a node with no incoming edges.
      • Step 2: Remove the node from the graph and add it to the ordering.
      • Step 3: Remove the node’s out-going edges from the graph.
      • Step 4: Decrement the in-degree where connected edges were deleted.
      • Step 5: Repeat Steps 1 to 4 till there are no nodes left with zero in-degree.
      • Step 6: Check if all elements are present in the sorted order.
      • Step 7: If the result of Step 6 is true, we have the sorted order. Else no topological ordering exists.
      • Step 8: Exit.

[207/Medium] Course Schedule

Problem

  • A trie (pronounced as “try”) or prefix tree is a tree data structure used to efficiently store and retrieve keys in a dataset of strings. There are various applications of this data structure, such as autocomplete and spellchecker.

  • Implement the Trie class:

    • Trie() Initializes the trie object.
    • void insert(String word) Inserts the string word into the trie.
    • boolean search(String word) Returns true if the string word is in the trie (i.e., was inserted before), and false otherwise.
    • boolean startsWith(String prefix) Returns true if there is a previously inserted string word that has the prefix prefix, and false otherwise.
  • Example 1:

Input
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
Output
[null, null, true, false, true, null, true]

Explanation
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple");   // return True
trie.search("app");     // return False
trie.startsWith("app"); // return True
trie.insert("app");
trie.search("app");     // return True
  • Constraints:
    • 1 <= word.length, prefix.length <= 2000
    • word and prefix consist only of lowercase English letters.
    • At most 3 * 104 calls in total will be made to insert, search, and startsWith.
  • See problem on LeetCode.

Solution: Dictionary without a TrieNode class

  • There is no need to create a TrieNode class. Using a dictionary can give much better performance. Nodes are replaced by dictionary. is_word is replaced by adding a - key to the dictionary.
class Trie(object):
    def __init__(self):
        self.trie = {}

    def insert(self, word):
        t = self.trie
        for c in word:
            if c not in t: t[c] = {}
            t = t[c]
        t["-"] = True

    def search(self, word):
        t = self.trie
        for c in word:
            if c not in t: return False
            t = t[c]
        return "-" in t

    def startsWith(self, prefix):
        t = self.trie
        for c in prefix:
            if c not in t: return False
            t = t[c]
        return True   
Complexity
  • Time: \(O(n)\)
  • Space: \(O(1)\)

Solution: Using a TrieNode class

  • Using a dictionary to implement a TrieNode class:
class TrieNode:
    def __init__(self):
        self.childNodes = {}
        self.isWordEnd = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """
        we will start with root node and check for the first character
        if there is already a node present for the character we will fetch that node, else we will create a new node
        then we will iterate to next character
        """
        currNode = self.root
        for ch in word:
            node = currNode.childNodes.get(ch, TrieNode())
            currNode.childNodes[ch] = node
            currNode = node

        # after all the characters are traversed, mark the last node as end of word
        currNode.isWordEnd = True

    def search(self, word: str) -> bool:
        """
        we will start from root node and check for all the characters
        if we could not find a node for a character we will return False there
        once we iterate through all character we will check if that is a word end
        """

        currNode = self.root
        for ch in word:
            node = currNode.childNodes.get(ch) # or if ch not in currNode.childNodes:
            if not node:                       #        return False
                return False
            currNode = node

        return currNode.isWordEnd

    def startsWith(self, prefix: str) -> bool:
        """
        starts with is similar to search except here we don't have to check whether the last character is end of word
        """
        currNode = self.root
        for ch in prefix:
            node = currNode.childNodes.get(ch)
            if not node:
                return False
            currNode = node

        return True
  • Using a defaultdict to implement a TrieNode class:
from collections import defaultdict

class TrieNode(object):
    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.nodes = defaultdict(TrieNode) # Easy to insert new node.
        self.isword = False  # True for the end of the trie.

class Trie(object):
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """
        Inserts a word into the trie by creating nodes with individual chars.
        :type word: str
        :rtype: void
        """
        curr = self.root
        for char in word:
            curr = curr.nodes[char]
        curr.isword = True

    def search(self, word):
        """
        Returns True if the word is in the trie.
        :type word: str
        :rtype: bool
        """
        curr = self.root
        for char in word:
            if char not in curr.nodes:
                return False
            curr = curr.nodes[char]
        return curr.isword

    def startsWith(self, prefix):
        """
        Returns if there is any word in the trie
        that starts with the given prefix.
        :type prefix: str
        :rtype: bool
        """
        curr = self.root
        for char in prefix:
            if char not in curr.nodes:
                return False
            curr = curr.nodes[char]
        return True
Complexity
  • Time: \(O(n)\)
  • Space: \(O(1)\)