Content of Trie root

Hello,

Im a little confused with the Trie datastructure atm. From Figure 1 in the project documentation, it seems as if all trie instances share the same root, which seems to be empty in the sense of not containing a char. Only at “depth 1” the nodes start to contain chars. Is this the case?

image
The nodes do not contain chars. The arrows indicate which character a node has.

Both views are correct:

One can think of the edges as containing letters. In this case, a path from the root to a leaf/marked node is a word in the dictionary/trie.

But one can also think about the trie as nodes containing letters as shown in the lecture nodes.
In this case, the root is an empty node.

Screenshot_2022-05-26_20-19-00

One has to distinguish between the abstract theoretical concept of the datastructure and the concrete implementation in C code.
Both of these views of a trie could have the same implementation.

2 Likes

In this project, the recommended way is to have for each node some list of child node. Viewed from that node, there is (up to) one child node for each character.

This corresponds to labeling edges.

We also recommend you allocate a new node for each subtree (i.e. you do not do fold subtrees), each subtree is uniquely determined by the path taken to reach it. That way, each node corresponds to a string.

In the example from the project description, you would label like this:

image

Now, when implementing the data structure in C, this node labeling is implicit. That is, your node structure does not have an actual field that tells you the string corresponding to each node. Instead, the string represented by a node is only ever “computed” locally when walking through the trie. In practice, this occurs nowhere in your C code.

As mentioned by @Marcel.Ullrich, while this labeling exists in theory and on paper, your implementation in C becomes simpler and more efficient when forgetting about this.

Edit: to make this clear, the string represented by the root note is the empty string "". Computer scientists sometimes call this \varepsilon, but I did not in the above picture.

Fun fact: You can even avoid saving the characters explicitly, if the alphabet is finite :slight_smile:

2 Likes