CS50 Week 5: Data Structures

Remember that an array stores its elements back to back in memory? The size of an array is fixed when it is created. If you want to resize it afterwards, chances are the memory right next to it is already occupied by other (garbage) values, so you can’t just plop a new element onto the end. Then what is the solution?

Let’s say the original array is of size 3 and you want to add a new value. What you can do is get a new chunk of memory of size 4, copy the original array over, and finally put the new value in.

The time complexity is O(n), since you need to copy all n existing elements into the new array. As the original array gets bigger and bigger, those copies take longer and longer.
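In C, that manual resize might look roughly like the sketch below (the values 1, 2, 3 and the new value 4 are just illustrative):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // Original array of size 3
    int *list = malloc(3 * sizeof(int));
    if (list == NULL)
    {
        return 1;
    }
    list[0] = 1;
    list[1] = 2;
    list[2] = 3;

    // Get a new chunk of memory of size 4
    int *tmp = malloc(4 * sizeof(int));
    if (tmp == NULL)
    {
        free(list);
        return 1;
    }

    // Copy the original array over (this loop is the O(n) cost)
    for (int i = 0; i < 3; i++)
    {
        tmp[i] = list[i];
    }

    // Finally, put the new value in
    tmp[3] = 4;

    // Let go of the old array and keep the new one
    free(list);
    list = tmp;

    for (int i = 0; i < 4; i++)
    {
        printf("%i\n", list[i]);
    }
    free(list);
}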

Linked List

A linked list is dynamic: it is easy to grow and shrink without having to touch and move all of the original data from one location to another.

linked list

Each element in a linked list stores two things: the value it holds and a pointer to the next element. The last element has no next element, so its pointer is NULL, which represents the absence of an address.

We can think of a linked list as a list of numbers that are linked together.

The links are implemented by way of addresses, or pointers. A linked list is just a collection of nodes; each rectangle in the above image represents a node.

With a linked list, we save the time an array spends copying values when it resizes. The tradeoff is that we need extra space to store the pointers.

Let’s create the structure of a linked list:
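Roughly, the node structure looks like this (the field names number and next are conventional choices for this sketch):

typedef struct node
{
    int number;          // the value this node holds
    struct node *next;   // pointer to the next node, or NULL at the end
}
node;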

How do we create a linked list in code?

Every node pointing to another
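Here is a small, self-contained sketch of wiring three nodes together by hand, each one pointing at the next (error handling kept minimal):

#include <stdio.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

int main(void)
{
    // First node becomes the head of the list
    node *list = malloc(sizeof(node));
    if (list == NULL)
    {
        return 1;
    }
    list->number = 1;
    list->next = NULL;

    // Second node hangs off the first node's next pointer
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return 1;
    }
    n->number = 2;
    n->next = NULL;
    list->next = n;

    // Third node hangs off the second node's next pointer
    n = malloc(sizeof(node));
    if (n == NULL)
    {
        return 1;
    }
    n->number = 3;
    n->next = NULL;
    list->next->next = n;

    // Walk the chain by following the next pointers
    for (node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        printf("%i\n", ptr->number);
    }

    // Free the nodes
    while (list != NULL)
    {
        node *tmp = list->next;
        free(list);
        list = tmp;
    }
}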

How much time does it take to search in a linked list?

Searching in a linked list takes O(n) steps, since you always have to start from the first node and follow the pointers one by one, unlike an array, where you can jump to any location directly.
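A sketch of that linear search, assuming the node type defined above:

#include <stdbool.h>

// Returns true if target appears somewhere in the list
bool search(node *list, int target)
{
    // Start at the first node and follow the next pointers one by one
    for (node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        if (ptr->number == target)
        {
            return true;
        }
    }
    return false;
}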

How much time does it take to insert into a linked list?

If we sacrifice order (always inserting at the front), it takes constant time to insert into a linked list, since you only need to update a couple of pointers rather than move any existing data.

Now let’s look at resizing in both arrays and linked lists.

using realloc() to resize an array

realloc(void *ptr, size_t size) takes a memory address you previously obtained from malloc() and resizes that allocation to suit your needs. If the block has to move, realloc copies the values from the old memory to the new location for you.
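With realloc, the earlier resize example shrinks to something like this (same illustrative values as before):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *list = malloc(3 * sizeof(int));
    if (list == NULL)
    {
        return 1;
    }
    list[0] = 1;
    list[1] = 2;
    list[2] = 3;

    // realloc grows the allocation to hold 4 ints, copying the old
    // values over if the block has to move
    int *tmp = realloc(list, 4 * sizeof(int));
    if (tmp == NULL)
    {
        free(list);
        return 1;
    }
    list = tmp;
    list[3] = 4;

    for (int i = 0; i < 4; i++)
    {
        printf("%i\n", list[i]);
    }
    free(list);
}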

How about inserting a new node at the front of a linked list?

First, we update the new node’s next field so that it points at the current head of the list. This way, we won’t orphan the nodes already in the list. (In this example, 2, 4 and 5 won’t get orphaned.)

Insert a new node at the front of the list
n->next = list;

Next, update the list to point at the new node.

update the pointer
list = n;
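Put together as a function, a front insertion might look like this (a sketch; the name prepend and its return convention are my own choices, assuming the node type above):

#include <stdlib.h>

// Returns the new head of the list, or NULL if malloc fails
node *prepend(node *list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = number;

    // Step 1: point the new node at the old head so nothing is orphaned
    n->next = list;

    // Step 2: the new node becomes the head
    return n;
}

A caller would then write something like list = prepend(list, 1); to perform the final list = n step.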

Tree

Tree

A tree is a recursive data structure: every node is the root of a smaller tree itself. Like a linked list, it gets its dynamism from nodes connected by pointers. A binary search tree also preserves an ordering that makes binary search possible: each node’s right child always holds a larger value and its left child always holds a smaller value. The downside is the extra space needed to store the pointers.

Define a node in a tree:
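A CS50-style definition looks roughly like this:

typedef struct node
{
    int number;           // the value stored at this node
    struct node *left;    // smaller values live in the left subtree
    struct node *right;   // larger values live in the right subtree
}
node;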

Searching in a tree is a recursive process:
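A sketch of that recursive search, assuming the node definition above:

#include <stdbool.h>

bool search(node *tree, int number)
{
    // An empty (sub)tree cannot contain the number
    if (tree == NULL)
    {
        return false;
    }
    // Smaller values can only be in the left subtree
    else if (number < tree->number)
    {
        return search(tree->left, number);
    }
    // Larger values can only be in the right subtree
    else if (number > tree->number)
    {
        return search(tree->right, number);
    }
    // Otherwise we found it
    else
    {
        return true;
    }
}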

How much time does it take to insert an element into a tree?

O(log(n)).

Insert into a tree

Say we want to insert 8 into the above tree. We start from the root, go into the right subtree, then its right subtree, and so on until we find an empty spot. The height of the tree is therefore the maximum number of steps we spend looking for the right place to insert. In this case, it is log₂(8), which is 3.
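That walk down the tree can be coded recursively, as in the sketch below (assuming the node type above; note it does nothing to keep the tree balanced):

#include <stdlib.h>

// Returns the (possibly new) root of the subtree
node *insert(node *tree, int number)
{
    // Empty spot found: create the new node here
    if (tree == NULL)
    {
        node *n = malloc(sizeof(node));
        if (n != NULL)
        {
            n->number = number;
            n->left = NULL;
            n->right = NULL;
        }
        return n;
    }
    // Otherwise keep walking down the correct side
    else if (number < tree->number)
    {
        tree->left = insert(tree->left, number);
    }
    else if (number > tree->number)
    {
        tree->right = insert(tree->right, number);
    }
    return tree;
}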

The height of a balanced binary search tree with n nodes is roughly log₂(n).

Let’s look at a more perverse example:

tree with only right children

If we insert 1 first (making it the root) and then 2 and 3, we end up with a tree like this, which is really just a linked list with only right children. In that case, searching takes O(n) time. How can we solve this problem?

The answer is pretty simple: make 2 the new root and the tree becomes balanced. It takes some extra work to keep the tree balanced in your program, but as your data grows, the time spent rebalancing pays for itself in faster searches.

balanced tree

Hash table

A hash table is an array of linked lists. Say we record everyone’s name, bucketing the names by their first letter. Every letter then has its own linked list hanging off it, like a chain.

Hash table
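In code, that structure might be declared roughly like this (a sketch; the field names are my own choice):

// Each name lives in a node of one of the chains
typedef struct node
{
    char *name;
    struct node *next;
}
node;

// The hash table itself: 26 buckets, one linked list per letter
node *table[26];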

How do we search in a hash table?

First, we need a hash function that helps us find the right linked list to start our search.

input ⇒ hash function ⇒ output

For example, if we want to find Bill, we give “Bill” to the hash function; if it returns 2, we know Bill can only be in the second linked list. This saves us a ton of time compared with looking through every linked list from first to last.
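A toy hash function along those lines might simply bucket a name by its first letter (indices counted from 0 here, so A maps to 0, B to 1, and so on):

#include <ctype.h>

// Maps a name to one of the 26 buckets based on its first letter
// (assumes the name starts with a letter)
unsigned int hash(const char *name)
{
    return toupper(name[0]) - 'A';
}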

As you can see from the picture above, some names take longer to find; for example, a name at the end of the H chain takes 3 steps. As the number of names starting with H grows, searching for them becomes less and less time efficient.

To solve this issue, you can hash on more letters. For example, instead of having only an H bucket, you can have separate buckets for Ha, Haa and so on, reducing collisions.

Although this solution saves time when searching, it also takes up a huge amount of memory.

Inserting more buckets

A hash table helps you to bucketize your inputs and get access to data more quickly.

How much time does it take to search in a hash table?

In the worst case, O(n). Take a deck of cards, for instance: if we sort the cards into their four suits in advance, finding a particular card takes at most 13 steps instead of 52. In theory that is still O(n), just like searching a single linked list, but in the real world it is indeed faster.

Trie

A trie is a tree each of whose nodes is an array, and each of those arrays holds pointers to other nodes.

In this picture, each node is an array of size 26, one slot per letter. Each letter of a name points to the node for the next letter, and by following those pointers we can tell whether a name exists. No matter how many names are stored, searching for one takes constant time.

Trie

A trie attains the ultimate goal in time complexity, O(1), but it takes too much memory. In the above example, we store only three names, yet it takes more than 100 pointers, most of which end up null.
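A trie node for names made of letters could be declared roughly like this (a sketch; the flag name is_word is my own choice):

#include <stdbool.h>

typedef struct node
{
    bool is_word;                 // true if a stored name ends at this node
    struct node *children[26];    // one pointer per letter of the alphabet
}
node;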

Abstract data structure

An abstract data structure is a mental model for solving real-world problems; under the hood it is implemented with other, lower-level data structures.

Queue

It has a famous property called FIFO, first in, first out.

  • enqueue: get into a queue
  • dequeue: get out of a queue

In code, a queue can be implemented on top of either an array or a linked list; those are lower-level implementation details.
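For instance, an array-backed queue could be sketched like this (CAPACITY is an arbitrary limit chosen for the example, and overflow and underflow checks are omitted):

#define CAPACITY 50

typedef struct
{
    int numbers[CAPACITY];   // the values waiting in line
    int front;               // index of the value that will leave next
    int size;                // how many values are currently in line
}
queue;

// enqueue: add a value to the back of the line
void enqueue(queue *q, int value)
{
    q->numbers[(q->front + q->size) % CAPACITY] = value;
    q->size++;
}

// dequeue: remove and return the value at the front of the line
int dequeue(queue *q)
{
    int value = q->numbers[q->front];
    q->front = (q->front + 1) % CAPACITY;
    q->size--;
    return value;
}

A queue variable would start out zeroed, for example queue q = {0};, before any enqueue calls.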

Stack

It has a property called LIFO, last in, first out.

  • push: add an element onto the top of a stack
  • pop: remove the most recently added element from a stack
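An array-backed stack, in the same spirit as the queue sketch above (again omitting overflow and underflow checks):

#define CAPACITY 50

typedef struct
{
    int numbers[CAPACITY];   // the values on the stack
    int size;                // how many values are currently pushed
}
stack;

// push: place a value on top of the stack
void push(stack *s, int value)
{
    s->numbers[s->size] = value;
    s->size++;
}

// pop: remove and return the most recently pushed value
int pop(stack *s)
{
    s->size--;
    return s->numbers[s->size];
}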

Dictionary

A dictionary allows you to associate keys with values.
