What In The Hell Are Linked Lists

GOTO All What In The Hell Articles

This is the first in a series of articles describing programming concepts. This series assumes you have some basic experience in programming, but attempts to describe these concepts as simplistically as possible. This series will mainly use Python as the language of choice for examples. This is not because of any perceived superiority of Python over other languages, rather Python is basically executable psudo-code so we can express the concepts simply and give you a chance to play with them inside an interpreter.

 

Singly-Linked Lists

A linked list is one of the simplest data structures, at its heart it is simply a two element node (or cell). The first slot contains the data and the second slot contains a pointer to the next node. A simple illustration demonstrates this concept more concretely…

The graphic above represents the list (1, 2, 3). Each node in the list has two parts, data and pointer. The last node has a special symbol ‘Nil’ to tell both us and the computer that we have reached the end of the list. The python code for each node in the list is below…


class Node:

  def __init__(self, data=None, next=None):
    self.data = data
    self.next  = next

  def __repr__(self):
    return repr(self.data)

To make a list out of these nodes we would need to make a node for each element in the list (1, 2 and 3) and link them together.

a = Node(1, Node(2, Node(3, None)))
print a
#>>> 1
print a.next
#>>> 2
print a.next.next
#>>> 3
print a.next.next.next
#>>> None

Notice the nesting of the calls to Node above. Linked lists are an inherently nested data structure. To construct the list without having to nest the calls to Node we would have to work backwards. We have to do this because if we try to create the first node (1) there is not currently a second node to which we can point to.

c = Node(3, None)
b = Node(2, c)
a = Node(1, b)

The graphic below shows another way to look at Nodes that reflects their nested nature.

Programming languages like Lisp and Scheme make heavy use of linked lists. Consequently they have a rich library of functions for dealing with them. The code below shows how you create a linked list in Scheme using the ‘cons’ function. ‘cons’ is short for construct and the nodes are called ‘cons cells’.


(define a (cons 1 (cons 2 (cons 3 '()))))
a
;>>> (1 2 3)

Now that we have created a list with cons (note its similarity to the Python example of Node) we can use the built-in procedures car and cdr to access elements of that list.


(car a)
;>>> 1
(cdr a)
;>>> (2 3)
(car (cdr a))
;>>> 2
(cdr (cdr a))
;>>> (3)
(car (cdr (cdr a)))
;>>> 3
(cdr (cdr (cdr a)))
;>>> ()

Most Scheme implementations usually define shorthand functions for performing these operations, like cadr and caddr (and many more). Many also provide more descriptive names such as first, rest, second, third…


(cadr a)
;>>> 2
(caddr a)
;>>> 3
(cddr a)
;>>> (3)
(first a)
;>>> 1
(rest a)
;>>> (2 3)
(second a)
;>>> 2
(third a)
;>>> 3

Constructing linked lists and accessing their elements is usually most elegantly accomplished using recursion. The Scheme code below shows how we would recursively construct a list of integers, we have to do it backward to avoid a call to (reverse).

(define (make-list-of-integers start stop)
  (let loop ((s stop) (aList '()))
    (if (< s start)
        aList
        (loop (- s 1) (cons s aList)))))

(make-list-of-integers 0 5)
;>>> (0 1 2 3 4 5)

 

Circular Linked Lists

A circular list is a linked-list where the last element’s pointer is set to the first element of the list. The graphic below shows the nodes of a circular list.

Using the Node class that we defined above we can create a circular list in Python as follows…


a = Node(1)
a.next = Node(2, Node(3, a))
a.next
#>>> 2
a.next.next
#>>> 3
a.next.next.next
#>>> 1
a.next.next.next.next
#>>> 2

The Scheme code for defining a circular list is below…

(define a (cons 1 (cons 2 (cons 3 '()))))
(set-cdr! (cddr a) (car a))
a
;>>> (1 2 3 . 1)

 

Doubly-Linked Lists

Usually the term linked list refers to singly-linked lists as we have implemented them above. However there are other ways of implementing linked lists as well. The next most common form of linked list is the doubly-linked list. This is very similar to singly-linked lists except that each node in the list also contains a pointer to the previous node as shown in the graphic below.

Below is the Python code to implement doubly-linked lists.

class dNode:
  def __init__(self, prev=None, data=None, next=None):
    self.prev = prev
    self.data = data
    self.next = next

  def __repr__(self):
    return str(self.data)

a = dNode(data=1)
b = dNode(data=2)
c = dNode(data=3)
d = dNode(data=4)
a.next = b
b.prev, b.next = a, c
c.prev, c.next = b, d
d.prev = c

a
#>>> 1
a.next
#>>> 2
a.next.next
#>>> 3
a.next.next.prev
#>>> 2
a.next.next.next.prev
#>>> 3

To make the doubly-linked list easier to use we would have to add some helper functions. As you can see it is fairly difficult to create a doubly-linked list, so we’ll define an append function here that will make creating one a little easier.

def dAppend(dLinkedList, nodeToAppend):
  if dLinkedList.next == None:
    dLinkedList.next = nodeToAppend
    nodeToAppend.prev = dLinkedList
  else:
    dAppend(dLinkedList.next, nodeToAppend)

a = dNode(data=1)
dAppend(a, dNode(data=2)
dAppend(a, dNode(data=3)
a
#>>> 1
a.next
#>>> 2
a.next.next
#>>> 3
a.next.next.prev
#>>> 2
a.next.prev
#>>> 1

The simplest way to achieve this is through a recursive function as defined above. However Python does not perform tail-call optimization and will either run out of memory or reach its recursion limit (default is 1000) on large lists. Below is a non-recursive version of the same function.

def dNodeAppend(dLinkedList, nodeToAppend):
  node = dLinkedList
  while True:
    if node.next == None:
      node.next = nodeToAppend
      nodeToAppend.prev = node
      break
    else:
      node = node.next

a = dNode(data=1)
dNodeAppend(a, dNode(data=2))
dNodeAppend(a, dNode(data=3))
dNodeAppend(a, dNode(data=4))
a
#>>> 1
a.next
#>>> 2
a.next.next
#>>> 3
a.next.next.next
#>>> 4
a.next.next.next.prev
#>>> 3
a.next.next.prev
#>>> 2
a.next.prev
#>>> 1

 

Performance

Insertion/Removal at head: O(1)
Insertion/Removal at n: O(n)
Indexing: O(n)
Search (unsorted): O(n)

Links

How to Think Like A Computer Scientist – Chapter 17 – Linked Lists
This chapter describes a way of implementing linked lists in Python. It also covers circular lists.
http://greenteapress.com/thinkpython/html/chap17.html

An Introduction to Scheme and its Implementation – Lists
http://www-pu.informatik.uni-tuebingen.de/users/sperber/pfg-2001/scheme/schintro-v14/schintro_93.html

Wikipedia Article on Linked Lists
http://en.wikipedia.org/wiki/Linked_list

GOTO Table of Contents

Advertisements

What In The Hell Are Arrays

GOTO All What In The Hell Articles

Arrays are one of the two fundamental data structures in computer science (the other is the tree).  An array is a contiguous (uninterrupted or sequential) block of memory that stores a set of elements.

Elements in an array can be accessed by their index.  An array’s index usually starts at 0 and stops at the number of elements in the array minus 1. The diagram below shows an array of the characters in the string “HELLO”.

In the array above the element at index 0 is the character “H”, the element at index 1 is the character “E” and so on.

Arrays have constant time access to the elements stored within. This means that it takes the same amount of time to access any element. For example, it would take the same amount of time to access the last element of an array as it would the first, or even an element in the middle.

A multi-dimensional array is simply an array of arrays. A spreadsheet is a form of multi-dimensional array. The diagram above shows a 3 element array where each element is also a 3 element array. Multi-dimensional arrays can be used for a variety of tasks from tracking pieces on a chess board to drawing graphics on the screen.

A “true” multi-dimensional array is implemented as a contiguous block of memory, just like the regular array. However it is also common practice to implement multi-dimensional arrays as trees.

 

Code

C

An array in C can contain elements of only one type/size and is not resizable.

/* Create an array */
int a[3] = {5, 10, 15};

/* Access an array element */
a[1] /*==> 10 */

Python

An array in Python is called a list. It can contain elements of any type/size and is resizable.


# Create an array
a = [5, 10, 15]

# Access an array element

a[1]  #==>  10

Scheme

An array in Scheme is called a vector. It can contain elements of any type/size and may be resizable.


;; Create an array

(define a (vector 5 10 15))
(define b #(5 10 15))

;; Access an array element

(vector-ref a 1)  ;;==>  10

 

What is the index?

An index is a way of labeling elements in an array. The first element is usually located at index 0, the second at index 1, the third at 2 and so on. The index is used to calculate the address of the element.

To understand this we have to remember that an array is just a reserved block of memory. Let’s look at a simple diagram of how the memory in your computer is organized.

Each block of memory can hold one value (usually equal to an integer in a language like C). To create an array we reserve a block of memory big enough to hold all our elements. If we had 4 elements it would look like the diagram below.

To access our array we only need to keep around the address of the beginning of our reserved space or 102 in this example. This address is equal to the index 0. But our index values still don’t point us at the right addresses yet. To do that we multiply the index by the size of an element then add that value to the start of our array. Let’s see how that maps out on our array.

Sometimes elements in an array are larger than just one block of memory. Here is what an array looks like when each element is two blocks of memory large.

An array can only contain elements of one size. This is because we need to multiply the index by the size of the elements in the array. How then do some languages let you put elements in an array that are different sizes? One simple way to do that is to use the size of the largest element in the array for indexing. This leads to some wasted space when storing smaller elements but allows the array to work just like before.

How do we know when we’re at the end of the array? One way is to store the length of the array with the address to the first element. In that case we just check if the index is <= the length. Another way to signal the end of an array is a sentinel node. A sentinel node is a value that can only occur at the end of an array and signals the end.

 

Performance

Indexing: O(1)
Insertion/Removal: O(n)
Search: O(n)

Arrays are either static or dynamic. A static array is set at a fixed size, elements can be inserted and removed but the array cannot grow or shrink. A dynamic array, on the other hand, can grow or shrink. Dynamic arrays are often initialized to be larger than the set of elements it contains. When enough elements are added to the array, such that the number of elements is greater than the array size, the array must be resized. Resizing an array could be a costly operation if the array is large or the act of copying is slow.

 

GOTO Table of Contents