What In The Hell Are Functions

GOTO All What In The Hell Articles

A good way to think about functions is as machines. Machines are designed to repeat a process over and over again. Functions share a lot of characteristics with machines, so let’s look at some of the properties of machines we’re already familiar with.

Machines take input and produce output, let’s take a look at a few examples of that. Your dish washer takes dirty dishes as input and returns clean dishes as output, your washer does the same for clothes. Your radio takes radio waves as input and produces music on your speakers as output.

For the most part you don’t have to know how a machine works, you only need to know how to use it. For the moment we’re going to look at functions the same way, not thinking about how they work, just about how to use them. You’ll be happy to learn that you’re already familiar with a few functions, in fact here’s one you use every day.

Using the function above is simple. We don’t care how it works, whether it is carrying it’s ones or counting on its fingers. The only thing that matters to us is that we give it 2 numbers, 35 and 7 and it gives us back 42.

Here’s an example of another function. This one reverses any string that we give it.

 

Terminology

When we use a function it is referred to as calling that function. So in the above two examples we called the add function and we also called the reverse function.

The input to a function is referred to as the arguments or argument in the case of a single input. Adding arguments to the above we get, we called the add function with the arguments 35 and 7 and we also called the string function with Hello as an argument. Arguments are also commonly referred to as parameters and the two words can be used interchangeably.

The output of a function is most commonly referred to as the return value, though it is referred to as output as well.

 

Calling Functions

Now that we’ve explored some of the properties and terminology of functions let’s look at how they’re represented in code. Just like before, we are not going to worry about how the functions work just yet. We’re only going to call a few.

Function calls can be represented a number of ways, depending on the language you are using. However the vast majority of mainstream languages use this convention.

____________( ______, ______ )

Instead of a fill-in-the-blanks style as above, programmers usually use a shorthand for describing things that can be anything. Unfortunately there is no standardization for this shorthand, so to know what is a shorthand you need to know what words are never used as real names. The use of foo, bar and baz are commonplace and you will have to get accustomed to reading them eventually.

I will take another approach here. I’ll use prototypes to talk about the basic structure and I’ll use descriptive names in any code. This approach is used in many Scheme manuals and I have found it easy to grasp. For now a prototype will be a source code snippet with no line numbering and all the lines highlighted.

Lets start our introductions to prototypes with a prototype of the add function.

Prototype:

name(argument, argument)

Let’s look at the diagram for add and put these labels on it in red.

 

Function Arguments

Functions often take arguments of a specific type. For example the add function above only takes numbers. If you attempt to give it a string it will produce an error. In contrast the reverse function will only accept a string.

To make it easier for you to tell what type of argument to use, the prototypes will state the type.

add(number, number)
reverse(string)
length(argument)

Whenever the type of the argument does not matter or can be of multiple types the word argument will be used.

Let’s take a look at a few more functions and how calling them maps out onto our diagram. The first function we’ll look at is max it will take two numbers as input and give us back the largest one. First I’ll show you the prototype then the code used to call max followed by the diagram.

Prototype:

max(number, number)
max(35, 7)
#===> 35

You probably noticed the #===> symbol in the source code. # is a comment in Python, which is the language used for the psudo-code in this series. A comment is completely disregarded by the computer and is only meant for you to read. This series uses the ===> symbol to show you the output of the code that was just run.

Our next example is length which takes a list or string as input and produces the length of that list or string as output. Because it takes a list or a string, we’ll use argument in our prototype.

Prototype:

length(argument)
length("Hello")

Functions can take two different kinds of arguments too. Here’s a function that takes a string and a number as input and gives us back a letter. Its output is the same as finding the letter by starting at the beginning and counting each letter until you get to number then return that letter.

Prototype:

letter(string, number)
letter("Hello", 2)
#===> e

 

Writing Your Own Functions

I’m going to borrow from SICP here and start with a simple function called squared. This function takes one number as input and returns the result of multiplying that number by itself.

The first thing we’ll do is define what this function should do. You will most likely do this in your head, but here I’ll lay it out like I laid out the functions above. We know we need a number as input and the output should be that number multiplied by itself, so let’s start with a prototype, a fake call, and a diagram like before.

Prototype:

squared(number)
squared(4)
#===> 16

Now that we know what our function should look like when it’s finished, we have to define it. But before we can do that let’s define what a function is a little more precisely.

 

What is a Function

When we were using functions we could think of them as “black boxes” that we put input into and got output out of. Now that we’re writing our own we need to think about how our functions work. Remember that a machine is made to repeat a process over and over. Functions are exactly the same, they also repeat a process or series of processes over and over.

A function is both a set of directions for performing a task and a machine that performs that task.

The syntax for defining a function can vary widely between programming languages. Some languages require that you declare argument and return types. We’ll cover that a little bit later, for now we’re going to focus on the method used by Python.

To define a function in Python you must supply a function name and its arguments. Here’s the prototype of Python’s def form, followed by a definition of squared.

Prototype:

def name(argument, argument, ...):
  # body
def squared(number):
  number * number

There are two main concepts in the above code that we have not covered yet. The body of a function contains the code that describes the process of the function. It is in the body that the instructions for performing the function are held. When we call the function we will use the body, but for now it is just a description of the process.

squared(4)

When we call the function, the programming language looks up the instructions for the function, then replaces all the argument names with the arguments the function is called with. Here’s a simple view of what your programming language sees when you call squared(4).

def squared(4):
  4 * 4

As you can see the number argument is replaced by the 4 argument. Your programming language will do this behind the scenes for you, but you conceptually do this transformation in your head every time you write a function.

However there is one thing wrong with the above function. When we call it, it doesn’t give us back the answer. In fact it seems like it does nothing at all!

 

Returning Values

Most mainstream languages require the use of a return keyword to signal that you want to return a value. Python also requires a return statement so we’ll have to add one in to get the return value.

def squared(number):
  return number * number
squared(4)
#===> 16

 

Combining Functions

Now that we have a function squared that will square a number, let’s look at how to use that in another function.

We’re going to define a new function aoc which will calculate the area of a circle, given a radius as input. The formula for the area of a circle is πr2. We already have a way to square the radius with our squared function, so let’s just write the new function.

def aoc(radius):
  return 3.14 * squared(radius)
aoc(4)
#===> 50.240000000000002

When aoc(4) is called Python looks up the definition of aoc and replaces all occurrences of radius with the number 4. Then Python evaluates the instructions found in aoc. In the instructions Python finds a call to squared so it looks up the instructions to squared and replaces all occurrences of number with the value 4. It then evaluates the instructions in squared and returns 16. squared(4) is then replaced with 16 and Python returns 3.14 * 16 or roughly 50.24.

 

Examples of Functions in Other Languages

Javascript

function aoc(c) {
  var r = c/2;
  return 3.145*(r*r);
}
aoc(8)
//>>> 50.264

Scheme

(define (aoc c)
  (define r (/ c 2))
  (* 3.1415 (* r r)))
(aoc 8)
;>>> 50.264

 

Links

GOTO Table of Contents

Advertisements

What In The Hell Are Conditionals

GOTO All What In The Hell Articles

A conditional is a mechanism for altering the control flow of a program based on the result of a test.  That is a very technical and complicated description of something that is very simple in practice.  Let’s look at a small example before continuing.

if temperature > 80:
  run_air_conditioning()

All humans are used to dealing with conditional statements, in fact we probably compute thousands of conditionals every day without even thinking about them.  They’re so in-grained in our thinking that I didn’t even have to explain the example above, you just got it.

The most basic form of conditional, which is present in almost every language, is if/else.  Each language has slightly different syntax which we’ll get to in a moment, first let’s look at some psudo-code that describes the basic workings of if/else.

if test:
  do_if_test_is_true()
else:
  do_if_test_is_false()

When our programming language comes across this block of code it will first evaluate test.  If the evaluation of test equals true, the do_if_test_is_true() function will be called.  If the evaluation of test equals false, the do_if_test_is_false() function is called.

Note that if test evaluates to true the do_if_test_is_false() function is never called.  This behavior is called branching and a popular way of specifying control flow in a program.

if predicate:
  consequent
else:
  alternate

These are some common terms that refer to the parts of an if/else.  Predicate refers to the expression or statement that is to be tested.  Consequent refers to the action to be taken if the predicate is equal to true.  Alternate refers to the action to be taken if the predicate is equal to false.

You’ve already seen examples from Python (even our psudo-code examples were valid Python), here’s some examples from other languages of if/else.

Javascript

if (2 > 1) 1
//>>> 1

if (1 > 2) {
  1;}
else {
  2;}
//>>> 2

Scheme

(if (> 2 1) 1)
;>>> 1

(if (> 1 2)
    1
    2)
;>>> 2

 

Else If

Another common pattern is if/else if/else.  This pattern is equivalent to chaining if/else expressions together.  Here’s some psudo-code of an if/else if/else in Python and then a diagram of how it is executed.

if temp > 80:
  run_ac()
elif temp < 60:
  run_heat()
else:
  do_nothing()

As you can see, Python first checks if the temperature is greater than 80.  If the temperature is greater than 80, Python will call run_ac() and ignore the rest of the code.  If the temperature is less than 80 it will call the elif branch and check if the temperature is less than 60.  If the temperature is less than 60, Python will call run_heat() and if the temperature is between 60 and 80 Python will call do_nothing().

Instead of using the keyword elif, we could have written our Python code a different way that emphasizes the nesting.

if temp > 80:
  run_ac()
else:
  if temp < 60:
    run_heat()
  else:
    do_nothing()

Writing our program this way makes it clear that each ‘else if’ branch is just another if/else. However keywords like elif exist for a reason and if you are thinking about nesting your if/else expressions, don’t.

You’ve seen elif in Python, now I’ll show you how some other languages handle it.

 

Functions in Other Languages

Javascript

if (temp > 80) {
  1;}
else if (temp < 60) {
  2;}
else {
  3;}

Scheme

(cond ((> temp 80) (run-ac))
      ((< temp 60) (run-heat))
      (else (do-nothing)))

As you can see the syntax can vary quite considerably between languages.  However the fundamental concept of the if/else is the same in most programming languages.

 

Switch

Switch is another form of conditional.  Using the if/else if/else model above is sufficient for a small number of possible conditions.  However as the number of conditions grows the limitations of if/else if/else start to show.

The main problem is that each condition must be checked every time the conditional is evaluated.  If there were 100 conditions each one would have to be checked until one of them was true.  If the condition that returns true is at the end of that list, it could take a lot of time to find it.

Switch statements are usually found in lower-level languages, however I’ll show you how to implement them in higher-level languages as well (should you need them).  A switch in a lower-level language is often implemented as a branch table (or jump table) and can only be done on integers.

Switch in C:

switch(expression_returning_int) {
  case 1:
    printf("Expression returned 1.\n");
    break;
  case 2:
    printf("Expression returned 2.\n);
    /* since there is no break; here it will 'fall through' to default. */
  default:
    printf("Expression returned greater than 1.\n");
    break;
}

In the example above the expression expression_returning_int will be evaluated (it must return an integer).  The value of the expression will then be used as an index into the branch table.  Control is then transfered to the branch referred to by the expression or the default branch (if supplied).

In the example above a break statement was not supplied for the case 2.  If expression_returning_int equals 2 it will print:

Expression returned 2.
Expression returned greater than 1.

The break statement normally stops the execution of that branch.  When it is not present the control flow “falls through” to the next branch, and the next, and the next until it hits a break statement or the end of the switch.  This is a common characteristic of switch statements, however it is not universal.  A switch would still be a switch even if it didn’t allow execution to fall through branches.

 

Switch in Higher-Level Languages

Many high level languages do not provide a switch statement.  However this is rarely a problem as they often do supply data structures which let us represent one easily.

A switch on integers can be done as an array lookup, if you want to learn more about arrays first go to What In The Hell Are Arrays.

If we think about the switch statement above it maps out a little bit like this…

You’ll notice a problem with the above representation.  Where is the ‘default’ branch?  The default branch would be executed by array_lookup if it did not find a value in the array.  The implementation of this will be left as an exercise to the reader as the specifics vary from language to language.

Similarly a switch on strings, or other hash-able values, can be done as a hash lookup.  If you want to learn more about hash tables go to What In The Hell Are Hash Tables.

 

Links

GOTO Table of Contents

What In The Hell Are Hash Tables

GOTO All What In The Hell Articles

A hash table is a combination of two parts; an array, to store the data, and a hash function that computes the index of the array from a key. The best analogy to a hash table is a dictionary. When you use a dictionary you lookup a word (the key) and find the definition (the value).

Hash tables are an improvement over arrays and linked lists for large data sets. With a hash table you don’t have to search through every element in the collection (O(n) access time). Instead you pay a one time cost (the hash function) to retrieve data (O(1) access time).

 

The Hash Function

The purpose of the hash function is to compute an index from a key. That index is then used to store and retrieve data from the array. Many higher-level languages include some type of hash table and hash function.

As you can see from the diagram above the hash(“string”) function transforms a string into an integer which is then used as the index into the array. This index is used to store or retrieve data in the array.

The amount of time it takes to set or retrieve a value from a hash table is the time it takes to run the hash function plus the time it takes to access an element of the array. For small data sets this could take longer than simply iterating over each element of the array. However, since it is a fixed cost, for larger data sets using a hash table could be much faster.

Assuming we have a function set that takes a key to be hashed as its first argument and a value to be stored as its second argument, we can create the following hash table.

As you can see from the above diagram we store both the key and the value in the table. The reason for this is collisions, which will be explained below.

 

Collisions

A collision occurs when two different keys hash to the same value. Collisions will occur in even the best designed hash tables, however there are a few reasons why they can occur more frequently. The first reason is a poor hash function that does not distribute the indices evenly throughout the table, this will be explained in more detail below. The second reason is that the array is too small. If you have a 10 element array and you are trying to store 100 key/values in the hash table, there will obviously be a lot of collisions.

In the diagram above “Puff” hashed to the same value as “Peter” causing a collision. Both “Peter” and “Puff” are now stored in the same index of the array. A certain amount of collisions are unavoidable in a hash table, so we’ll need a way to deal with them.

The simplest way to deal with collisions is to make each element of the array a linked list. Then, when a collision occurs, we simply search through the linked list to find the data we want. This way the hash function gives us a ‘starting point’ to begin our search.

With a good hash function and an appropriately sized array collisions are a rare occurrence. However it is possible, though very unlikely, that all the key/values will be stored in one index and we will end up doing a search on a linked list.

 

What is a good hash function?

The sole job of the hash function is to return a unique array index for each key. This is not always possible and collisions can be expected to occur when the number of elements in the hash table approaches the square root of the size of the hash table’s array. A hash function that does not provide a uniform distribution of keys will cause a lot of collisions, possibly breaking our hash table’s performance down from O(1) to O(n). The only workaround to an ineffective hash function is to make the hash table’s array much larger than the number of elements that it holds, which obviously wastes space.

 

Hash Table Implementations

Python

Python’s equivalent of a hash table is called a dictionary or dict.

# Creating a dictionary
d = {'a':5, 'b':10}

# Adding key/value pairs
d['c'] = 15

# Get value by key
d['c']
#>>> 15

Chicken Scheme

;; Creating a hash-table from an alist
(define d (alist->hash-table '((a . 5) (b . 10))))

;; Adding key/value pairs
(hash-table-set d 'c 15)

;; Get value by key
(hash-table-ref d 'c)
;>>> 15

 

Performance

Indexing: O(1)
Insertion/Removal: O(1)
Search: O(1)

The performance listed above is what is normally found in specific hash table implementations. There are actually a number of different implementations, each with their own advantages, disadvantages and performance guarantees. Specific implementations will have to be presented in future WITH articles.

 

Links

GOTO Table of Contents