Sorting Algorithms

How do you sort a long list of numbers? Or, more accurately, how do you instruct a computer to? This question has occupied computer scientists since as early as the 1950s, and dozens upon dozens of algorithms have been created, each with pros and cons. These days, high-level programming languages mostly abstract this away — in JavaScript, it’s as easy as calling array.sort(), and no thought needs to be given to how the sorting should actually be done. This is a very good thing for the most part, but it’s interesting — and often mesmerizing — to watch particular sorting algorithms do their job, so for this applet, I’ve implemented eleven very different ones. The list to be sorted is represented as a color wheel — pressing the Sort button first shuffles it, then sorts it with the method of choice, and then verifies that it has been correctly sorted.

Every algorithm is explained in more detail below, but there are a few general things to mention beforehand. When comparing the speed of algorithms, we need a measurement of time that doesn’t depend on the power of the computer running it. Because of this, we typically don’t use actual time units, like seconds, but instead something called big-O notation. If a sorting algorithm is $O(n^2)$, then the time it takes to run increases with the square of the amount of data given to it. We may not know how long it will take to sort a list of length 500, but it will take about 4 times as long to sort a list that’s twice as long. An algorithm that’s $O(n)$, on the other hand, will only take about twice the time.

Whenever a sorting algorithm modifies the list, that location in the color wheel (i.e. a thin band) is highlighted, and if sound is enabled, a tone is played with a pitch based on the value written. All of the algorithms are normalized to take about 20 seconds to complete, regardless of how long they would actually take, so to compare them, look at the number of reads and writes.


Probably the simplest possible sorting algorithm, bubble sort looks at every adjacent pair of entries and swaps them if they’re in the wrong order. It repeats this until it no longer sees any pairs it can swap. It runs in $O(n^2)$ time and requires a horribly large number of both reads and writes, making it a worse choice than almost any other algorithm on this list. Its only advantage is how incredibly easy it is to implement.
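
Here is a minimal sketch of the idea in JavaScript, assuming a plain array of numbers; the applet’s version additionally counts reads and writes and drives the color wheel, which is omitted here.

```js
// Repeatedly sweep the array, swapping adjacent out-of-order pairs,
// until a full sweep makes no swaps.
function bubbleSort(arr) {
  let swapped = true;
  while (swapped) {
    swapped = false;
    for (let i = 0; i < arr.length - 1; i++) {
      if (arr[i] > arr[i + 1]) {
        [arr[i], arr[i + 1]] = [arr[i + 1], arr[i]]; // swap the pair
        swapped = true;
      }
    }
  }
  return arr;
}
```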

Insertion sort is a step up from bubble sort in a few ways, but it still runs in $O(n^2)$ time. It loops through the list only one time, and slides each entry backward until it’s in the correct place among the already-sorted entries. Unfortunately, doing that requires every larger entry to shift forward one place, making the number of writes to the list skyrocket.
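
A sketch of the same process for a plain array of numbers; notice how many writes the inner shifting loop performs.

```js
// Walk the array once, sliding each entry backward through the
// already-sorted prefix until it reaches its correct place.
function insertionSort(arr) {
  for (let i = 1; i < arr.length; i++) {
    const value = arr[i];
    let j = i - 1;
    // Shift every larger sorted entry forward one place.
    while (j >= 0 && arr[j] > value) {
      arr[j + 1] = arr[j];
      j--;
    }
    arr[j + 1] = value;
  }
  return arr;
}
```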

Selection sort is the exact opposite of insertion sort in a sense. It loops through the entire list, finds the smallest entry, and swaps it with the first entry. Then it loops through the list beginning at the second entry, finds the smallest remaining entry, swaps it with the second entry, and so on. Where insertion sort knows which entry it wants to move but not where it needs to go, selection sort knows where its destination is but not which entry it needs to move there. It therefore requires drastically fewer writes but many more reads than insertion sort. It still runs in $O(n^2)$ time — there’s room for improvement.
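
A minimal sketch, with the scan for the smallest remaining entry made explicit; note that each pass performs at most one swap.

```js
// For each position, scan the unsorted tail for its minimum and swap
// that entry into place.
function selectionSort(arr) {
  for (let i = 0; i < arr.length - 1; i++) {
    let minIndex = i;
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[j] < arr[minIndex]) minIndex = j; // many reads...
    }
    if (minIndex !== i) {
      [arr[i], arr[minIndex]] = [arr[minIndex], arr[i]]; // ...but few writes
    }
  }
  return arr;
}
```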

Heapsort is the first algorithm on our list that runs in $O(n\log(n))$ time. If you aren’t familiar with logarithms or haven’t seen them in a while, think of $\log(n)$ as being the exponent of the power of $2$ that’s roughly equal to the length of the list — for example, if the list is length 500, $\log(500) \approx 9$, since $2^9 = 512$. It goes without saying that an $O(n\log(n))$ algorithm is much faster than an $O(n^2)$ one, and is almost always the better choice. Interestingly, any sorting algorithm that needs to compare entries of the list to see which is larger can never be faster than $O(n\log(n))$. Heapsort functions almost identically to selection sort, but it first arranges its entries into a data structure called a heap. The only requirements for a heap are that every entry is associated with two entries further ahead of it, called its children, and that those children are smaller than it. Since every entry has two children, there are roughly $\log(n)$ levels in the heap, and since every entry is larger than both its children, the top entry is the largest in the list. We can then remove that top entry, place it at the end of the list, and rearrange the heap to figure out what the new topmost entry should be. It turns out that we only need to move one entry per level to do that, so only $\log(n)$ in total. In this sense, selecting the maximum entry only requires looking at $\log(n)$ entries, instead of $n$ as in selection sort.
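
Here’s a compact sketch using the usual array representation of a heap, where the children of the entry at index $i$ live at indices $2i+1$ and $2i+2$:

```js
// Build a max-heap in place, then repeatedly swap the top (largest)
// entry to the end and sift the new top back down.
function heapsort(arr) {
  const n = arr.length;

  // Sift the entry at `root` down until both children are smaller,
  // considering only indices below `end`.
  function siftDown(root, end) {
    while (2 * root + 1 < end) {
      let child = 2 * root + 1;
      if (child + 1 < end && arr[child + 1] > arr[child]) child++;
      if (arr[root] >= arr[child]) return;
      [arr[root], arr[child]] = [arr[child], arr[root]];
      root = child;
    }
  }

  for (let i = Math.floor(n / 2) - 1; i >= 0; i--) siftDown(i, n); // heapify
  for (let end = n - 1; end > 0; end--) {
    [arr[0], arr[end]] = [arr[end], arr[0]]; // largest entry goes to the end
    siftDown(0, end);                        // ~log(n) moves to repair the heap
  }
  return arr;
}
```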

Merge sort is another algorithm that runs in $O(n\log(n))$ time, and it’s the first recursive algorithm we’ve seen. Recursive algorithms work by splitting the list into pieces, breaking those into pieces, and so on, until eventually the pieces are so simple that sorting them is easy. Then they work backward, building the pieces back into a sorted list. Here, every entry is separated from every other, and then the first pair is sorted, followed by the second pair, and so on. Then the first two pairs are interlaced into a sorted length-4 sequence, along with the second two pairs, and the rest. The length-4 sequences are interlaced to create half as many length-8 sequences, and so on until the entire list is sorted. In keeping with the rest of the algorithms, this sort is done in-place, meaning it doesn’t use extra arrays — this requires some very careful logic that uses the unsorted parts of the array as working memory to merge the sorted parts.
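
The in-place merge the applet uses is too fiddly to show briefly, so here is the standard out-of-place version instead; the splitting and interlacing structure is the same.

```js
// Recursively split the array in half, sort each half, then interlace
// (merge) the two sorted halves. Returns a new sorted array.
function mergeSort(arr) {
  if (arr.length <= 1) return arr;
  const mid = Math.floor(arr.length / 2);
  const left = mergeSort(arr.slice(0, mid));
  const right = mergeSort(arr.slice(mid));

  const merged = [];
  let i = 0, j = 0;
  while (i < left.length && j < right.length) {
    merged.push(left[i] <= right[j] ? left[i++] : right[j++]);
  }
  return merged.concat(left.slice(i), right.slice(j));
}
```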

Quicksort is one of the most commonly used algorithms in practice. It’s similar to merge sort, but rather than splitting the list in half every time, it first chooses the entry halfway through the list and places smaller entries before it and larger ones after it, resulting in two unevenly-sized blocks. Each of those is then partitioned based on its middle entry in the same way, and the process is repeated until every block is sorted. Like the previous two, it runs in $O(n\log(n))$ time on average, although rare unlucky partitions can degrade it to $O(n^2)$.
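
A minimal sketch using the middle entry of each block as the pivot, as described above:

```js
// Partition the block around its middle entry, then recurse into the
// two resulting blocks.
function quicksort(arr, lo = 0, hi = arr.length - 1) {
  if (lo >= hi) return arr;
  const pivot = arr[Math.floor((lo + hi) / 2)];
  let i = lo, j = hi;
  while (i <= j) {
    while (arr[i] < pivot) i++; // find an entry that belongs on the right
    while (arr[j] > pivot) j--; // find an entry that belongs on the left
    if (i <= j) {
      [arr[i], arr[j]] = [arr[j], arr[i]];
      i++;
      j--;
    }
  }
  quicksort(arr, lo, j); // smaller entries
  quicksort(arr, i, hi); // larger entries
  return arr;
}
```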

Shellsort is a variant of insertion sort that drastically reduces the distance entries need to move. It begins by taking a large gap size — say 100 — and partitioning the array into 100 smaller arrays whose elements are each 100 apart in the original. It then runs insertion sort on each one, the gap size is reduced, insertion sort is run again, and the process repeats until the gap size is 1 and we’re just running insertion sort itself. Sorting with the larger gap sizes first means entries have much shorter distances to travel on the smaller gaps, and the improvement is dramatic: with a good gap sequence, the running time drops to around $O(n\log^2(n))$.
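
A sketch using a simple halving gap sequence (the applet may well use a different one); the inner loop is exactly insertion sort with a stride of `gap`.

```js
// Run gapped insertion sort with progressively smaller gaps, ending
// with gap = 1, which is ordinary insertion sort on a nearly-sorted array.
function shellsort(arr) {
  for (let gap = Math.floor(arr.length / 2); gap > 0; gap = Math.floor(gap / 2)) {
    for (let i = gap; i < arr.length; i++) {
      const value = arr[i];
      let j = i - gap;
      while (j >= 0 && arr[j] > value) {
        arr[j + gap] = arr[j]; // slide entries within this gap-separated subarray
        j -= gap;
      }
      arr[j + gap] = value;
    }
  }
  return arr;
}
```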

With most of the common algorithms out of the way, we can look at some stranger ones. Where most algorithms try to minimize the total number of operations, cycle sort only concerns itself with writing to the list as little as possible — in fact, it writes to each location at most once, and always with the correct entry. The idea is that we know exactly where any entry should go — if there are 102 entries smaller than $x$, then $x$ should be the 103rd entry in the sorted list. So we just put it there and pick up whatever was in its place, and repeat the process with that entry. This continues until we get back to where we started, at which point we find the next entry we haven’t moved yet. Since the algorithm needs to read the entire list every time it moves an entry, it takes $O(n^2)$ time, but for memory devices whose lifespan degrades with every write (flash memory, for example), this can be an acceptable tradeoff.
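
A sketch of the cycle-following logic; each position is written at most once, and entries that are already in place are never written at all.

```js
// For each cycle, count how many entries are smaller than the one in
// hand to find its final position, write it there, and carry on with
// the entry it displaced until the cycle closes.
function cycleSort(arr) {
  for (let cycleStart = 0; cycleStart < arr.length - 1; cycleStart++) {
    let item = arr[cycleStart];

    // Find where `item` belongs: after every smaller entry.
    let pos = cycleStart;
    for (let i = cycleStart + 1; i < arr.length; i++) {
      if (arr[i] < item) pos++;
    }
    if (pos === cycleStart) continue;    // already in place: zero writes
    while (arr[pos] === item) pos++;     // skip past duplicates
    [arr[pos], item] = [item, arr[pos]]; // one write, then carry the displaced entry

    // Keep placing displaced entries until we return to the cycle's start.
    while (pos !== cycleStart) {
      pos = cycleStart;
      for (let i = cycleStart + 1; i < arr.length; i++) {
        if (arr[i] < item) pos++;
      }
      while (arr[pos] === item) pos++;
      [arr[pos], item] = [item, arr[pos]];
    }
  }
  return arr;
}
```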

Where insertion sort is the way most people tend to sort money in a wallet or playing cards in a hand, radix sort is the way you’d probably go about sorting words into a dictionary. First put them into 26 different buckets based on their first letter, and then for each bucket, split the words into 26 sub-buckets based on their second letter, and so on. Here, we only need two buckets at each level instead of 26, since we sort based on digits in the binary representation of the entries, which are either 0 or 1. MSD stands for most significant digit — i.e. we sort first based on the first digit, not the last. With $n$ entries to sort and $\log(n)$ binary digits to consider, the algorithm takes $O(n\log(n))$ time. It’s also the first algorithm in the list that’s not a comparison sort, since entries are never compared against one another, just put into buckets.
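
A sketch of the binary MSD version, assuming non-negative integers: partition the block in place on the current bit, then recurse into each bucket with the next bit down.

```js
// `bit` is the binary digit currently being sorted on; 30 is a safe
// starting point for values below 2^31.
function msdRadixSort(arr, lo = 0, hi = arr.length, bit = 30) {
  if (hi - lo <= 1 || bit < 0) return arr;

  // Move entries with a 0 at this bit in front of those with a 1.
  let i = lo, j = hi - 1;
  while (i <= j) {
    while (i <= j && ((arr[i] >> bit) & 1) === 0) i++;
    while (i <= j && ((arr[j] >> bit) & 1) === 1) j--;
    if (i < j) {
      [arr[i], arr[j]] = [arr[j], arr[i]];
      i++;
      j--;
    }
  }

  msdRadixSort(arr, lo, i, bit - 1); // the "0" bucket
  msdRadixSort(arr, i, hi, bit - 1); // the "1" bucket
  return arr;
}
```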

LSD (least significant digit) radix sort is the same algorithm as the previous one, except it makes the strange decision to sort based on the last digit first. The reason this works at all is that each pass is stable: entries with the same digit keep the order the earlier passes gave them. What results is a progression that appears to be less and less sorted as time goes on, until the final step, when everything falls into place like magic. It also runs in $O(n\log(n))$ time.
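
A sketch of the binary LSD version, again assuming non-negative integers. For readability it builds new bucket arrays on each pass rather than working in place like the applet; the essential property is that each pass is stable.

```js
// Sort on each bit from least to most significant, keeping the order
// of equal digits intact on every pass. Returns a new sorted array.
function lsdRadixSort(arr) {
  for (let bit = 0; bit < 31; bit++) {
    const zeros = [];
    const ones = [];
    for (const x of arr) {
      (((x >> bit) & 1) === 0 ? zeros : ones).push(x);
    }
    arr = zeros.concat(ones); // zeros first, original order preserved
  }
  return arr;
}
```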

Possibly the strangest algorithm on the list, gravity sort simulates placing the list entries as rows of beads on a vertical abacus, and then letting them fall. The resulting rows at the bottom will be the sorted list. If this were somehow physically simulated, it would run in $O(\sqrt{n})$ time, since the $n$ rows all fall at once, and the time for a falling object to hit the ground is proportional to the square root of its starting height. This would crush every other sorting algorithm, but it unfortunately isn’t possible to implement in a computer — the best we can do is move every bead in a row in a single operation by parallelizing the process, which makes it $O(n)$. The non-parallel version is implemented here, which moves one bead at a time and takes $O(n^2)$ time. Sadly, even with the parallel version, the algorithm requires $O(n^2)$ memory to store the square abacus, which is prohibitively expensive for large lists.
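
A sketch of the non-parallel version, assuming non-negative integers: it counts beads column by column (touching one bead at a time), then reads the fallen rows back out from the bottom up.

```js
// The abacus is represented by the number of beads in each column;
// after "falling", column c's beads occupy its bottom rows, and the
// bead count in each row gives the sorted values.
function gravitySort(arr) {
  const n = arr.length;
  const max = Math.max(0, ...arr);

  // Drop the beads: a value v contributes one bead to columns 0..v-1.
  const beadsPerColumn = new Array(max).fill(0);
  for (const value of arr) {
    for (let c = 0; c < value; c++) beadsPerColumn[c]++;
  }

  // Read the rows back: row r gets one bead from every column whose
  // stack is tall enough to reach it, so the bottom rows collect the most.
  const sorted = new Array(n).fill(0);
  for (let c = 0; c < max; c++) {
    for (let r = n - beadsPerColumn[c]; r < n; r++) sorted[r]++;
  }
  return sorted; // ascending order, read top row to bottom row
}
```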
