Bubble sort

The time needed by BubbleSort grows quadratically with the number of records. The algorithm works in place and is stable. In the best case, when the records are already sorted, the time grows linearly with the number of records.

Short Description
The main idea behind BubbleSort is to find the record with the biggest key value and to move that record to the end of the list. This moving is done by comparing the key value of the current record with the key value of the next record. If the key value of the next record is smaller, the two records are swapped; if it is bigger, the next record becomes the new candidate for having the biggest key value.

The algorithm works with an outer and an inner loop. After the first pass of the outer loop the record with the biggest key value is definitely at the end of the list. When searching for the second-biggest key value, it is not necessary to compare against the last element, since we already know the result of that comparison; the number of iterations of the inner loop therefore decreases by one with every pass of the outer loop.

Since every record holding the currently biggest key value is moved up, it is normally not just one record that moves upwards during a pass of the outer loop. This resembles bubbles rising in a drink, which is where the algorithm got its name.
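The two loops described above can be sketched in a few lines of Python (a minimal sketch; the original text gives no code, so the function name and list-of-keys representation are assumptions):

```python
def bubble_sort(records):
    """Plain BubbleSort: each pass of the outer loop bubbles the
    record with the biggest key to the end of the unsorted part."""
    n = len(records)
    for i in range(n - 1):              # outer loop: at most n-1 passes
        # after pass i, the i+1 biggest records sit at the end
        for j in range(n - 1 - i):      # inner loop shrinks by one per pass
            if records[j] > records[j + 1]:
                records[j], records[j + 1] = records[j + 1], records[j]
    return records
```

The swap happens in place, so no extra list is allocated; that is the "works in place" property mentioned above.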

Stability
If records with identical key values become neighbors during the sort, the upper record becomes the new candidate for having the biggest key value. If the algorithm is implemented with a strictly-bigger rather than a bigger-or-equal comparison, it is stable, because records with identical key values never change their position relative to each other.
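The stability property can be demonstrated with records carrying a payload next to their key (a sketch; the (key, payload) representation is an assumption, not from the original):

```python
def bubble_sort_stable(records):
    """BubbleSort on (key, payload) pairs, comparing keys only."""
    n = len(records)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            # strict ">" (not ">="): equal keys are never swapped,
            # so records with identical keys keep their relative order
            if records[j][0] > records[j + 1][0]:
                records[j], records[j + 1] = records[j + 1], records[j]
    return records

data = [(2, "first"), (1, "x"), (2, "second")]
print(bubble_sort_stable(data))
# → [(1, 'x'), (2, 'first'), (2, 'second')] — the two key-2 records keep their order
```

Replacing the `>` with `>=` would swap equal keys on every pass and destroy stability.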

Timing
For an estimate of the runtime we visualize the number of compare operations. The vertical direction shows the passes of the outer loop and the horizontal direction the compare operations within each pass.

 1  2  3  4  5  6  7  8  9 10  |  compares
 x  x  x  x  x  x  x  x  x  -  |     9
 x  x  x  x  x  x  x  x  -  -  |     8
 x  x  x  x  x  x  x  -  -  -  |     7
 x  x  x  x  x  x  -  -  -  -  |     6
 x  x  x  x  x  -  -  -  -  -  |     5
 x  x  x  x  -  -  -  -  -  -  |     4
 x  x  x  -  -  -  -  -  -  -  |     3
 x  x  -  -  -  -  -  -  -  -  |     2
 x  -  -  -  -  -  -  -  -  -  |     1
 -  -  -  -  -  -  -  -  -  -  |     0

Half of the n × n matrix is filled, except for the diagonal; that makes ½ (n² − n) compares. Since we don't need the exact number but only the rough behavior, we calculate the number of compare operations with


 * NC = ½ n²
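This count can be verified with a short script (a sketch not in the original; the exact value is n(n−1)/2, which the ½ n² estimate approximates):

```python
def count_compares(n):
    """Inner-loop comparisons of BubbleSort: (n-1) + (n-2) + ... + 1."""
    return sum(n - 1 - i for i in range(n - 1))

n = 10
print(count_compares(n))    # → 45
print(n * (n - 1) // 2)     # → 45, the exact value ½ (n² - n)
```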

Worst Case & Normal Case
In order to get the worst case for the number of swap operations, we assume that the list is sorted in reverse order. Then every compare operation is followed by a swap.


 * NS = ½ n²

Every compare and every swap operation needs time. We estimate the time for a complete sort with


 * TSort = ½ C n² + ½ S n²


 * TSort = n² (½ C + ½ S)


 * TSort = B n²

where B, C and S are constants and n is the number of records to be sorted.

With an even distribution of the key values, calculating the necessary number of steps becomes a bit more complicated; the expected number of swaps grows with ¼ n². On average the sort is therefore about twice as fast as in the worst case. But the timing still follows the quadratic characteristic; only the actual value of B changes.
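The worst-case and average-case swap counts can be checked empirically (a hedged sketch; the list size and number of trials are arbitrary choices):

```python
import random

def count_swaps(records):
    """Run BubbleSort on a copy and count the swap operations."""
    records = list(records)
    n = len(records)
    swaps = 0
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if records[j] > records[j + 1]:
                records[j], records[j + 1] = records[j + 1], records[j]
                swaps += 1
    return swaps

n = 200
worst = count_swaps(range(n, 0, -1))            # reverse-sorted list
runs = [count_swaps(random.sample(range(n), n)) for _ in range(20)]
avg = sum(runs) / len(runs)                     # random permutations
print(worst)   # exactly n(n-1)/2 = 19900, close to ½ n² = 20000
print(avg)     # fluctuates around n(n-1)/4 = 9950, close to ¼ n²
```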

Correctness
The following proof considers the symmetric variant of the algorithm, in which the inner loop runs j from n down to i+1 and bubbles the record with the smallest key to the front.

Lemma: When the inner loop (lines 2 through 4) terminates, A[i] is the smallest element of A[i] through A[n].

Termination: The loop terminates at j = i+1. At that point A[i] ≤ A[i+1], A[i] ≤ A[i+2], …, A[i] ≤ A[n]. So, A[i] is the smallest element among A[i] through A[n]. QED

Theorem: BubbleSort correctly sorts the input array A.

Loop invariant for lines 1 through 4 is that A[i] ≤ A[i+1].

Initialization: Starts with A[1].

Maintenance: After the iteration i = p, for some 1 < p ≤ n, the above lemma for lines 2 through 4 guarantees that A[p] is the smallest element from A[p] through A[n]. So, A[p] ≤ A[p+1].

Termination: The loop over lines 1 through 4 terminates at i = n. At that point A[1] ≤ A[2] ≤ … ≤ A[n]. Hence, the correctness theorem is proved. QED

Complexity: Σi=1..n Σj=i+1..n 1 = Σi=1..n (n − i) = n² − n(n+1)/2 = n²/2 − n/2 = Θ(n²)
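The pseudocode listing whose line numbers the proof refers to is not included here; the following is a plausible reconstruction (an assumption, inferred from the loop bounds in the complexity sum), written as Python:

```python
def bubble_sort_smallest_first(A):
    """Variant that bubbles the smallest key to the front of A[i..n]."""
    n = len(A)
    for i in range(n):                            # line 1: outer loop over i
        for j in range(n - 1, i, -1):             # line 2: j from n down to i+1
            if A[j] < A[j - 1]:                   # line 3: compare neighbors
                A[j], A[j - 1] = A[j - 1], A[j]   # line 4: swap out-of-order pair
    return A
```

After each outer iteration i, A[i] is the smallest element of A[i] through A[n], matching the lemma above.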

Best Case
Now we come to the best case: the records are already sorted. Without any additions to the algorithm we still have the same number of compare operations, but no swaps. This "no swaps" information can be used to speed up the sort when the records are sorted or nearly sorted. At the beginning of the outer loop we set a marker to NO_SWAPS_DONE. In the inner loop the marker is set to SWAPS_DONE whenever two records are swapped. At the end of the outer loop we check the marker: if there were no swaps, the records are sorted and the algorithm can terminate. This gives a best-case behavior of


 * TBest = n * C
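The marker mechanism described above can be sketched as follows (a sketch, using a boolean flag in place of the NO_SWAPS_DONE/SWAPS_DONE marker):

```python
def bubble_sort_early_exit(records):
    """BubbleSort with a swap marker: if a full pass does no swaps,
    the list is already sorted and the outer loop stops early."""
    n = len(records)
    for i in range(n - 1):
        swapped = False                # marker: NO_SWAPS_DONE
        for j in range(n - 1 - i):
            if records[j] > records[j + 1]:
                records[j], records[j + 1] = records[j + 1], records[j]
                swapped = True         # marker: SWAPS_DONE
        if not swapped:                # no swaps in this pass: we are done
            break
    return records
```

On an already sorted list the inner loop runs exactly once, doing n − 1 compares and no swaps, which is the linear best case.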

Nearly Sorted
Now we have a look at a nearly sorted situation: a new record is appended at the end of the list, and its key value is the smallest of all records. The number of compare operations will still be


 * NC = ½ n²

and the number of swaps will be


 * NS = n

Compared with algorithms like InsertionSort, which handles such a nearly sorted list in roughly linear time, this behavior is pretty bad.
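We can instrument the loops to confirm these counts for n = 10 (a sketch; the early-exit marker from the previous section is included, but it never fires here, because the small record moves only one position towards the front per pass):

```python
def sort_stats(records):
    """BubbleSort with early exit; returns (compares, swaps)."""
    records = list(records)
    n = len(records)
    compares = swaps = 0
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            compares += 1
            if records[j] > records[j + 1]:
                records[j], records[j + 1] = records[j + 1], records[j]
                swaps += 1
                swapped = True
        if not swapped:
            break
    return compares, swaps

# smallest key appended to an otherwise sorted list of 10 records
nearly_sorted = list(range(1, 10)) + [0]
print(sort_stats(nearly_sorted))   # → (45, 9): ≈ ½ n² compares, ≈ n swaps
```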

Example
The list to be sorted contains 10 records. A blue background indicates that two data records have to be swapped. A green background indicates that the upper record becomes the new candidate for the biggest key value.

First repetition of outer loop

Second repetition of outer loop

Third repetition of outer loop

Fourth repetition of outer loop

Fifth repetition of outer loop

Sixth repetition of outer loop

The algorithm terminates after this pass, because no swaps were done during it.
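The passes shown above can be traced with a short script that prints the list after every pass of the outer loop. The example values here are placeholders (the original record values are not reproduced), so the number of passes before the early exit may differ from the six shown above:

```python
def bubble_sort_trace(records):
    """BubbleSort that prints the list after every pass of the outer loop."""
    records = list(records)
    n = len(records)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if records[j] > records[j + 1]:
                records[j], records[j + 1] = records[j + 1], records[j]
                swapped = True
        print(f"after pass {i + 1}: {records}")
        if not swapped:      # no swaps in this pass: the list is sorted
            break
    return records

bubble_sort_trace([3, 7, 1, 9, 4, 8, 2, 6, 5, 0])
```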