Introduction to Algorithm Analysis (4): Dynamic Arrays
After having learned about several sorting algorithms and used best-, worst- and average-case analysis to gain a better understanding of their performance characteristics, we now branch out into the area of data structures and observe how the state they carry interacts with our attempts to analyze the runtime of various operations.
Whilst a previous article already introduced linked lists, for this one we shall take one further step back and look at a seemingly much simpler data structure: the dynamic array.
Recap: What is an array?
In its most standard and simple form, an array is nothing more than a contiguous slice of memory. If you have made it this far in our series, you surely know this already, but it bears repeating and spelling out explicitly, as it will become very relevant in a few moments.
The key word to look out for in this description is contiguous.
Whilst that detail is rarely visible from a high-level user perspective, it underlies everything that makes an array an array. Other containers - like the aforementioned linked list - represent basically the same thing: an ordered collection of variables sharing the same type. Yet, the chosen trade-offs could hardly be more different:
Contrary to a linked list, which provides O(1) insertion and deletion at arbitrary known positions at the cost of additional pointers, fragmentation and an expensive walk through the list to reach a desired value, an array optimizes element access and offers a cache-friendly linear data layout with minimal memory overhead.
This is accomplished by one very simple property: putting the elements next to each other in memory.
Such a neat, orderly arrangement ensures we won't need any per-element bookkeeping. All you need to know to work with it is the starting address, the size of one element and the number of elements, which is constant overhead regardless of how many we store.[1]
As there are no gaps, accessing the nth element (counting from zero) turns into a trivial computation: we know there are exactly n elements before it, each of a known size. Therefore, the nth element can be found at
address_of_first + sizeof_element * n
Iterating over all elements from beginning to end walks linearly and very predictably through memory, a boon for prefetching.
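As a small illustration, here is that address computation in Python; the function name, base address and element size are made up for the example, not taken from any real API.
def element_address(first_element_address, element_size, n):
    # The nth element (counting from zero) sits right behind the
    # n elements before it, each element_size bytes wide.
    return first_element_address + element_size * n

# With a hypothetical array starting at 0x1000 holding 8-byte elements:
assert element_address(0x1000, 8, 0) == 0x1000  # first element
assert element_address(0x1000, 8, 3) == 0x1018  # fourth element, 24 bytes in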
All of that is largely hidden, but it dictates performance characteristics and is what makes arrays such an attractive base for many more complex, diverse and widely used data structures. Therefore it is important to understand that for an array to be an array and exhibit its desirable properties, all elements must lie - without any gaps - one after another in memory.
Dynamic Array
And that requirement does come with a cost once we try to extend the array.
A normal array is static, its size fixed at construction. If we know upfront how many variables we will need, that is perfectly fine, but often we do not and our needs change at runtime based on user input or other events outside our control. For such use cases, it would be beneficial if our structure could dynamically grow or shrink as needed whilst retaining its most useful attributes. That is exactly what a dynamic array provides.
The only functionality needed to implement it - and often the only one we have - is a way to request exclusive access to a contiguous region of memory large enough to store a given number of elements, i.e. a memory allocator. Ideally, we would also like the ability to extend that region later on, but even though some languages and runtimes attempt to support such operations, we cannot rely on them always working. In fact, they cannot.
Imagine this simplified situation, with a limited address space capable of holding 16 values, addressable from 0 to F:
[----------------] <-Memory
[0123456789ABCDEF] <-Addresses
If we now request space for an array of size 4, the allocator might reserve the following area:
[aaaa------------] <-Memory
[0123456789ABCDEF] <-Addresses
So far, so good. Now assume some other part of our program also requires some memory. Not much, just a single variable. A simple allocator now might do something like this:
[aaaab-----------] <-Memory
[0123456789ABCDEF] <-Addresses
Still, perfectly fine. As a perceptive reader, you might already suspect the problem this poses. Assume we now order our allocator to extend the array to hold one additional element. What is it supposed to do? All it could do would be akin to the following:
[aaaaba----------] <-Memory
[0123456789ABCDEF] <-Addresses
With a linked list it wouldn't really matter where our new element ends up; such fragmentation would be tolerable. The last node's pointer could simply be modified to point to address 5 and all would be fine. For an array, however, that does not suffice. Remember what we learned in the previous section: it is imperative that there are no gaps between elements!
So here is what we have to do instead:
First, request a new area of larger size, copy all elements to this new array and free the old one:
[----baaaa?------] <-Memory
[0123456789ABCDEF] <-Addresses
Afterwards, we can safely write our new entry:
[----baaaaa------] <-Memory
[0123456789ABCDEF] <-Addresses
As always, here is simplified pseudocode describing the operation:
new_array = allocate(size_of_old_array + 1)
for i=0 to size_of_old_array-1
    new_array[i] = array[i]
new_array[size_of_old_array] = new_element
free(array)
array = new_array
size_of_old_array = size_of_old_array + 1
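For readers who prefer something runnable, here is a minimal Python sketch of the same operation. It is only a model: a fixed-length Python list stands in for the contiguous block, allocation is simulated by creating that list, and dropping the old reference takes the place of an explicit free.
def push(array, size, new_element):
    # Grow by exactly one slot: get a new block, copy every existing
    # element over, then write the new entry at the end.
    new_array = [None] * (size + 1)   # stands in for allocate(size + 1)
    for i in range(size):
        new_array[i] = array[i]
    new_array[size] = new_element
    # Letting the old list go out of scope plays the role of free(array).
    return new_array, size + 1

array, size = [], 0
for value in range(5):
    array, size = push(array, size, value)
print(array)  # prints [0, 1, 2, 3, 4]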
Analysis
Having constructed this beautiful operation, let's, once again, analyze how we did on the performance front:
The execution time of any push operation depends on:
1. The time taken for one allocate and one free
2. The time taken to copy all previously inserted elements
1. depends very much on the choice of allocator, which is - at least for now - not within our control. As such, we shall ignore it and assume constant time.
2. on the other hand is very predictable. A copy is executed exactly once for every previously pushed value.
As such, the running time of the nth push is O(n), i.e. it is linear in the number of elements already stored. A simple measurement supports this:
[1] Technically, that is a lie, as the number of bits required to store this pointer and size depends on the maximum number of elements we could store. As the maximum amount of memory we can address is limited and a pointer is usually exactly big enough to address all of it, we tend to ignore this minor detail. Even if we did not ignore it, however, the overhead is O(log n), unlike a linked list's O(n) (or O(n log n) with the same consideration).
Great! Or rather not. Appending a single element to an array of 1 million elements performs 1 million copies! Quite costly! Can we do better, and if so, how?
A first idea might be to simply ask for a tiny bit more memory than we actually need so we don't have to copy on every push:
if size_of_old_array == capacity:
    new_array = allocate(capacity + 1000)
    for i=0 to size_of_old_array-1
        new_array[i] = array[i]
    free(array)
    array = new_array
    capacity = capacity + 1000
array[size_of_old_array] = new_element
size_of_old_array = size_of_old_array + 1
Not much of an improvement, is it? We might have gaps between our execution time spikes, but the gaps stay the same size and become increasingly insignificant compared to the growth of the spikes themselves. But wait! That does remind us of something, doesn't it? We learned about this: constant factors do not matter! Even if we ignore them and assume the pushes inside those gaps do no work at all, every 1000th execution would still take O(n) time, so we end up with on the order of n/1000 copies per push on average, which still grows linearly. Even with an added 10000, 100000 or 1000000 per allocation, asymptotically, we have gained nothing!
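A quick back-of-the-envelope experiment makes this concrete. The following Python snippet only counts element copies under a hypothetical "+1000 slack" policy and, matching the assumption above, treats allocation itself as free; the averages it prints keep growing with n.
def copies_with_constant_slack(n, slack=1000):
    # Count how many element copies n pushes trigger when every
    # resize adds a fixed number of spare slots.
    capacity, size, copies = 0, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size        # the resize copies every stored element
            capacity += slack
        size += 1
    return copies

for n in (10_000, 100_000, 1_000_000):
    print(n, copies_with_constant_slack(n) / n)  # average copies per push keeps growing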
It follows that to get an actual improvement, the factors themselves should depend on the input size. The required code change is minimal, the result looking superficially similar:
if size_of_old_array == capacity:
    new_array = allocate(2*size_of_old_array)
    for i=0 to size_of_old_array-1
        new_array[i] = array[i]
    free(array)
    array = new_array
    capacity = 2*size_of_old_array
array[size_of_old_array] = new_element
size_of_old_array = size_of_old_array + 1
The effect, however, is not:
Amortizing
This clearly looks better! Is it? Can we prove it is?
We could analyze the worst case again, but it turns out that doesn't really help. It looks exactly the same in all three implementations; it just happens progressively less frequently: on every push, every 1000 pushes, after each doubling.
Alternatively, we could try to average over all possible cases, like we did for Quicksort, but that is hard to apply here and of limited meaning. What exactly would we average over? All n to infinity? Would that represent anything of practical relevance?
We might be stuck and need to, once again, extend our repertoire. Please have another look at the plot above. Notice how the gaps seem exactly proportional to the size of the preceding spikes? That is by design and we can use it!
If we have to extend our array after n elements, the number of elements copied is exactly n. Our resizing scheme demands we double the space available, meaning the next resize will only happen after another n elements have been pushed. Adding the cost of that one extension to the cost of the n cheap pushes that follow before the next resize, and spreading the total over those n operations, yields (O(n) + n * O(1))/n = O(1) per push.
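The same counting experiment as before, switched to the doubling policy (again only counting element copies, with an assumed starting capacity of 1), backs this up: the average number of copies per push stays below 2 no matter how large n gets.
def copies_with_doubling(n):
    # Count element copies over n pushes when every resize doubles
    # the capacity instead of adding a fixed amount.
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size        # the resize copies every stored element
            capacity *= 2
        size += 1
    return copies

for n in (10_000, 100_000, 1_000_000):
    print(n, copies_with_doubling(n) / n)  # stays bounded by a small constant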
And that, my friends, is the essence of amortized analysis. In case it is unfamiliar, here is how the word is defined by Wiktionary:
1. The reduction of loan principal over a series of payments.
2. The distribution of the cost of an intangible asset, such as an intellectual property right, over the projected useful life of the asset.
2. describes exactly what we do. We pay a cost upfront by doing more work than strictly necessary and then distribute that cost over the subsequent operations. Once the balance is depleted, we repeat the process.
The balance here is the free, still-unused capacity, and it is reduced by one with each push. Looking at just the upfront payment - the preallocation - the complexity might seem linear, but when we group operations and also consider all those that got cheaper because of it, the picture appears less bleak.
The cost of pushing onto a dynamic array is O(1) amortized.
Conclusion
Asymptotic analysis is not restricted to stateless algorithms but can also provide insights into data structures and repeated operations on them. We learned how a dynamic array works internally and saw how our understanding of formal analysis can guide us in choosing a good implementation strategy. In so doing, we discovered amortized analysis, which can provide further insight into how work is distributed between repeated invocations.
In the next article of this series, we will leverage our knowledge to derive, understand and analyze Heapsort, yet another comparison based sorting algorithm. Unlike the previously studied ones, it has theoretically optimal asymptotic behavior.