This series started out life as an explainer on row- vs column-major array implementations, which is a common cause of confusion for people who move from using a language that uses one style (e.g. NumPy's row-major arrays) to a language using the other (e.g. R's column-major arrays).
I knew that for somebody to really understand the difference, they first need to understand how they are implemented, and so I wanted to point people to a good blog upon which I could build.
However, I quickly found out that there is a dearth of information online about the implementation details of multidimensional arrays.
The best I could find was actually the NumPy documentation, but there is not nearly enough information there for someone that doesn't already know how they're implemented to piece it together.
So I instead set out to write my own...
In this post I want to introduce you to the very basics of how multidimensional arrays are implemented, with future posts expanding on more complex topics such as row- vs column-major ordering, strided arrays, and rank-generic algorithms.
I don't think it's controversial to say that multidimensional arrays are the data structure that scientists will encounter most often when programming.
Hell, I don't think it's even that controversial to say that arrays are the most commonly encountered data structure in general!
If you're a Python user then you probably use multidimensional arrays like this:
array = np.array([
[1.1, 2.2, 3.3],
[4.4, 5.5, 6.6],
[7.7, 8.8, 9.9],
])
print(array[2, 0])
# 7.7
If you use R, it'll look more like this:
array <- matrix(
c(
1.1, 2.2, 3.3,
4.4, 5.5, 6.6,
7.7, 8.8, 9.9
),
nrow = 3
)
print(array[[3, 1]])
# [1] 7.7
And in Julia it will look something like this[1]
:
array = [
1.1 2.2 3.3
4.4 5.5 6.6
7.7 8.8 9.9
]
array[3, 1]
And while these three examples look very (somewhat? kinda?) different, under the hood they all work in a very similar way.
Most people don't need to care how they work, these languages all use abstractions that allow you to completely ignore the implementation, but if you want to write fast array code, rank-generic algorithms, or low-level array manipulations then understanding the inner workings become vital.
Nobody can agree on terminology when it comes to arrays, particularly \(1\)D arrays.
You'll variously hear them called "arrays", "vectors", or "lists", but for each of those names there is a vocal group of people who will tell you that that name actually refers to something else.
I will be calling them arrays.
There are two subtleties related to multidimensional arrays that I'm only going to briefly address in this post (though I will cover in a later one): \(0\)- vs \(1\)-indexing, and row- vs column-major ordering.
The former relates to whether the first element in an array is index with \(0\) or \(1\), and the latter relates to how the values of the multidimensional array are ordered in the data array (see below).
In my experience most people think about matrices in a row-major way, so that is how I will frame my explanation, and since row-major arrays tend to go hand-in-hand with \(0\)-indexing (not for any technical reason, that's just how it has been historically) I will be using both row-major and \(0\)-indexed arrays for the remainder of this post.
In particular, for any code snippets, I will use Python with NumPy.
Before we begin in ernest, I want to briefly discuss the two different ways you can implement (dense) multidimensional arrays.
If their language does not include built-in multidimensional arrays then many people's first instinct would be to use Iliffe vectors.
Iliffe vectors are "arrays of arrays" where an \(n\)-dimensional array is implemented as an array of \((n - 1)\)-dimensional arrays, so a \(3\)D array is an array of \(2\)D arrays which are themselves arrays of \(1\)D arrays. In Python (though most high-level languages, such as Julia or R, would be similar) a \(2\)D Iliffe vector would look like this
array = [
[1.1, 2.2, 3.3],
[4.4, 5.5, 6.6],
[7.7, 8.8, 9.9],
]
print(array[2][0])
# 7.7
and a \(3\)D one would be:
array = [
[
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
],
[
[11, 12, 13],
[14, 15, 16],
[17, 18, 19],
],
]
print(array[1][2][0])
# 17
In reality, however, any multidimensional array implementation worth its salt will avoid Iliffe vectors like the plague[2]
, with the exception of some very niche use cases[3]
.
I'll get into exactly why this is in a later post, but the long and short of it is that on modern computers Iliffe vectors are almost always significantly slower than the alternative.
But what is the alternative?
Dope vectors.
And while Dope vectors are significantly more complex, they also have significantly better performance characteristics.
And it's Dope vectors that I want to talk about in this series.
From this point on I'll be using Python for all code examples, as explained in the warning above.
Since Python doesn't have built in multidimensional arrays, you should assume that all Python code has an implicit import numpy as np
at the top.
Also, if you want to play around with this stuff, note that you can see the data array in NumPy using array.flatten()
.
So how are multidimensional arrays implemented in practice?
Well, in terms of \(1\)D arrays of course[4]
!
More specifically they consist of two[5]
\(1\)D arrays: one specifying the size of each dimension and the other specifying the actual data of the array.
That first array, specifying the size of each dimension, is the simpler of the two to understand.
I will refer to it as "the shape array", shape
, or \(\vec{s}\), depending on whether it is used in prose, code, or maths.
Each element of the shape array is just the number of elements in that dimension, for example for matrices the first element is the number of rows and the second element is the number of columns.
So if you have a \(3 \times 2 \times 5 \times 7\) array then \(\vec{s} = (3, 2, 5, 7)\).
The second array, however, the one with the actual data, is a bit more complex, not least because there are actually two common ways of laying it out.
In this post I'm only going to talk about the row-major layout, like you would find in NumPy or C, but I will be exploring column-major layouts and how to convert between them in a later post.
Again, I will refer to this array as "the data array", data
, or \(\vec{d}\), depending on context.
The data array, unsurprisingly, consists of the actual values in your array, but it's order that those values go into the data array that is important.
Specifically elements in the final[6]
dimension that sit within one element of the second-to-last dimension go into the data array next to each other, then elements of the second-to-last dimension that sit within one element of the third-to-last dimension go into the data array next to each other, and so on and so forth.
Now that probably makes no sense whatsoever, so let's exemplify it with a \(2\)-dimensional array (known in some dark corners of the internet as a matrix).
A matrix is indexed by the row number then the column number (i.e. array[row, column]
), so the first dimension refers to rows and the second dimension to columns.
However, matrices have a useful property that makes talking about this easier: they only have two dimensions.
This means that instead of thinking about columns that sit within one row, we can simply say that the first row goes into the data array first, then the second row, etc.
So if your matrix is
\[
A = \begin{pmatrix}2.2 & 5.5 \\
6.6 & 4.4 \\
8.8 & 1.1\end{pmatrix}
\]
then the data array will be
\[
\vec{d} = (2.2, 5.5, 6.6, 4.4, 8.8, 1.1) \,.
\]
So far so good?
Things get a lot harder when we move onto higher dimensional arrays, so I'm gonna try explaining this in two different ways.
I think the easiest way to think about (and certainly the way I think about) data arrays is to imagine a code snippet that turns a multidimensional array into the corresponding data array with nested for
-loops.
The important thing to focus on here is the order of the for
-loops.
We'll start by going over the \(2\)D case again:
array = np.array([
[2.2, 5.5],
[6.6, 4.4],
[8.8, 1.1],
])
data = np.zeros(3 * 2)
p = 0
for i in range(3):
for j in range(2):
data[p] = array[i, j]
p += 1
print(data)
# [2.2 5.5 6.6 4.4 8.8 1.1]
print(array.flatten())
# [2.2 5.5 6.6 4.4 8.8 1.1]
Notice how the order of the for
-loops (for i in ...
, then for j in ...
) matches the order of the indices (array[i, j]
).
This means that we're putting an entire row into the data array before moving onto the next row, hence the name row-major layout.
When we move to a \(3\)-dimensional array we need another for
-loop to account for the extra dimension (called k
in this case).
This new for
-loop goes inside the others, like so:
array = np.array([
[
[5, 2, 7, 1],
[6, 9, 5, 3],
],
[
[1, 5, 0, 4],
[3, 5, 3, 4],
],
[
[1, 5, 0, 9],
[3, 2, 2, 3],
],
])
data = np.zeros(3 * 2 * 4, dtype=int)
p = 0
for i in range(3):
for j in range(2):
for k in range(4):
data[p] = array[i, j, k]
p += 1
print(data)
# [5 2 7 1 6 9 5 3 1 5 0 4 3 5 3 4 1 5 0 9 3 2 2 3]
print(array.flatten())
# [5 2 7 1 6 9 5 3 1 5 0 4 3 5 3 4 1 5 0 9 3 2 2 3]
Because of this explanation, people often think of the order of the data array in terms of which dimension increases fastest.
A more difficult (in my experience) but arguably more useful way to think about the data array is to think about which sections of the data array each index refers to.
Let's again consider the \(2\)D array
\[
A = \begin{pmatrix}2.2 & 5.5 \\
6.6 & 4.4 \\
8.8 & 1.1\end{pmatrix}
\]
indexed first with \(i\) then with \(j\) (i.e. \(A_{i,j}\)).
The data array for this matrix is
\[
\vec{d} = (2.2,\, 5.5,\, 6.6,\, 4.4,\, 8.8,\, 1.1) \,.
\]
Let's mark on here which sections of the data array the \(i\) index refers to:
\[
\vec{d} = (
\underset{i = 0}{\underbrace{2.2,\, 5.5}},\:
\underset{i = 1}{\underbrace{6.6,\, 4.4}},\:
\underset{i = 2}{\underbrace{8.8,\, 1.1}}
) \,.
\]
We can see here that the \(i\) index spans the entire array, but the values of \(i\) do not refer to individual values in the data array (rather they refer to groups of \(2\) values in the data array).
Here's the data array with \(j\) values marked (I've omitted the \(j=\) part for readability) as well as the \(i\) values:
\[
\vec{d} = (
\underset{i = 0}{\underbrace{\underset{0}{2.2},\, \underset{1}{5.5}}},\:
\underset{i = 1}{\underbrace{\underset{0}{6.6},\, \underset{1}{4.4}}},\:
\underset{i = 2}{\underbrace{\underset{0}{8.8},\, \underset{1}{1.1}}}
) \,.
\]
Here we can see that the \(j\) index does not span the entire data array, instead only spanning the section defined by a single \(i\) index value (which I will refer to as an \(i\)-section in a minute), and therefore get repeated each for each value of \(i\).
However, the values of \(j\) do refer to individual values in the data array, although not uniquely (unless a value for \(i\) is specified).
Ok, now let's move on to \(3\)D arrays.
We'll look at the array
\[
\begin{aligned}A_{0,j,k} & = {
\begin{pmatrix}
5 & 2 & 7 & 1 \\
6 & 9 & 5 & 3
\end{pmatrix}
}_{j,k} \\
A_{1,j,k} & = {
\begin{pmatrix}
1 & 5 & 0 & 4 \\
3 & 5 & 3 & 4
\end{pmatrix}
}_{j,k} \\
A_{2,j,k} & = {
\begin{pmatrix}
1 & 5 & 0 & 9 \\
3 & 2 & 2 & 3
\end{pmatrix}
}_{j,k}\end{aligned}
\]
which is just the same \(3\)D array as in the previous section, but written in a more maths-y way.
The data array for this is
\[
\vec{d} = (5,\, 2,\, 7,\, 1,\, 6,\, 9,\, 5,\, 3,\, 1,\, 5,\, 0,\, 3,\, 3,\, 5,\, 3,\, 4,\, 1,\, 5,\, 0,\, 9,\, 3,\, 2,\, 2,\, 3) \,.
\]
Again we'll mark the sections referred by the \(i\) index values first:
\[
\vec{d} = (
\underset{i = 0}{\underbrace{5,\, 2,\, 7,\, 1,\, 6,\, 9,\, 5,\, 3}},\:
\underset{i = 1}{\underbrace{1,\, 5,\, 0,\, 3,\, 3,\, 5,\, 3,\, 4}},\:
\underset{i = 2}{\underbrace{1,\, 5,\, 0,\, 9,\, 3,\, 2,\, 2,\, 3}}
) \,.
\]
We see that the \(i\) index still spans the array and still refers to a group of values, as opposed to a single value, in the data array.
The \(j\) index, on the other hand, does now display significantly different behaviour:
\[
\vec{d} = (
\underset{i = 0}{\underbrace{
\underset{j = 0}{\underbrace{5,\, 2,\, 7,\, 1}},\:
\underset{j = 1}{\underbrace{6,\, 9,\, 5,\, 3}}
}},\;
\underset{i = 1}{\underbrace{
\underset{j = 0}{\underbrace{1,\, 5,\, 0,\, 3}},\:
\underset{j = 1}{\underbrace{3,\, 5,\, 3,\, 4}}
}},\;
\underset{i = 2}{\underbrace{
\underset{j = 0}{\underbrace{1,\, 5,\, 0,\, 9}},\:
\underset{j = 1}{\underbrace{3,\, 2,\, 2,\, 3}}
}}
) \,.
\]
While the \(j\) index still only spans one \(i\)-section, it no longer refers to individual values in the data array.
The \(j\) index now defines it's own \(j\)-section, though much like the values in the \(2\)D case the section defined by a \(j\) value is only unique if paired with a specific \(i\) value.
If we finally add in the \(k\) values (again omitting the \(k =\) part for readability)
\[
\vec{d} = (
\underset{i = 0}{\underbrace{
\underset{j = 0}{\underbrace{\underset{0}{5},\, \underset{1}{2},\, \underset{2}{7},\, \underset{3}{1}}},\:
\underset{j = 1}{\underbrace{\underset{0}{6},\, \underset{1}{9},\, \underset{2}{5},\, \underset{3}{3}}}
}},\;
\underset{i = 1}{\underbrace{
\underset{j = 0}{\underbrace{\underset{0}{1},\, \underset{1}{5},\, \underset{2}{0},\, \underset{3}{3}}},\:
\underset{j = 1}{\underbrace{\underset{0}{3},\, \underset{1}{5},\, \underset{2}{3},\, \underset{3}{4}}}
}},\;
\underset{i = 2}{\underbrace{
\underset{j = 0}{\underbrace{\underset{0}{1},\, \underset{1}{5},\, \underset{2}{0},\, \underset{3}{9}}},\:
\underset{j = 1}{\underbrace{\underset{0}{3},\, \underset{1}{2},\, \underset{2}{2},\, \underset{3}{3}}}
}}
) \,,
\]
we can finally see how the whole thing fits together.
The behaviour of the \(i\) index remains the same in the \(2\)D and \(3\)D cases (spans the entire data array with no repeats, covers a group of values in stead of a single), the behaviour of the \(k\) index in the \(3\)D case takes on the behaviour of the \(j\) index from the \(2\)D case (covers a single value, and spans only sections that are defined by other indices), and the \(j\) index in the \(3\)D case retains its property of spanning a section defined by other indices but loses its property of referencing a single value in the data array.
Now all of that was a lot to take in, so don't worry if you don't understand it at first.
But if you do want to really understand this stuff then there is no substitute for struggling through it yourself, and to that end I strongly encourage you to try to work out what the data arrays for the following two arrays should be.
array = np.zeros((2, 4, 3))
array[:, :, 0] = [
[0.7, 0.9, 0.2, 0.6],
[0.3, 0.0, 0.2, 0.7],
]
array[:, :, 1] = [
[0.1, 0.5, 0.5, 0.2],
[0.0, 0.4, 0.7, 0.8],
]
array[:, :, 2] = [
[0.8, 0.8, 0.1, 0.4],
[0.4, 0.9, 0.8, 0.7],
]
array = np.zeros((2, 4, 3))
array[:, 0, :] = [
[0.4, 0.9, 0.1],
[0.0, 0.4, 0.4],
]
array[:, 1, :] = [
[0.3, 0.9, 0.1],
[0.7, 0.8, 0.1],
]
array[:, 2, :] = [
[0.5, 0.9, 0.4],
[1.0, 0.4, 0.5],
]
array[:, 3, :] = [
[0.2, 0.7, 0.3],
[0.0, 0.9, 0.7],
]
If you're struggling with where to start I would suggest that you try to write out the data arrays of the \(2\)D arrays that we're using to define the \(3\)D ones, then try to work out how the indices of that array map to the indices of the \(3\)D arrays.
Right, I promise that the hardest part is done!
If you managed to get your head around that, then well done!
But if you didn't then don't worry, this is difficult stuff and it's common not to understand until you've had some practice.
Let's ramp down with a slightly less heavy topic.
The Cartesian indices[7]
of an array are the standard way of specifying multidimensional array indices in most modern languages.
If you have a \(8 \times 4 \times 5 \times 6\) array called array
, then when you index it with something like array[3, 0, 3, 2]
you're using Cartesian indices.
However, many languages allow you to index the array as if you were indexing the data array directly, and if they don't then you can always just obtain the data array and index that.
This is referred to as the linear index[8]
and if you index array
(as defined above) with something like array[380]
then you are using a linear index.
So how do you find the linear index from the Cartesian indices?
Well let's go back to our \(3\)D example from the previous section.
As a reminder, the data array of that array was
\[
\vec{d} = (5,\, 2,\, 7,\, 1,\, 6,\, 9,\, 5,\, 3,\, 1,\, 5,\, 0,\, 3,\, 3,\, 5,\, 3,\, 4,\, 1,\, 5,\, 0,\, 9,\, 3,\, 2,\, 2,\, 3) \,,
\]
and it is a \(3 \times 2 \times 4\) array.
We'll look at how you convert the Cartesian index (if you're more used to programming, it would be array[1, 0, 3]
)
\[
\vec{c} = (1, 0, 3)
\]
into the equivalent linear index \(l\).
Here I've reproduced the diagram from the previous section here, but I've coloured the relevant indices in red[9]
,
\[
\vec{d} = (
\underset{i = 0}{\underbrace{
\underset{j = 0}{\underbrace{\underset{0}{5},\, \underset{1}{2},\, \underset{2}{7},\, \underset{3}{1}}},\:
\underset{j = 1}{\underbrace{\underset{0}{6},\, \underset{1}{9},\, \underset{2}{5},\, \underset{3}{3}}}
}},\;
{
\color{red}
\underset{i = 1}{\underbrace{
\underset{j = 0}{\underbrace{
{\color{black}\underset{0}{1},}\,
{\color{black}\underset{1}{5},}\,
{\color{black}\underset{2}{0},}\,
\underset{3}{3}
}},\:
{
\color{black}
\underset{j = 1}{\underbrace{\underset{0}{3},\, \underset{1}{5},\, \underset{2}{3},\, \underset{3}{4}}}
}
}}
},\;
\underset{i = 2}{\underbrace{
\underset{j = 0}{\underbrace{\underset{0}{1},\, \underset{1}{5},\, \underset{2}{0},\, \underset{3}{9}}},\:
\underset{j = 1}{\underbrace{\underset{0}{3},\, \underset{1}{2},\, \underset{2}{2},\, \underset{3}{3}}}
}}
) \,.
\]
We can see that the index is in the second \(i\)-section, the first \(j\)-section of that \(i\)-section, and is then the third \(k\)-element in that \(j\)-section.
This means that to find \(l\) we must add together \(1\) times the size of an \(i\)-section, \(0\) times the size of a \(j\)-section, and \(3\).
But how big is an \(i\)-section?
Well it contains \(2\) \(j\)-sections and each of those \(j\)-sections contain \(4\) elements, so the size is \(2 \times 4 = 8\).
Since \(j = 0\), this means that \(l = 11\).
We can generalise this by thinking about what we actually did there.
We "moved" to the beginning of the \(i\)-th section by adding on \(i\) lots of the size of each \(i\)-section, and the size of each section is the product of the sizes of each section it contains (\(j\) and \(k\) in this example).
Then we did the same for \(j\), moving to the beginning of the \(j\)-th section by adding on \(j\) lots of the size of the \(k\)-section.
And finally for \(k\) by adding on \(k\) lots of \(1\).
We can generalise that to \(n\) dimensions with this formula,
\[
l = \sum_{i = 0}^{n - 1} \left(\prod_{j = i + 1}^{n - 1} s_j\right) c_i \,.
\]
Some languages (pretty much any language with a focus on maths and science), start indexing their arrays at \(1\).
Converting from Cartesian indices to a linear index in these languages requires you to remove \(1\) from each Cartesian index, and add \(1\) to the final linear index,
\[
l = 1 + \sum_{i = 1}^n \left(\prod_{j = i + 1}^n s_j\right) (c_i - 1) \,.
\]
It's also useful to think about how you might do this if you could have a different starting index for each dimension (Fortran, for example, allows you to do this), but I'll leave that one for you to ponder.
Finally we are done for today!
There are still many, many, many topics to cover regarding dense arrays, and even more regarding the various ways you can represent sparse arrays.
This can be a difficult topic to get your head around when you first encounter it, so take some time to mull over it and practice creating some data arrays.
Then, when you've got a handle on these topics, come back here and I'll teach you about column-major arrays, strided arrays, and more!
Footnotes
-
⤶
With fancy output because this site is built with Julia 😇
-
⤶
Except NumPy, which only uses them as a convenient syntax for defining array literals that are then converted into a better representation.
-
⤶
Literally the only use case I can even think of for Iliffe vectors is if you have an array where the size of some dimension varies in an irregular way, and even then it's probably indicative of a suboptimal data layout.
-
⤶
Technically in practice it'll be pointers, not arrays, but I really don't want to have to explain pointers to people... Maybe in a later post.
-
⤶
There will often be a third to account for strided arrays, but that is something I want to cover in a later post.
-
⤶
Remember we're only talking about row-major arrays in this post.
-
⤶
I think this is a term only used by the Julia lot, but I also don't know of any other term for the same thing so ¯\_(ツ)_/¯
-
⤶
Again by the Julia lot, but the best alternative I've managed to find is "indexing with a single index" which is 🤢
-
⤶
Don't read too much into why it's red, it was just the colour that showed up best.