10. Nested Arrays (Continued)
10.1. First Contact
10.1.1. Definitions
We have already met nested arrays in the chapter about Data and Variables; let us just remind ourselves of some definitions:
An array is said to be generalised or nested when one or more of its items are not simple scalars, but scalars containing “enclosed” arrays (this term will be explained soon).
Such an array can be created in many ways, although until now we have only covered the simplest one, called vector notation, or strand notation. Using this notation, the items of an array are just juxtaposed, and each item can be identified as a separate item because:
it is separated from its neighbours by blanks, or
it is embedded within quotes, or
it is an expression embedded within parentheses, or
it is a variable name, or the name of a niladic function which returns a result.
Just to demonstrate how it works, we will create a nested vector and a nested matrix:
one ← 2 2⍴8 6 2 4
two ← 'Hello'
nesVec ← 87 24 'John' 51 (78 45 23) 85 one 69
]display nesVec
nesMat ← 2 3⍴'Dyalog' 44 two 27 one (2 3⍴1 2 0 0 0 5)
]display nesMat
Later, we will provide a more formal description of this notation.
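Here is a quick check of how strand notation turns each juxtaposed element into exactly one item. The names nv and nm are hypothetical copies of nesVec and nesMat, so the originals stay untouched. The sketch assumes Dyalog APL with the default ⎕IO←1.

```apl
⍝ nv and nm are hypothetical copies of nesVec and nesMat
one ← 2 2⍴8 6 2 4
nv ← 87 24 'John' 51 (78 45 23) 85 one 69
≢nv        ⍝ 8: one item per juxtaposed element
nm ← 2 3⍴'Dyalog' 44 'Hello' 27 one (2 3⍴1 2 0 0 0 5)
⍴nm        ⍝ 2 3: six items, arranged as a matrix
```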
10.1.2. Enclose & Disclose
It seems so easy to create and work with nested arrays; couldn’t we turn a simple array into a nested array by, for example, replacing one item of a simple matrix with a vector?
For example, we create a simple matrix:
⎕← mat ← 2 3⍴87 63 52 74 11 62
Then, we try to change it into a nested array:
mat[1;2] ← 10 20 30
LENGTH ERROR
mat[1;2]←10 20 30
∧
It doesn’t work!
We cannot replace one item with an array of three items: mat[1;2] is a scalar, and we can only replace it with a scalar.
10.1.2.1. Enclose
Let us now use a little trick to make the assignment above work.
We just have to zip up the three values into a single “bag”, using a function called enclose, represented by the symbol ⊂ and typed with APL+Z.
Then we will be able to replace one item by one bag!
mat[1;2] ← ⊂10 20 30
mat
Now it works!
We can, of course, do the same with character data, but we now know that an expression like
mat[2;3] ← 2 4⍴'JohnPete'
LENGTH ERROR
mat[2;3]←2 4⍴'JohnPete'
∧
is incorrect. We must enclose the array like this:
mat[2;3] ← ⊂2 4⍴'JohnPete'
The result is what we expected:
]display mat
The result of enclose is always a scalar: cf. Section 10.1.2.4.
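The following short sketch confirms that an enclosure is a scalar: it has an empty shape and rank zero, and it can therefore fill a single position of a matrix. The variable m is a hypothetical copy of mat, built from the same numbers.

```apl
⍝ m is a hypothetical copy of mat
m ← 2 3⍴87 63 52 74 11 62
m[1;2] ← ⊂10 20 30     ⍝ one position receives one enclosed vector
⍴⊂10 20 30             ⍝ (empty): the enclosure has no shape
⍴⍴⊂10 20 30            ⍝ 0: its rank is zero
⍴m                     ⍝ 2 3: the shape of m is unchanged
```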
10.1.2.2. Disclose
If we look at the contents of mat[2;3], we see a little 2 by 4 matrix, but if we look at its shape, we see that, surprisingly, it has no shape.
Its rank is zero, so it must be a scalar!
mat[2;3]
⍴mat[2;3]
As we can see, its shape is empty. And its rank is zero:
⍴⍴mat[2;3]
The explanation is obvious:
we have put this little matrix into a bag (a scalar), so we now see the bag, and not its contents.
If we want to see its contents, we must extract them from the bag, using a function called disclose, which is represented by the symbol ⊃ and typed with APL+X.
With it, we now have access to the matrix:
⍴⊃mat[2;3]
And its rank is two, as expected:
⍴⍴⊃mat[2;3]
We experience the same behaviour if we try to extract one item from a nested vector.
Let us recall the nested vector nesVec:
nesVec
We can use expressions similar to the ones we used on mat:
⍴nesVec[5]
The above looks like a scalar; it is a scalar, containing an enclosed vector.
Once we disclose it, we gain access to its contents (three elements, in this case):
⍴⊃nesVec[5]
In fact, this should not have come as a complete surprise to us. Earlier, we learned that the shape of the result of an indexing operation is identical to the shape of the indices. In this case (as well as in the matrix case above), the index specifies a scalar. Hence, it would be incorrect to expect anything other than a scalar as the result of the indexing operation!
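The rule about index shapes can be checked directly. In the sketch below, nv is a hypothetical small nested vector. A scalar index selects a scalar (a bag), a two-item index selects two items, and disclose undoes enclose.

```apl
⍝ nv is a hypothetical small nested vector
nv ← 87 24 (78 45 23) 85
⍴nv[3]              ⍝ (empty): a scalar index selects a scalar
⍴⊃nv[3]             ⍝ 3: disclosing opens the bag
⍴nv[3 4]            ⍝ 2: a 2-item index selects two items
(⊃⊂'Pete')≡'Pete'   ⍝ 1: disclose undoes enclose
```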
10.1.2.3. Mnemonics
It is easy to remember how to generate the two symbols for enclose and disclose on a US or UK keyboard:
Disclose ⊃ is generated by APL+X, as in eXtract; and
Enclose ⊂ is generated by APL+Z, as in Zip-up.
For reference, the symbols themselves are called left shoe (⊂) and right shoe (⊃); “enclose” and “disclose” are the names of the functions.
10.1.2.4. Simple and Other Scalars
We know that the result of enclose is always a scalar, but there is a difference between enclosing a scalar number or character, and enclosing any other array.
When appropriate, we shall use four different terms:
simple scalar refers to a single number or letter (rank zero);
enclosed array refers to a scalar that is the result of enclosing anything other than a simple scalar;
item refers to a scalar that is a constituent of an array, whether it is a simple scalar or an enclosed array; and
nested array is an array in which at least one of the items is an enclosed array.
Always remember these important points:
enclose does nothing to a simple scalar: it returns the scalar unchanged. The same is true of disclose;
all items of an array are effectively scalars, whether they are simple scalars or enclosed arrays: their rank is 0, and their shape is empty;
a single item can be replaced only by another single item: a simple scalar, or an array of values zipped up using enclose (to form an enclosed array); and
strand notation avoids the use of enclose, because of the conventions used to separate individual items from one another.
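The points above can be tested with match (≡). The vector w is a hypothetical example: one of its items is replaced by one enclosed array, and its length does not change.

```apl
(⊂35)≡35          ⍝ 1: enclose leaves a simple scalar unchanged
(⊃35)≡35          ⍝ 1: so does disclose
(⊂1 2)≡1 2        ⍝ 0: a vector differs from its enclosure
w ← 1 2 3         ⍝ hypothetical simple vector
w[2] ← ⊂'abc'     ⍝ one item replaced by one enclosed array
≢w                ⍝ 3: still three items
```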
Let us create four vectors:
a ← 'coffee'
b ← 'tea'
c ← 'chocolate'
v ← a b c
The last statement is just a simpler way to write:
v ← (⊂a),(⊂b),⊂c
So, we can see that each of the items of v is an enclosed character vector.
Thus, ⍴v[1] is ⍬, not 6.
Here is another example:
nesVec[1 5 6] ← 'Yes' 987 'Hello'
]display nesVec
If we add enclose primitives, the results are very different, and they also depend on where the enclose primitives are placed.
Here are two examples:
nesVec[1 5 6] ← 'Yes' 987 (⊂'Hello')
]display nesVec
nesVec[1 5 6] ← ⊂'Yes' 987 'Hello'
]display nesVec
Now we revert to the original values, because we will need nesVec below:
nesVec[1 5 6] ← 'Yes' 987 'Hello'
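The three assignments above can also be compared without ]display. The sketch uses a hypothetical 6-item vector w, so nesVec keeps its original values. After each assignment, a match checks which item landed where.

```apl
⍝ w is a hypothetical vector, used instead of nesVec
w ← 6⍴0
w[1 2 3] ← 'Yes' 987 'Hello'        ⍝ three items for three positions
(w[1]≡⊂'Yes')∧(w[2]≡987)            ⍝ 1
w[1 2 3] ← 'Yes' 987 (⊂'Hello')     ⍝ the third item is enclosed twice
w[3]≡⊂⊂'Hello'                      ⍝ 1
w[1 2 3] ← ⊂'Yes' 987 'Hello'       ⍝ one scalar, extended to all three positions
w[2]≡⊂'Yes' 987 'Hello'             ⍝ 1
```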
10.1.3. More about DISPLAY
Most of the time, the user command ]box displays enough information when working with nested arrays in the session.
However, in some situations, you might want or need more granular display information, which you can obtain by using the function DISPLAY.
We have already seen the function DISPLAY and its main characteristics in Section 3.6.4.
We now need to explore some of its additional characteristics.
10.1.3.1. Conventions
The following conventions are used in the character matrix that DISPLAY returns:
A simple scalar has no box around it.
All other arrays are shown with a surrounding box. The upper-left hand corner of the box describes the shape of the array. It can be:
─, a simple line, for a scalar that is an enclosed array;
→, a single arrow, for a vector;
↓ or ↓↓, one or more vertical arrows, for matrices and higher-rank arrays;
⊖, a horizontal circled minus, for an array with an empty last axis; or
⌽, a vertical circled bar, for an array with another empty axis.
The bottom-left hand corner of the box describes the nature of the array:
─, a simple line, for character contents;
~, a tilde, for numeric contents;
+, a plus symbol, for mixed contents;
∊, a membership symbol, for nested arrays;
∇, a del, for ⎕OR arrays; or
#, a hash, for namespace references.
We have not yet studied the last two concepts (⎕OR and namespaces); you can ignore them for now.
10.1.3.2. Change the Default Presentation
By default, the boxes are drawn with special line-drawing characters, but you can provide a zero left argument to force the function to use alternative (standard APL) characters:
)copy DISPLAY
DISPLAY 'Hello'
0 DISPLAY 'Hello'
As mentioned previously, the default presentation looks much better on the screen, but in some situations standard APL characters may be preferable.
10.1.3.3. Distinguish Between Items
Now that we have discovered the existence of scalars which are enclosed arrays, we can use DISPLAY to distinguish between the two kinds of scalars.
Notice how DISPLAY does not draw a box around the 34 below:
DISPLAY 34
DISPLAY nesVec[6]
The sixth item of nesVec is an enclosed vector, so its corners are marked with a simple line and an ∊.
It contains a second box whose corners tell us that 'Hello' is a character vector.
nesVec[6] is a scalar containing a vector.
If we disclose the item, we obtain a simple vector:
DISPLAY ⊃nesVec[6]
10.1.3.4. Empty Arrays
Here is how DISPLAY identifies some empty arrays:
Empty numeric vector:
DISPLAY ⍬
Empty text vector:
DISPLAY ''
These are vectors, because there is no vertical arrow, and the ⊖ sign indicates that they are empty.
At the bottom of the boxes, the symbols ~ and ─ show that an empty numeric vector and an empty character vector are different.
Although neither vector has any items, the box shows a zero for the numeric one and a blank for the character one.
This indicates the type of the array, which is a property of an array even when the array is empty (in Section 10.9 we talk more about fill items).
We can see the same kind of output for empty matrices:
Empty numeric matrix:
DISPLAY 0 5⍴0
Empty character matrix:
DISPLAY 0 10⍴''
Another empty character matrix:
DISPLAY 5 0⍴''
Empty numeric 3D array:
DISPLAY 2 3 0⍴0
The output for the empty numeric 3D array contains 2 sets of 3 zeroes to show that its shape is 2 3 0.
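The type of an empty array can also be observed without DISPLAY. Match treats an empty numeric vector and an empty character vector as different. Disclosing an empty array returns its fill item, which is a zero for numbers and a blank for characters. This sketch assumes standard Dyalog behaviour.

```apl
⍬≡''         ⍝ 0: both are empty vectors, but of different types
⍬≡0⍴0        ⍝ 1
⊃⍬           ⍝ 0: the fill item of an empty numeric vector
⊃''          ⍝ a blank: the fill item of an empty character vector
⍴2 3 0⍴0     ⍝ 2 3 0
```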
10.2. Choose Indexing
Choose indexing is a different way of indexing arrays and is one application of nested arrays.
In simple indexing, which you learned about in Section 3.5, you can index into an array to select multiple items at the same time:
⎕← mat ← 3 4⍴⍳12
mat[1 2;3 4]
However, simple indexing always extracts items in a grid-like fashion.
For example, the indexing above specified that the result would come from rows 1 and 2 and from columns 3 and 4.
But what if you wanted only the items mat[1;3], mat[2;3], and mat[2;4]?
In that case, you can use choose indexing. In choose indexing, the index is a nested array, and each of its scalars identifies a single element of the array being indexed. Each scalar is an enclosed vector of indices, with one element per axis of the array being indexed.
So, if we want to index into a matrix, which has two axes, each scalar must be an enclosed vector of length two. If we want three values, our index vector will have length three:
mat[(1 3)(2 3)(2 4)]
In choose indexing, the index does not have to be a vector. Much like with simple indexing, the shape of the index will determine the shape of the result:
mat[(1 1)(2 2)(3 3)(3 4)]
mat[2 2⍴(1 1)(2 2)(3 3)(3 4)]
mat[2 2 1⍴(1 1)(2 2)(3 3)(3 4)]
If you want to use choose indexing to index a single item, you need to enclose it to turn it into a scalar:
mat[⊂(1 2)]
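To make the link with simple indexing explicit, the sketch below rebuilds the same 3 by 4 matrix under the hypothetical name m. It checks that choose indexing returns the same values as picking mat[1;3], mat[2;3], and mat[2;4] one at a time. It assumes the default ⎕IO←1.

```apl
m ← 3 4⍴⍳12                        ⍝ hypothetical copy of mat
m[(1 3)(2 3)(2 4)]                 ⍝ 3 7 8
(m[1;3]),(m[2;3]),m[2;4]           ⍝ 3 7 8, built one item at a time
m[2 2⍴(1 1)(2 2)(3 3)(3 4)]        ⍝ 2 by 2 matrix: 1 6 / 11 12
m[⊂1 2]                            ⍝ 2, a scalar
```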
10.3. Depth
10.3.1. Enclosing Scalars
Applied to a simple scalar, enclose does nothing: the enclose of a simple scalar is the same simple scalar:
]display 35
]display ⊂35
However, when applied to any other array, enclose puts a “bag” around it.
First, we start with a simple vector:
]display 2 4 8
If we use enclose once, we get a scalar containing a numeric vector:
]display ⊂2 4 8
With one more enclose, we get a scalar containing another scalar, itself containing a numeric vector:
]display ⊂⊂2 4 8
10.3.2. The Depth of an Array
Suppose that we write a function Process, which takes as its argument a vector consisting of: the name of a town, the number of inhabitants, a country code, and the turnover of our company in that town.
For example, we could call the function as Process 'Lyon' 466400 'FR' 894600.
For the purpose of this example, the function will just display the items it receives in its argument. We choose to write it with the following syntax:
]dinput
Process ← {
(town pop coun tov) ← ⍵
⎕← (15↑'Town = '),town
⎕← (15↑'Population = '),⍕pop
⎕← (15↑'Country = '),coun
⎕← (15↑'Turnover = '),⍕tov
}
Perhaps this is not the smartest thing we could do, but we did it!
Now, let us execute the function and verify that it works properly:
Process 'York' 186800 'GB' 540678
This looks promising, but what will happen if the user forgets one of the items that the function expects? Let us test it:
Process 'York' 186800 'GB'
LENGTH ERROR
Process[1] (town pop coun tov)←⍵
∧
As we might expect, an error message is issued: we cannot put 3 values into 4 variables!
Let us add a little test to our function to check whether or not the right argument has 4 items.
Here is the new version; notice the new line of code:
]dinput
Process ← {
4≠≢⍵: 'Hey, weren''t you supposed to provide 4 values?'
(town pop coun tov) ← ⍵
⎕← (15↑'Town = '),town
⎕← (15↑'Population = '),⍕pop
⎕← (15↑'Country = '),coun
⎕← (15↑'Turnover = '),⍕tov
}
It seems to work well now:
Process 'York' 186800 'GB'
But one day the user forgets all but one of the items, and just types the name of the town. If the user is (un)lucky enough to type a town name with four letters, here is what happens:
Process 'York'
This trivial example shows that when nested arrays are involved, it is not sufficient to rely on the shape of an array;
we need additional information: specifically, is it a simple or a nested array?
To help distinguish between simple and nested arrays, APL provides a function named depth.
It is represented by the monadic use of the symbol ≡.
Here is a set of rules that define how to determine the depth of an array:
the depth of a simple scalar is 0;
the depth of any other array of any shape is 1, if all of its items are simple scalars.
We call such an array a simple array, so we can instead say:
the depth of a non-scalar, simple array is 1;
the depth of any other array is equal to the depth of its deepest item plus 1; and
the depth is positive if the array is uniform (all of its items have the same depth), and negative if it is not.
Therefore, our Process function can only work when the argument ⍵ has depth ¯2!
Why ¯2?
Because the town name and the country code are character vectors (depth 1), but the population and the turnover are numeric scalars (depth 0), meaning that ⍵ has heterogeneous depth:
]dinput
Process ← {
¯2≠≡⍵: 'The argument has the wrong depth!'
4≠≢⍵: 'Hey, weren''t you supposed to provide 4 values?'
(town pop coun tov) ← ⍵
⎕← (15↑'Town = '),town
⎕← (15↑'Population = '),⍕pop
⎕← (15↑'Country = '),coun
⎕← (15↑'Turnover = '),⍕tov
}
Process 'York' 186800 'GB' 540678
Process 'York'
Another intuitive definition of depth is this: ]display the array and count the number of boxes you must pass through to reach its deepest item.
Here are some examples:
≡ 540678
As seen above, a scalar has depth 0.
The following vector contains only simple scalars. Its depth is 1:
≡ 15 84 37 11
The rank of an array does not directly influence its depth. If we reshape the vector above into a matrix, its depth is still 1, because it contains only simple scalars:
≡ 2 2⍴15 84 37 11
Now, let us consider this nested vector:
≡ vec1 ← (4 3) 'Yes' (8 7 5 6) (2 4)
It is composed of four enclosed vectors, each of depth 1, so vec1 has depth 2.
Now let us change the expression slightly:
≡ vec2 ← (4 3) 'Yes' (8 7 5) 6 (2 4)
This vector is no longer uniform: it contains four enclosed vectors and one simple scalar, so its depth is negative. The magnitude of the depth has not changed, since it reports the highest level of nesting.
In this context, the word “uniform” only means that the array contains items of the same depth.
vec2 is not uniform: it contains vectors (of depth 1) mixed with a scalar (of depth 0); and
vec1 is uniform: all its items are vectors (of depth 1), even though they do not have the same shape, the same type, and certainly not the same content.
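The depth rules can be summarised in a few lines. Each extra enclose adds one level, except around a simple scalar. A mix of depths gives a negative result. The sketch assumes Dyalog's default ⎕ML, under which non-uniform arrays report a negative depth. The last line shows why the depth guard in Process rejects 'York' on its own.

```apl
≡35                              ⍝ 0
≡⊂35                             ⍝ 0: enclosing a simple scalar changes nothing
≡2 4 8                           ⍝ 1
≡⊂2 4 8                          ⍝ 2
≡⊂⊂2 4 8                         ⍝ 3
≡'York' 186800 'GB' 540678       ⍝ ¯2: items of depth 1 and 0 mixed
≡'York'                          ⍝ 1: a simple character vector
```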
10.3.3. The Depth of an Array, Take 2
We used the example of the function Process to motivate the definition of the depth of an array, but perhaps we could have fixed our function in a different way.
This was the original definition of Process:
]dinput
Process ← {
(town pop coun tov) ← ⍵
⎕← (15↑'Town = '),town
⎕← (15↑'Population = '),⍕pop
⎕← (15↑'Country = '),coun
⎕← (15↑'Turnover = '),⍕tov
}
Instead of checking the length of ⍵ to see if there are enough items, perhaps we could write a more lenient version of Process that uses default values for the population, the country, and the turnover.
That way, if the user does not provide enough values, the function still works, and shows that some information is missing:
]dinput
Process ← {
defaults ← ¯1 '?' ¯1
(town pop coun tov) ← ⍵,(¯1+≢⍵)↓defaults
⎕← (15↑'Town = '),town
⎕← (15↑'Population = '),⍕pop
⎕← (15↑'Country = '),coun
⎕← (15↑'Turnover = '),⍕tov
}
Process 'York' 186800 'GB' 540678
Process 'York' 186800 'GB'
It seems like we have a more robust function, but let us see what happens if we keep removing items from the arguments:
Process 'York' 186800
If we only pass in the town and the population, the function still works.
Now, let us try to pass in only the town:
Process 'York'
Once again, the user runs into trouble, because our function Process looks at the character vector 'York', sees that it has four items, and thus adds no default values.
The issue can be resolved if the user remembers to enclose the town name:
Process ⊂'York'
However, as you will come to understand, it is rarely a good idea to rely on the user to pass the arguments in the correct format.
Wouldn’t it be nice if we, the developers, could take care of that for ourselves? As it turns out, we can.
10.4. Nest
The function nest is a monadic function represented by the left shoe underbar character, ⊆, which you can type with APL+Shift+Z.
(Remembering how to type ⊆ should not be too hard, because it lives on the same key as ⊂.)
The function nest is sometimes called enclose if simple, because that is exactly what it does:
you give it an array, and ⊆ will enclose it if and only if the argument array is simple.
In an intuitive sense, but using less rigorous words, ⊆ will put a box around arrays that don’t have any boxes yet.
Let us take a look at a couple of examples. Here is a nested array:
'York' 186800 'GB' 540678
Because the array above is nested, it is not simple.
Therefore, ⊆ applied to that array will do nothing:
⊆'York' 186800 'GB' 540678
On the other hand, we can compare the simple character vector
'York'
with what we get if we nest it:
⊆'York'
Because 'York' was not nested, ⊆ did it for us.
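The same behaviour can be checked with match and tally. The sketch assumes a Dyalog version that provides ⊆ (nest), and x is a hypothetical name for the nested argument. Tally (≢) is what Process actually inspects, so it shows the practical effect of nesting.

```apl
(⊆'York')≡⊂'York'             ⍝ 1: simple, so it gets enclosed
x ← 'York' 186800 'GB' 540678 ⍝ hypothetical nested argument
(⊆x)≡x                        ⍝ 1: already nested, so unchanged
≢⊆'York'                      ⍝ 1: one item, the enclosed town name
≢'York'                       ⍝ 4: four characters
```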
10.4.1. Argument Homogenisation
In the context of the function Process from before, the function nest becomes quite useful.
With it, we can handle the case when the user forgets to enclose the town name when no other information is given:
]dinput
Process ← {
defaults ← ¯1 '?' ¯1
(town pop coun tov) ← (⊆⍵),(¯1+≢⊆⍵)↓defaults
⎕← (15↑'Town = '),town
⎕← (15↑'Population = '),⍕pop
⎕← (15↑'Country = '),coun
⎕← (15↑'Turnover = '),⍕tov
}
Process 'York'
In Section 10.15, you will be asked to use nest for the purpose of argument homogenisation again.
10.4.2. Nesting a Scalar
A word of caution is in order, pertaining to what happens if we nest a simple scalar. The function nest is supposed to enclose its argument array when it is a simple array. So, let us try to nest the scalar 42:
⊆42
At first glance, it looks like the function failed!
After all, there is no box around the 42…
But the function did not fail.
The “issue” here is that simple scalars match their own enclosures.
So, when ⊆ tried enclosing 42, nothing happened.
Bear this in mind when using nest, but do not worry about this giving you unpleasant surprises.
For example, pretend there is a town called 'A', and let us call Process with that town name:
Process 'A'
See?
⊆'A' gives 'A', but that didn’t prevent the function from correctly handling the default values for the missing pieces of information.
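The padding logic inside Process can be traced step by step to see why a one-letter town still works. The names arg and full are hypothetical, and defaults is the same vector as in Process. Because ⊆'A' is a scalar, its tally is 1, so all three defaults are appended.

```apl
(⊆42)≡42                         ⍝ 1: a simple scalar is its own enclosure
defaults ← ¯1 '?' ¯1
arg ← ⊆'A'                       ⍝ hypothetical name for the nested argument
≢arg                             ⍝ 1
full ← arg,(¯1+≢arg)↓defaults    ⍝ 0↓defaults keeps all three defaults
≢full                            ⍝ 4: 'A' followed by the three defaults
```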
10.5. Each
10.5.1. Definition and Examples
To avoid the necessity of processing the items of an array one after the other in an explicitly programmed loop, one can use a monadic operator called each, which is represented by a diaeresis, ¨, and is typed with APL+Shift+1.
As its name implies, each applies the function on its left (its operand) to each of the items of the array on its right (if the function is monadic), or to each pair of corresponding items of the arrays on its left and right (if the function is dyadic).
Let us try it with some small nested vectors and a monadic function:
vec3 ← (5 2) (7 10 23) (52 41) (38 5 17 22)
vec4 ← (15 12) 71023 (2 2⍴⍳4) (74 85 96)
vec5 ← (7 5 1) (19 14 13) (33 44 55)
Now, we can ask for the shape of vec3:
⍴vec3
Using ¨, we can ask for the shape of each of the items of vec3:
⍴¨vec3
We can do the same with the second vector:
⍴¨vec4
Beware! One item of vec4 is a scalar, so its shape is empty, as shown above.
If ]box were off, this could look odd at first sight:
]box off
⍴¨vec4
]box on
If the function specified as the operand to each is dyadic, the derived function is also dyadic. As usual, if one of the arguments is a scalar, the scalar is automatically repeated to match the shape of the other argument. For example, take the following vector with the names of some months:
monVec ← 'January' 'February' 'March' 'April' 'May' 'June'
To take the first 3 letters of each vector in that vector of vectors, we would do
3↑¨monVec
As we have just shown, there is no need to repeat the 3 to match the shape of monVec.
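The sketch below checks scalar extension with a dyadic each. It also shows the case where both arguments are vectors, so items are paired one to one. The name months is a hypothetical copy of monVec, and the second example uses made-up strings.

```apl
months ← 'January' 'February' 'March' 'April' 'May' 'June'
(3↑¨months)≡'Jan' 'Feb' 'Mar' 'Apr' 'May' 'Jun'   ⍝ 1: the 3 is reused for every item
1 2 3↑¨'abc' 'def' 'ghi'       ⍝ items paired: (,'a') 'de' 'ghi'
```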
Naturally, the operand to each can also be a user-defined function, provided that it can be applied to all of the items of the argument array(s):
Average ← {(+/⍵)÷≢⍵}
Average¨vec3
Remark
In fact, each is a bit more than a “hidden” loop.
Please remember that all items of an array are scalars: either simple scalars or enclosed arrays.
So, in an expression like ⍴¨vec5, shouldn’t we expect the result to be just a list of three empty vectors, since the shape of a scalar is an empty vector?
No, the each operator is smarter than that. For each item of the argument array, the item is first disclosed (the “bag” is opened), the function is applied to the disclosed item, and the result is enclosed again to form a scalar (i.e., put into a new “bag”). Finally, all the new bags (scalars) are arranged in exactly the same structure (rank and shape) as the original argument array to form the final result.
So, ⍴¨vec5 is in fact equivalent to:
(⊂⍴⊃vec5[1]), (⊂⍴⊃vec5[2]), (⊂⍴⊃vec5[3])
(⍴¨vec5)≡(⊂⍴⊃vec5[1]), (⊂⍴⊃vec5[2]), (⊂⍴⊃vec5[3])
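The re-enclosing step is easiest to see by comparing a function that returns vectors with one that returns scalars. In the sketch, v is a hypothetical copy of vec5. Shape (⍴) returns a 1-item vector, which each re-encloses, so the result is nested. Tally (≢) returns a simple scalar, and its enclosure changes nothing, so the result is simple.

```apl
v ← (7 5 1) (19 14 13) (33 44 55)   ⍝ hypothetical copy of vec5
(⍴¨v)≡(⊂,3),(⊂,3),⊂,3     ⍝ 1: each result is re-enclosed
≢¨v                       ⍝ 3 3 3: scalar results, so a simple vector
≡⍴¨v                      ⍝ 2
≡≢¨v                      ⍝ 1
```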
If the operand to each is a dyadic function, the corresponding items of the left and right arguments are both disclosed before applying the function.
We have seen that the operand to each may be a primitive function or a user-defined function.
It may also be a derived function returned by another operator.
For example, in the following expressions, the operand to each is not /, but the derived function +/.
In this example, we sum the numbers inside each item of the vector:
+/¨vec3
In this next one, it still works, even though one item is a matrix:
+/¨vec4
Beware: in some cases, the same derived function can be applied with or without the help of each, but the result will not be the same at all:
]display vec5
Without ¨, +/ sums the three sub-vectors together:
+/vec5
With ¨, +/¨ computes the sum of each of the sub-vectors:
+/¨vec5
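Here are the two results in concrete form, using a hypothetical copy v of vec5. Without each, the reduction returns a single enclosed vector. With each, it returns one simple number per sub-vector.

```apl
v ← (7 5 1) (19 14 13) (33 44 55)   ⍝ hypothetical copy of vec5
+/v          ⍝ one scalar: the enclosed vector 59 63 69
+/¨v         ⍝ 13 46 132: one sum per sub-vector
```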
10.5.2. The Use of Each
Each is a “loop cruncher”. Instead of programming loops, in APL you can apply any function to each of the items of an array, each of which may contain a complex set of data.
This operator is also useful combined with match when a simple equal sign would have caused an error. For example, to compare two lists of names:
'John' 'Julius' 'Jim' 'Jean' ≡¨ 'John' 'Oops' 'Jim' 'Jeff'
When used inappropriately, the each operator can sometimes use a large amount of memory for its intermediate results, so you may need to use it with some care.
Suppose that we have a huge list, customerTover, of turnover amounts, one item per customer (we have more than 5,000 of them!).
Each item contains a matrix having a varying number of rows (products) and 52 columns (weeks).
Our task is to calculate the average weekly turnover of each customer.
No problem, that’s just (+/¨+⌿¨customerTover)÷52.
However, if customerTover is very large, and we do not have much workspace left, the above expression may easily cause a WS FULL error.
The reason is that the intermediate expression +⌿¨customerTover produces a list of 52 amounts for every customer, all at once, and that may require more workspace than we have room for.
Instead, we can put the entire expression into a function.
As is often the case in APL (and in programming, in general), the hardest part of writing a function is finding a good name for it.
Fortunately, we can get by without a name if we use an anonymous dfn, with {(+/+⌿⍵)÷52}¨customerTover.
Because we have “isolated” the entire logical process in the function and used each to loop through the items one by one, we will only ever have one customer’s data “active” at any time, and each intermediate result (a 52-item vector) will be thrown away before the next customer is processed.
The result of each function call is just one number, so it is much less likely that we will run into WS FULL problems.
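We can check on a tiny data set that both formulations give the same numbers. The variable tover is a hypothetical stand-in for customerTover, holding three customers with 2, 3, and 1 product rows of constant values. The memory difference only matters at scale, but the equality holds regardless.

```apl
⍝ tover is a hypothetical stand-in for customerTover: 3 customers, 52 weeks each
tover ← (2 52⍴1) (3 52⍴2) (1 52⍴10)
r1 ← (+/¨+⌿¨tover)÷52        ⍝ builds every 52-item vector at once
r2 ← {(+/+⌿⍵)÷52}¨tover      ⍝ one customer at a time
r1                           ⍝ 2 6 10
r1≡r2                        ⍝ 1
```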
10.5.3. Three Compressions!
In the following we will show three expressions which look similar, but their results are very different.
Let us first recall that vec5 consists of three vectors, each containing three items:
vec5
What is the result of a compression?
1 0 1/vec5
Above, the vector 1 0 1 applies to the three items of vec5, compressing out the middle one.
]display 1 0 1/vec5
As mentioned, the compression applies to the items of vec5, as it would to any vector.
So, the second item has been removed.
If we use 1 0 1/¨vec5, do you think the result is the same?
Are you sure?
It is not displayed the same way:
1 0 1/¨vec5
Things are different here: each item of 1 0 1 is paired with the corresponding sub-vector, like this:
1/7 5 1 gives 7 5 1;
0/19 14 13 gives ⍬; and
1/33 44 55 gives 33 44 55.
This is easier to see with ]display:
]display 1 0 1/¨vec5
There is a third way of using compress.
If we enclose the left argument, the entire mask 1 0 1 is applied to each sub-vector.
The second item of each sub-vector has been removed:
]display (⊂1 0 1)/¨vec5
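The three compressions can be compared side by side with match. The variable v is a hypothetical copy of vec5.

```apl
v ← (7 5 1) (19 14 13) (33 44 55)   ⍝ hypothetical copy of vec5
1 0 1/v          ⍝ (7 5 1) (33 44 55): items of v kept or dropped
1 0 1/¨v         ⍝ (7 5 1) ⍬ (33 44 55): one mask element per sub-vector
(⊂1 0 1)/¨v      ⍝ (7 1) (19 13) (33 55): the whole mask for every sub-vector
```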
10.6. Processing Nested Arrays
We have already seen a number of operations involving nested arrays; we shall explore some more in this section. Because nested arrays generally tend to have a rather simple (or at least uniform) structure, we can illustrate the operations using our little vectors.
10.6.1. Scalar Dyadic Functions
You can refer to the earlier section on scalar dyadic functions concerning their application to nested arrays.
However, let us explore again here how each applies to scalar dyadic functions:
vec5
vec5 + 100 20 1
100, 20, and 1 are added to the three sub-vectors, respectively.
Using each, the result is still the same:
vec5 +¨ 100 20 1
If we enclose the right argument, then 100 20 1 becomes a scalar, and gets added to each of the three sub-vectors:
vec5 +¨ ⊂100 20 1
If we drop the each operator, the result is the same because the scalar on the right is extended to match the shape of the left vector:
vec5 + ⊂100 20 1
In fact, each is a superfluous operator when used with scalar dyadic functions, because scalar dyadic functions are pervasive, as seen in a previous section.
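These results can be spelled out with concrete numbers. The variable v is a hypothetical copy of vec5. The last line confirms that each adds nothing when the operand is a scalar dyadic function.

```apl
v ← (7 5 1) (19 14 13) (33 44 55)   ⍝ hypothetical copy of vec5
v + 100 20 1                   ⍝ (107 105 101) (39 34 33) (34 45 56)
v + ⊂100 20 1                  ⍝ (107 25 2) (119 34 14) (133 64 56)
(v + ⊂100 20 1)≡v +¨ ⊂100 20 1 ⍝ 1
```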
10.6.2. Juxtaposition vs Catenation
When you catenate a number of arrays, for example v ← a,b,c, you create a new array with the contents of a, b, and c catenated together to make a single new array, as we have seen many times before.
Let us use a small vector and see how it works:
small ← 3 4 5
1 2,small,6 7
As we can see, the result is a simple vector.
What happens here is, of course, that the 3-item vector small and the 2-item vector 6 7 are first combined into one 5-item vector.
Then, this 5-item vector is combined with the 2-item vector 1 2 to form the resulting 7-item vector.
Both the final and the interim results are simple vectors.
We can now explain what happens when you juxtapose two or more arrays (strand notation), for example v ← a b c d e: each array is enclosed, and the resulting scalars are catenated together.
Such an expression produces a vector made of as many items as we have arrays on the right. In the example that follows, the result is a nested vector:
1 2 small 6 7
This is what we call vector notation or strand notation. In this case, we juxtaposed five arrays, so we created a nested array of length five.
What happens here is that each of the five arrays is first enclosed, and then the resulting five scalars are catenated together to produce the 5-item vector.
Please remember that enclosing a simple scalar does not change it, so you can only see the difference for the array small:
(1 2) small 6 7
Here, we juxtaposed four arrays, two of which are vectors. It is, again, an example of strand notation.
In other words, juxtaposition works on arrays seen as building blocks, while catenation works on the contents of the arrays.
It may help you to know that there is a strict relationship between catenation and strand notation: a b c is the same as (⊂a),(⊂b),(⊂c).
Here is an example:
a ← ⍬
b ← 'apl'
c ← 42
a b c
(⊂a),(⊂b),(⊂c)
The two results look the same; we can be sure they are the same by using ≡:
a b c≡(⊂a),(⊂b),(⊂c)
Now, we will turn our attention to two other expressions,
(1 2) small,6 7
and
(1 2) small 6 7
These two expressions give the same result, but for a different reason than the one explained above.
In fact, small is not catenated to the vector 6 7 as in the first example above.
To read this expression correctly, we must recall that comma is an APL function:
its right argument is the vector 6 7, of course; and
its left argument is whatever is on its left, up to the next function. As there is no such function (parentheses are not functions), the left argument is the result of the entire expression to the left of the comma, i.e., the 2-item vector (1 2) small.
So, the 2-item vector (1 2) small is combined with the 2-item vector 6 7 to form the resulting 4-item vector.
Remember this: when interpreting an expression, you must never “break” a sequence of juxtaposed arrays (a strand), even if it is a nested vector.
So, in the previous example, the left argument to catenate is this whole array:
(1 2) small
When catenate is executed, the two items of this argument are catenated to the two items 6 7 of the right argument, making the same 4-item nested vector as in the previous example.
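Counting items with tally (≢) makes the difference between catenation and juxtaposition visible without any display tools. The sketch redefines small, so it can be run on its own.

```apl
small ← 3 4 5
≢1 2,small,6 7                    ⍝ 7: catenation merges the contents
≢1 2 small 6 7                    ⍝ 5: strand notation keeps small as one item
≢(1 2) small 6 7                  ⍝ 4
((1 2) small,6 7)≡(1 2) small 6 7 ⍝ 1: the strand on the left is never broken
```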
Can you predict the result of (1 2),small 6 7?
10.6.3. Characters and Numbers
We have a character matrix cm and a numeric matrix nm:
⎕← cm ← 3 7⍴'FrancisCarmen Luciano'
⎕RL ← 73
⎕← nm ← (?3 4⍴200000)÷100
We would like to have them displayed side by side.
10.6.3.1. Solution 1
The first idea is to just type cm nm:
cm nm
The format of the result is not ideal; some values have two decimal digits, and some have only one or none. But there is a much more important problem. Imagine that we would like to draw a line on the top of the report. We can catenate a single dash along the first dimension:
'-'⍪cm nm
This is not what we expected: the dash has been placed on the left, not on the top!
The reason is that the expression cm nm does not produce a matrix, but a 2-item nested vector.
And when one catenates a scalar to a vector, it is inserted before its first item or after its last one, to produce a longer vector.
This cannot produce a matrix, unless laminate is used, but we shall not try that now.
10.6.3.2. Solution 2
Well, if juxtaposition doesn’t achieve what we want, why shouldn’t we catenate our two matrices?
cm,nm
This is almost the same presentation, but not exactly; this is a matrix!
Now, let us try to draw the line:
'-'⍪cm,nm
Horrible! What happened?
When we catenated cm (shape 3 7) with nm (shape 3 4), we produced a 3 by 11 matrix.
So, when we further catenated a dash on top of it, the dash was repeated 11 times to fit the last dimension of the matrix.
This is why we obtained 7 dashes on top of the 7 text columns, and 4 dashes, one on top of each of the 4 numeric columns.
This is still not what we want!
10.6.3.3. Solution 3#
The final solution will be the following: convert the numbers into text, using the format function, and then catenate one character matrix to another character matrix:
'-'⍪cm,9 2⍕nm
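The shapes of the intermediate results explain why only the third solution works. This is a minimal sketch that re-creates cm and nm as above; the values in nm are random, but only its shape (3 4) matters here.

```apl
cm ← 3 7⍴'FrancisCarmen Luciano'
nm ← (?3 4⍴200000)÷100        ⍝ random values; only the shape 3 4 matters
⍴ cm nm                       ⍝ 2     : a 2-item nested vector, not a matrix
⍴ cm,nm                       ⍝ 3 11  : a mixed matrix of characters and numbers
⍴ 9 2⍕nm                      ⍝ 3 36  : 4 numbers, each formatted 9 characters wide
⍴ '-'⍪cm,9 2⍕nm               ⍝ 4 43  : the dashes form one complete extra row
```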
Now, the line is exactly where we want it and the numbers are nicely formatted.
Deduce the results of the following 3 expressions (depth, rank, shape), and then verify your solutions on the computer:
(⊂cm) (⊂nm)
(⊂cm),(⊂nm)
cm,⊂nm
10.6.4. Some More Operations
Let us use vec5 once more.
10.6.4.1. Reduction
+/vec5
Notice the box around the final result!
The three enclosed vectors (scalars) have been added together, and the result is therefore itself an enclosed vector (a scalar), which is why it is displayed with a box around it.
We know that the reduction of a vector (rank 1) produces a scalar (rank 0), and this rule still applies here.
To obtain the contents of the (enclosed) vector, we must disclose the result:
⊃+/vec5
The same thing can be observed if we try to collect all the values contained in vec5 into a single vector, by catenating them together:
,/vec5
It worked, but here again we might want to disclose the result:
⊃,/vec5
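The same behaviour can be checked with depth and shape. Since vec5 is not defined in this section, the sketch below uses a hypothetical nested vector v of three 3-item vectors.

```apl
v ← (1 2 3)(10 20 30)(100 200 300)   ⍝ hypothetical nested vector
⍴+/v          ⍝ (empty) : reducing a vector produces a scalar
≡+/v          ⍝ 2 : that scalar encloses a simple vector
⊃+/v          ⍝ 111 222 333
⊃,/v          ⍝ 1 2 3 10 20 30 100 200 300
```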
10.6.4.2. Index Of and Membership
The function index of (dyadic ⍳) may be used to search for (find the position of) items in a nested vector:
vec5 ⍳ (19 14 13)(1 5 7)
This is correct: the first vector appears in vec5 as vec5[2], and the second vector is not present.
But beware, there is a booby trap:
vec5 ⍳ (19 14 13)
(19 14 13) is not a nested array.
vec5 is searched for each of these three numbers individually, and they are not found.
To get the expected result, we need to enclose the right argument to index of:
vec5 ⍳ ⊂19 14 13
It is also important to be aware of this when using membership:
(3 4 5)(7 5 1) ∊ vec5
(7 5 1) ∊ vec5
(⊂7 5 1) ∊ vec5
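To see the booby trap with concrete results, here is a sketch with a hypothetical vector v whose second item is 19 14 13 (like vec5). A result of 4, i.e. 1+≢v, means "not found".

```apl
v ← (3 4 5)(19 14 13)(7 5 1)   ⍝ hypothetical nested vector
v ⍳ (19 14 13)(1 5 7)          ⍝ 2 4
v ⍳ 19 14 13                   ⍝ 4 4 4 : each number is searched for separately
v ⍳ ⊂19 14 13                  ⍝ 2 (a scalar)
(3 4 5)(7 5 1) ∊ v             ⍝ 1 1
(7 5 1) ∊ v                    ⍝ 0 0 0 : each number is tested separately
(⊂7 5 1) ∊ v                   ⍝ 1 (a scalar)
```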
10.6.4.3. Indexing
The rules we saw about indexing remain true: when one indexes a vector by an array, the result has the same shape as the array. If the vector is nested, the result is generally nested too:
]display vec4
]display vec4[2 2⍴4 2 1 3]
We have also seen, in Section 3.5.3, that a nested array can be used as an index. For example, to index items scattered throughout a matrix, the array that specifies the indices is composed of 2-item vectors (row and column indices):
⎕← tests ← 6 3⍴11 26 22 14 87 52 30 28 19 65 40 55 19 31 64 33 70 44
tests[(2 3)(5 1)(1 2)]
tests[2 2⍴(2 3)(5 1)(1 2)]
Let us try to obtain the same result with the index function, or squad:
(2 3)(5 1)(1 2) ⌷ tests
LENGTH ERROR
(2 3)(5 1)(1 2)⌷tests
∧
The above cannot work. For a matrix, squad expects a 2-item left argument: a list of rows and a list of columns. Here, the left argument has three items.
(2 3)(5 1)(1 2) ⌷¨ tests
RANK ERROR
(2 3)(5 1)(1 2)⌷¨tests
∧
This second attempt also won’t work: each item of the left argument cannot be associated with a corresponding item of tests, because the two arguments do not even have the same rank.
In order to get this to work, we need to enclose tests:
(2 3)(5 1)(1 2) ⌷¨ ⊂tests
This last expression worked correctly. Each pair of indices is matched with tests as a whole because tests has been enclosed, and therefore the scalar on the right is extended to match the 3-item vector on the left.
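With the values of tests shown above, choose indexing and the squad-each expression select the same three numbers. The last line is only there to show that a single pair of indices works with squad directly, without each.

```apl
tests ← 6 3⍴11 26 22 14 87 52 30 28 19 65 40 55 19 31 64 33 70 44
tests[(2 3)(5 1)(1 2)]          ⍝ 52 19 26
(2 3)(5 1)(1 2) ⌷¨ ⊂tests       ⍝ 52 19 26
2 3 ⌷ tests                     ⍝ 52 : one row index and one column index
```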
Always keep in mind the following rules:
The items of a nested array are scalars and are therefore always processed as scalars.
In the expression below, (5 6) is multiplied by 10 and (4 2) is multiplied by 5:
(5 6)(4 2)×10 5
A single list of values placed between parentheses is not a nested array:
(45 77 80)
The parentheses do nothing here.
An expression is always evaluated from right to left, one function at a time. Note that strands can be easy to miss when determining what the left argument of a function is.
In the expression 2×a 3+b, the left argument of the plus function is not 3 alone, but the vector a 3.
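A small sketch of the last two rules. The values given to a and b are hypothetical, chosen only to make the results easy to follow.

```apl
(5 6)(4 2)×10 5      ⍝ (50 60)(20 10)
a ← 10               ⍝ hypothetical values for a and b
b ← 1
2×a 3+b              ⍝ 22 8 : (a 3)+b is 11 4, which is then doubled
2×a (3+b)            ⍝ 20 8 : parentheses make 3 alone the left argument of +
```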
Before we go any further with nested arrays, we recommend that you try to solve some exercises.
10.7. Intermission Exercises
You are given three numeric vectors:
a ← 1 2 3
b ← 4 5 6
c ← 7 8 9
Try to predict the results given by the following expressions in terms of depth, rank, and shape.
Then check your results using ]display, or the appropriate primitives.
a b c × 1 2 3
(10 20),a
(10 20),a b
a b 2 × c[2]
10×a 20×b
Same question for the following expressions:
+/a b c
+/¨a b c
1 0 1/¨a b c
(a b c)⍳(4 5 6)
1 10 3 ∊ a
(⊂1 0 1)/¨a b c
1 10 3 ∊ a b c
What are the results of +/na and ,/na for the vector na shown below?
⎕← na ← 1 2 (2 2⍴3 4 5 6)7 8
10.8. Split and Mix
We saw that in some cases we can choose to represent data either as a matrix or as a nested vector; remember monMat and monVec.
Two primitive monadic functions are provided to switch from one form to the other:
Mix (↑) returns an array of higher rank and lower depth than that of its argument; and
Split (↓) returns an array of lower rank and higher depth than that of its argument.
10.8.1. Basic Use
Let us apply mix to two small vectors:
vtex ← 'One' 'Two' 'Three'
vnum ← (6 2) 14 (7 5 3)
⎕← rtex ← ↑ vtex
Notice how we have converted a nested vector (of depth 2 and rank 1) into a simple matrix (of depth 1 and rank 2).
⎕← rnum ← ↑ vnum
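In this example, we have converted a nested vector (of depth ¯2 and rank 1) into a simple matrix (of depth 1 and rank 2). The depth is negative because the item 14 is a simple scalar, so the items of vnum do not all have the same depth.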
Of course the operation is possible only because the shorter items are padded with blanks (for text) or zeroes (for numbers), or more generally by the appropriate fill item (this notion will be explained soon).
The last example above shows that when we say that the depth is reduced, we actually mean that the magnitude of the depth is reduced.
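The depth and rank changes described above can be confirmed directly with ≡ and ⍴. This sketch assumes Dyalog's default ⎕ML←1, where monadic ↑ is mix.

```apl
vtex ← 'One' 'Two' 'Three'
vnum ← (6 2) 14 (7 5 3)
≡vtex           ⍝ 2
⍴↑vtex          ⍝ 3 5
≡↑vtex          ⍝ 1
≡vnum           ⍝ ¯2 : 14 is a simple scalar, so the depth is uneven
⍴↑vnum          ⍝ 3 3
≡↑vnum          ⍝ 1
```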
And now, let us apply split to the matrices we have just produced:
⎕← newtex ← ↓rtex
We converted a simple matrix (of depth 1 and rank 2) into a nested vector (of depth 2 and rank 1).
⎕← newnum ← ↓rnum
Note that the two new vectors (newtex and newnum) are not identical to the original ones (vtex and vnum) because, when they were converted into the matrices rtex and rnum, the shorter items were padded.
When one splits a matrix, the items of the result all have the same size.
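The padding is easy to observe by comparing the lengths of the items before and after a round trip through mix and split:

```apl
vtex ← 'One' 'Two' 'Three'
newtex ← ↓↑vtex
≢¨vtex                  ⍝ 3 3 5
≢¨newtex                ⍝ 5 5 5 : every item was padded to the width of the matrix
newtex ≡ vtex           ⍝ 0
(↓↑newtex) ≡ newtex     ⍝ 1 : once all items have the same length, the round trip is exact
```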
10.8.1.1. Mix Applied to Heterogeneous Data
The examples shown above represent very common uses of mix and split. However, it is of course also possible to apply the functions to heterogeneous data.
For example, we can mix text and numbers:
↑'Mixed' (11 43)
And we can also mix a simple vector with a nested one. As expected, the result below is a 2 by 3 matrix:
↑ 'Yes' ('Oui' 'Da' 'Si')
10.8.2. Axis Specification
10.8.2.1. Split
When we apply the function split to an array, its rank will decrease, so we must specify which of its dimensions is to be suppressed. If we don’t specify it explicitly, the default is to suppress the last dimension.
Let us work on chemistry, a matrix we used earlier:
⎕← chemistry ← 3 5⍴'H2SO4CaCO3Fe2O3'
In this case, there are two possible uses of split: we can apply it either to the first dimension or to the second dimension.
If we specify the first axis, the matrix is split column-wise:
↓[1]chemistry
If we specify the second axis, the matrix is split row-wise:
↓[2]chemistry
If we omit the axis specification, split defaults to the last axis:
↓chemistry
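Counting the items of each result shows which dimension was suppressed:

```apl
chemistry ← 3 5⍴'H2SO4CaCO3Fe2O3'
⍴↓[1]chemistry                ⍝ 5 : one item per column
⊃↓[1]chemistry                ⍝ HCF : the first column
⍴↓[2]chemistry                ⍝ 3 : one item per row
(↓chemistry) ≡ ↓[2]chemistry  ⍝ 1 : the default is the last axis
```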
10.8.2.2. Mix
The use of mix is a bit more complex because it adds a new dimension to an existing array. So does the function laminate, and the two functions use the same convention to specify where to insert the new dimension.
If we apply the function mix to a 3-item nested vector of vectors, in which the largest item is an enclosed 5-item vector, the result must be either a 5 by 3 matrix, or a 3 by 5 matrix (the default).
In the same way as for laminate, a new dimension is created. This new dimension can be inserted before or after the existing dimension. The programmer decides this by specifying an axis:
[0.5] inserts the new dimension before the existing one, resulting in a 5 by 3 matrix; or
[1.5] inserts the new dimension after the existing one, resulting in a 3 by 5 matrix.
↑[0.5]'One' 'Two' 'Three'
↑[1.5]'One' 'Two' 'Three'
The last example is the default behaviour, where the new dimension is inserted after the existing one:
↑'One' 'Two' 'Three'
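For a vector of vectors, the two possible results are simply transposes of each other. The name v below is hypothetical.

```apl
v ← 'One' 'Two' 'Three'
⍴↑[0.5]v              ⍝ 5 3
⍴↑[1.5]v              ⍝ 3 5
(↑[1.5]v) ≡ ↑v        ⍝ 1 : [1.5] is the default here
(↑[0.5]v) ≡ ⍉↑v       ⍝ 1 : [0.5] amounts to transposing the default result
```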
Let us now work with a nested matrix:
⎕← friends ← 2 3⍴'John' 'Mike' 'Anna' 'Noah' 'Suzy' 'Paul'
The shape of this matrix is 2 3, and its items are all of length 4.
So, mix can produce three different results, according to the axis specification, as follows:
| With the axis | the new dimension is inserted | and the resulting shape is |
|---|---|---|
| [2.5] | after | 2 3 4 |
| [1.5] | between | 2 4 3 |
| [0.5] | before | 4 2 3 |
Each of these three cases is illustrated below.
↑[2.5]friends ⍝ Default case, [2.5] was unnecessary.
⍴↑[2.5]friends
↑[1.5]friends
⍴↑[1.5]friends
↑[0.5]friends
⍴↑[0.5]friends
In the first example, the names are placed “horizontally” as rows in two sub-matrices.
In the second case, they are placed “vertically” in columns.
The third case is more difficult to read; the names are positioned perpendicularly to the matrices, with one letter in each. You might like to imagine that the letters are arranged in a cube, and that you are viewing it from three different positions.
Notice that, naturally, there is a connection between using ↑[k] and using mix followed by dyadic transpose.
The table above has shown that the main difference between using the default mix and using mix with axis is where the new axis gets inserted into the shape of the result. Therefore, one can always use dyadic transpose after mix to move the new axis of the result to the intended position.
Let us revisit the examples above using friends.
↑[0.5]friends will have a resulting shape of 4 2 3, while ↑friends has a shape of 2 3 4.
Therefore, dyadic transpose needs to move the last axis of ↑friends to the front:
2 3 1⍉↑friends
(↑[0.5]friends)≡2 3 1⍉↑friends
Recall that the left argument of dyadic transpose tells you the position to which each axis goes.
If la ← 2 3 1 is the left argument of dyadic transpose, then la[1] tells us where the 1st axis goes, la[2] tells us where the 2nd axis goes, and la[3] tells us where the 3rd (and last) axis goes.
Because la[3] is 1, we know that the last axis (which was created by mix) will now become the first axis, and the axes that were in positions 1 and 2 will each move one position along, to 2 and 3.
Similarly, we can determine what the left argument to dyadic transpose should be if we were to use it instead of ↑[1.5]friends.
With ↑[1.5], we want the new axis to go in the middle.
If we work from ↑friends, its last axis needs to go to position 2, so we have la ← ? ? 2.
We just have to fill in the rest of the left argument, making sure that the original axes remain ordered:
la ← 1 3 2
la⍉↑friends
(↑[1.5]friends)≡la⍉↑friends
10.9. Type, Prototype, Fill Item
Some operations like expand or take may insert new additional items into an array. Up to now, things were simple; numeric arrays were expanded with zeroes and character arrays were expanded with blanks. But what will happen if the array contains both numbers and characters (a mixed array), or if it is a nested array?
We need a variable to experiment a little:
⎕← hogwash ← 19 (2 2⍴⍳4) (3 1⍴'APL') (2 2⍴5 8 'Nuts' 9)
What would be the result of expressions like 6↑hogwash or 1 1 0 1 0 1\hogwash?
In general, when expanding an array, APL inserts fill items, and it does so using the prototype of the array.
In order to understand what the prototype of hogwash is, we first need to understand what the type of an array is.
Definition
The type of an array is an array with the exact same structure (shape, rank, and depth, for all levels of nesting) in which all numbers are replaced by zeroes and all characters are replaced by blanks.
For example, here is the type of hogwash (note that the character vector 'Nuts' becomes a vector of four blanks):
⎕← hogwashType ← 0 (2 2⍴0) (3 1⍴' ') (2 2⍴0 0 '    ' 0)
As we can (not) see, the type of a nested array may be difficult to interpret because of the invisible blanks:
]display hogwashType
Having defined what the type of an array is, we can define what the prototype of an array is:
Definition
The prototype of an array is the type of its first item.
In other words, the prototype of an array is its first item, in which all numbers are replaced by zeroes and all characters are replaced by blanks.
The prototype of an array is used as a fill item whenever an operation needs to create additional items.
The first item of hogwash is a number, so the prototype of hogwash is a single zero.
If we lengthen the vector using overtake, it will be padded with zeroes (fill items):
6↑hogwash
Similarly, if we expand the array, the new items will also be zeroes:
1 1 0 1 0 1\hogwash
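Because the prototype is a single zero here, both results can be written out explicitly and compared with match:

```apl
hogwash ← 19 (2 2⍴⍳4) (3 1⍴'APL') (2 2⍴5 8 'Nuts' 9)
(6↑hogwash) ≡ hogwash,0 0                    ⍝ 1 : two zeroes appended
(1 1 0 1 0 1\hogwash) ≡ 19 (2 2⍴⍳4) 0 (3 1⍴'APL') 0 (2 2⍴5 8 'Nuts' 9)   ⍝ 1
```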
Let us rotate the vector by one position:
hogwash ← 1⌽hogwash
Now, the first item is a numeric matrix:
⊃hogwash
Therefore, the prototype of hogwash is now 2 2⍴0.
If we take six items from hogwash, two such matrices will be added:
Let us rotate the variable once more:
hogwash ← 1⌽hogwash
Now, the first item is a little 3 by 1 character matrix containing the letters 'APL'.
So, the prototype will be a 3 by 1 character matrix containing three blank spaces.
This is the array that will be used by expand as the fill item.
Let us verify it:
]display 1 1 0 1 0 1\hogwash
If we repeat the rotation, the first item will be a nested matrix. So, the prototype (and hence, also the fill item) will be a 2 by 2 nested matrix. Let us try to overtake again:
hogwash ← 1⌽hogwash
]display 6↑hogwash
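The three rotations can be replayed in one go, checking each fill item with match instead of reading the ]display output. The name h is a hypothetical working copy, so that hogwash itself is left unchanged.

```apl
hogwash ← 19 (2 2⍴⍳4) (3 1⍴'APL') (2 2⍴5 8 'Nuts' 9)
h ← 1⌽hogwash                        ⍝ first item: 2 2⍴⍳4
(6↑h)[5 6] ≡ (2 2⍴0)(2 2⍴0)          ⍝ 1
h ← 1⌽h                              ⍝ first item: 3 1⍴'APL'
(1 1 0 1 0 1\h)[3] ≡ ⊂3 1⍴' '        ⍝ 1
h ← 1⌽h                              ⍝ first item: 2 2⍴5 8 'Nuts' 9
(6↑h)[5] ≡ ⊂2 2⍴0 0 '    ' 0         ⍝ 1 : 'Nuts' becomes four blanks
```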
Obviously, fill items are generally only useful for arrays whose items have a uniform structure.
We will talk a bit about computing the type and prototype of arrays in Section 10.16.2.
10.10. Pick
10.10.1. Definition
Whenever you need to select one (and only one) item from an array, you can use the dyadic function pick, represented by the symbol ⊃.
What makes pick different from ordinary indexing is that it is possible to “dig into” a nested array and pick an item at any level of nesting, and that it discloses the result.
The latter is probably the reason why pick and the monadic function disclose use the same symbol.
The syntax of pick is as follows: r ← path ⊃ data.
The left argument is a scalar or a vector which specifies the path that leads to the desired item.
Each item of path is the index or set of indices needed to reach the item at the corresponding level of depth of the array.
The operation starts at the outermost level and goes deeper and deeper into the levels of nesting. At each level, the selected item is disclosed before applying the next level of selection.
We shall work with the nested matrix weird from a previous section:
⎕← weird ← 2 2⍴456 (2 2⍴ 'Dyalog' 44 27 (2 2⍴8 6 2 4)) (17 51) 'Twisted'
Let us try to select the value 51.
To select the 51, we must first select the vector located in row 2, column 1 of the matrix, and then select the second item of that vector.
This is how we express this selection using pick:
(2 1) 2 ⊃ weird
The left argument (2 1) 2 is a 2-item vector because we need to select at two levels of nesting.
Using simple indexing and explicit disclosing, we need a much more complicated expression to obtain the same selection:
⊃(⊃weird[2;1])[2]
Although, to be fair, in this special case the leftmost ⊃ was not required.
(Can you figure out why?)
We can also select the letter “g” within “Dyalog”. To do so, we must first select the matrix located in row 1, column 2. Within this matrix, we must select the character vector located in row 1, column 1. Finally, we must select the 6th item of that character vector:
(1 2) (1 1) 6 ⊃ weird
This time, the left argument is a 3-item vector because we need to select at three levels of nesting:
(1 2) is the set of indices for the selection at the outermost level of depth;
(1 1) is the set of indices for the selection at the second level of depth; and
6 is the index for the selection at the third level of depth.
Using simple indexing, this selection is almost obscure:
⊃(⊃(⊃weird[1;2])[1;1])[6]
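Both pick expressions can be checked against their indexing equivalents with match:

```apl
weird ← 2 2⍴456 (2 2⍴ 'Dyalog' 44 27 (2 2⍴8 6 2 4)) (17 51) 'Twisted'
(2 1) 2 ⊃ weird                                          ⍝ 51
((2 1) 2 ⊃ weird) ≡ ⊃(⊃weird[2;1])[2]                    ⍝ 1
(1 2) (1 1) 6 ⊃ weird                                    ⍝ g
((1 2)(1 1) 6 ⊃ weird) ≡ ⊃(⊃(⊃weird[1;2])[1;1])[6]       ⍝ 1
```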
10.10.2. Left Argument Length
The left argument to pick is a vector with as many items as the number of levels of depth through which we want to select. Each item of the left argument must contain as many indices as the rank of the array from which it selects at that level.
If we remove the last item of path in the example above, the selection will stop one level above the level at which it stopped before.
This means that we would select the entire character vector 'Dyalog' instead of just the letter 'g':
(1 2) (1 1) ⊃ weird
Yes, we selected the entire character vector. Note again that the result has been disclosed, so that a simple array is returned in this case, instead of a scalar which is an enclosed vector.
The difference becomes clearer if we compare this with the equivalent simple indexing without the final disclose:
(⊃weird[1;2])[1;1]
We tried removing the last item of path, but what happens if we instead remove the last two items of path?
If we remove the last two items of path, we might expect to select the entire 2 by 2 nested matrix that contains the character vector 'Dyalog':
(1 2) ⊃ weird
RANK ERROR
(1 2)⊃weird
∧
But it does not work!
The reason for this is a problem that we have seen before:
In the expression (1 2) (1 1) ⊃ weird, the item (1 2) is a scalar (an enclosed vector) because of strand notation.
The left argument to pick has two items, because we want to select an item at the second level.
In the expression (1 2) ⊃ weird, we do not have a strand, so the argument (1 2) is not enclosed.
It is a simple 2-item vector and is therefore interpreted as a path through two levels of nesting, with the single index 1 at the outermost level and the single index 2 at the next level.
The RANK ERROR is reported because we try to use a scalar 1 as an index at the outermost level.
However, at this level the array is a matrix, so two items are needed to form a proper index.
We want to select at the outermost level, so the left argument to pick must have exactly one item. Therefore, we must explicitly enclose the vector, leading to the correct expression:
(⊂1 2) ⊃ weird
We still need two indices inside the enclosure because, at the outermost level, the array is a matrix.
The expression we used before (without the explicit enclose) is inappropriate for the array weird, but it could work fine with a different array; for example, to take the first item of a nested vector, and then select the second item of it, as shown here:
1 2⊃'Madrid' 'New York' 'London'
The 1 selects 'Madrid', and the 2 then selects the 'a'.
In this expression, an enclose would be wrong, as we need to select at two levels. However, at each level we only need one index, as we select from vectors at both levels.
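The two situations side by side: an enclosed pair of indices selects once, in a matrix, while a simple vector selects twice, in vectors.

```apl
weird ← 2 2⍴456 (2 2⍴ 'Dyalog' 44 27 (2 2⍴8 6 2 4)) (17 51) 'Twisted'
⍴(⊂1 2) ⊃ weird                       ⍝ 2 2 : the whole nested matrix
((⊂1 2) ⊃ weird) ≡ ⊃weird[1;2]        ⍝ 1
1 2 ⊃ 'Madrid' 'New York' 'London'    ⍝ a
2 5 ⊃ 'Madrid' 'New York' 'London'    ⍝ Y : two levels, one index at each
```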
10.10.3. Disclosed Result
As mentioned previously, pick returns the contents of the specified item, not the scalar which contains it.
Let us restore the original value of hogwash (i.e., its value before we rotated it):
hogwash ← 19 (2 2⍴⍳4) (3 1⍴'APL') (2 2⍴5 8 'Nuts' 9)
Because boxing is ON, we can readily tell the difference between
2⊃hogwash
and
hogwash[2]
However, if boxing is OFF, we might make the mistake of believing that the two results are equal:
]box off
2⊃hogwash
hogwash[2]
Because boxing is OFF, the two results look very similar.
(An attentive reader will notice that the result of hogwash[2] is indented one space to the right, which indicates one level of nesting.)
This is deceptive: the first expression (2⊃hogwash) returns the 2 by 2 matrix contained in hogwash:
⍴2⊃hogwash
while the other expression merely returns the second item of hogwash, which is a scalar (an enclosed matrix):
⍴hogwash[2]
To prevent us from shooting ourselves in the foot, let us turn boxing back ON:
]box on
10.10.4. Pick First
We have not mentioned this before (because up to now we have only used it on 1-item arrays), but disclose ⊃ actually discloses just the first item of an array.
All other items are ignored.
In other words, for a non-empty array, disclose ⊃array is the same as 1⊃,array.
For this reason, the function ⊃ is also called first:
⊃26 (10 20 30) 100
⊃'January' 'February' 'March'
⊃2 2⍴'Dyalog' (2 2⍴⍳4) 'APL' 100
⊃12
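The equivalence with pick can be checked on the matrix example; the name m is hypothetical. For a matrix, the first item can also be picked with an enclosed pair of indices.

```apl
m ← 2 2⍴'Dyalog' (2 2⍴⍳4) 'APL' 100   ⍝ hypothetical name for the matrix above
⊃m                  ⍝ Dyalog
(⊃m) ≡ 1⊃,m         ⍝ 1
(⊃m) ≡ (⊂1 1)⊃m     ⍝ 1
```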
10.10.5. Selective Assignment
When one wants to modify an item deep inside an array, it is important to remember that pick returns a disclosed result.
For example, let us try to replace the number 5 with the character vector 'Five' in the fourth item of hogwash.
If we wanted to extract the value 5, we would just write:
4 (1 1)⊃hogwash
To replace it, we use the same expression in a normal selective assignment:
(4 (1 1)⊃hogwash) ← 'Five'
hogwash
And it works, though we haven’t enclosed the replacement value! Going back is just as easy:
(4 (1 1)⊃hogwash) ← 5
hogwash
10.10.6. An Idiom
Suppose you have a nested vector:
nv ← (3 7 5)(9 7 2 8)(1 6)(2 0 8)
You can select one of its items with:
2⊃nv
But how can you select two (or more) items? For example, the 2nd and the 4th items?
2 4⊃nv
This does not work; it selects only one item: the 4th item of the 2nd item, which is the number 8 in this case.
Maybe we can use each (⊃¨) to pick each of the items we want?
2 4⊃¨nv
LENGTH ERROR
2 4⊃¨nv
∧
This gives a LENGTH ERROR because ¨ is trying to pair each of the two numbers on the left with an item on the right, but nv has a total of four items.
In order to fix this, we need to enclose nv so that ¨ knows to pair each number on the left with the whole vector nv:
2 4⊃¨⊂nv
This expression is known as the “chipmunk idiom”, probably because of the eyes and moustaches of the combined symbol: ⊃¨⊂.
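A few checks on the idiom. For a vector and simple indices, it gives the same result as ordinary indexing; its real advantage is that each item on the left can be a full path that digs several levels deep.

```apl
nv ← (3 7 5)(9 7 2 8)(1 6)(2 0 8)
2 4⊃¨⊂nv                ⍝ (9 7 2 8)(2 0 8)
(2 4⊃¨⊂nv) ≡ nv[2 4]    ⍝ 1
(2 4)(3 1)⊃¨⊂nv         ⍝ 8 1 : each path selects at two levels
```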
10.11. Reach Indexing
10.11.1. Relationship to Pick
The way in which you can use pick to access elements from a nested array is very similar to another indexing notation that is called reach indexing. Unlike simple indexing and choose indexing, which only let you access the scalars of an array, reach indexing can be used to index into arbitrary levels of depth of nested arrays. Hence the name.
In reach indexing, the index specification is given by a non-simple integer array, each of whose items reaches down to a nested element of the array being indexed. As we will see, each of those items works in the same way as the left argument of pick.
Recall the nested array weird:
weird
We learned that, to access the nested character vector 'Twisted', we could pick it with the left argument (⊂2 2).
Similarly, we can access the integer 44 by picking it with the left argument (1 2) (1 2):
(⊂2 2)⊃weird
(1 2)(1 2)⊃weird
To pick both in a single expression, we need to use the idiom we just learned:
(⊂2 2)((1 2)(1 2)) ⊃¨⊂ weird
By using reach indexing, we just need to take the left argument of ⊃¨⊂ and put it inside square brackets:
weird[(⊂2 2)((1 2)(1 2))]
One key difference between reach indexing and using pick (or the idiom) is that pick will disclose the result:
(⊂2 2)⊃weird
Whereas reach indexing doesn’t:
weird[⊂(2 2)]
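In this example, reach indexing and the chipmunk idiom return the same 2-item vector, because each re-encloses the non-scalar 'Twisted'; the difference only shows up against a single pick. The name paths is hypothetical.

```apl
weird ← 2 2⍴456 (2 2⍴ 'Dyalog' 44 27 (2 2⍴8 6 2 4)) (17 51) 'Twisted'
paths ← (⊂2 2)((1 2)(1 2))        ⍝ one path per item we want
weird[paths] ≡ paths ⊃¨⊂ weird    ⍝ 1
⍴(⊂2 2)⊃weird                     ⍝ 7 : pick returns the simple vector 'Twisted'
⍴weird[⊂2 2]                      ⍝ (empty) : indexing returns a scalar
```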
10.11.2. Reach Versus Choose Indexing
In some situations, indices for reach indexing can look like indices for choose indexing.
For example, to pick the character vector 'Dyalog' from the array weird, we do
(1 2)(1 1)⊃weird
Thus, one might think that weird[(1 2)(1 1)] uses reach indexing to fetch that same character vector (but enclosed).
Alas, this doesn’t work.
Or at least, not in the intended way:
weird[(1 2)(1 1)]
The result is not what we expected, because (1 2)(1 1) was interpreted as two separate indices for choose indexing, selecting weird[1;2] and weird[1;1], whereas we wanted it to be a single path reaching into one element of the array weird.
To fix this, we have to enclose that vector to make it a scalar:
weird[⊂(1 2)(1 1)]
In reach indexing, each scalar of the index array reaches to a single item, so (1 2)(1 1) can be seen either as an index vector for choose indexing (for the scalars weird[1;2] and weird[1;1]) or as an index vector for reach indexing that accesses the same values.
Thus, we can see that reach indexing and choose indexing overlap, but when they do, both schemes interpret the indices in the same way.
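The two interpretations can be told apart by comparing the results with match:

```apl
weird ← 2 2⍴456 (2 2⍴ 'Dyalog' 44 27 (2 2⍴8 6 2 4)) (17 51) 'Twisted'
weird[(1 2)(1 1)] ≡ weird[1;2],weird[1;1]    ⍝ 1 : two items, selected by choose
weird[⊂(1 2)(1 1)] ≡ ⊂'Dyalog'               ⍝ 1 : one enclosed item, reached by a single path
```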
10.12. Partitioned Enclose & Partition
10.12.1. Partitioned Enclose
The primitive function partitioned enclose is the dyadic use of the left shoe ⊂.
It is used to group the items of an array into a vector of nested items, or enclosures, according to a specified pattern.
It is used as r ← pattern ⊂ array, or optionally with an axis specification: r ← pattern ⊂[axis] array.
Partitioned enclose breaks up the right argument array into nested items, as determined by the left argument pattern.
10.12.1.1. Simple Boolean Vector Left Argument
Let us start by understanding how partitioned enclose works when the left argument pattern is a simple Boolean vector:
1 0 0 1 0 0 0 0 0 ⊂ 'Partition'
1 0 0 1 0 0 1 0 0 ⊂ 'Partition'
The two examples seem to show that the 1s in the left argument specify where new enclosures of the right argument start.
The 0s just put the corresponding elements in the preceding enclosure.
Notice that, as soon as we start the last enclosure (with the last 1), the trailing 0s are irrelevant.
Thus, we can safely omit them from the left argument:
1 0 0 1 0 0 1 ⊂ 'Partition'
Again, we can omit trailing zeroes, but we do not have to. In fact, in older versions of Dyalog APL, partitioned enclose expected the trailing zeroes to be present. In other words, the ability to omit trailing zeroes is an extension that was introduced after partitioned enclose was already part of the language.
We have seen what we can do about trailing zeroes. It is also important to understand what happens when the left argument has leading zeroes:
0 0 1 0 0 1 0 0 1 ⊂ 'Partition'
The items that correspond to leading zeroes are not preceded by the start of any enclosure, so they have nowhere to go. Because of that, they are omitted from the final result.
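The three observations (where enclosures start, omitted trailing zeroes, dropped leading items) can be checked in a few lines:

```apl
1 0 0 1 0 0 1 0 0 ⊂ 'Partition'      ⍝ Par tit ion
(1 0 0 1 0 0 1 ⊂ 'Partition') ≡ 1 0 0 1 0 0 1 0 0 ⊂ 'Partition'   ⍝ 1
0 0 1 0 0 1 0 0 1 ⊂ 'Partition'      ⍝ rti tio n : 'P' and 'a' are dropped
```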
We have already covered most of the behaviour of partitioned enclose; we are only missing some details.
10.12.1.2. Multiple Enclosures
The left argument pattern can be a simple integer vector with arbitrary non-negative integers; it doesn’t have to contain only zeroes and ones.
If we interpret the role of the zeroes and ones in a slightly different way, we can immediately understand how larger integers will work.
For that, we can use less rigorous language, and say that the enclosures of the result start in the places where we inserted dividers to split the right argument. Having said that, we just have to understand how those dividers are placed:
a 0 in the left argument means that we will insert 0 dividers before the corresponding item of the right argument; and
a 1 in the left argument means that we will insert 1 divider before the corresponding item of the right argument.
Thus, an integer n in the left argument means that we will insert n dividers before the corresponding item of the right argument:
3 0 0 1 0 0 2 0 0 ⊂ 'Partition'
Above, pattern started with a 3 and array started with 'P'. Thus, partitioned enclose must insert 3 dividers before the 'P'.
Because more than one divider was inserted, only the enclosure started by the last of them receives the corresponding items of array; the enclosures started by the other dividers are empty.
Using mix as a visual aid, we can see clearly where the dividers will be inserted:
↑(3 0 0 1 0 0 2 0 0) 'Partition'
The usage of mix shows that we insert 3 dividers before the initial 'P', 1 divider before the first 't', and 2 dividers before the last 'i'.
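Counting the enclosures confirms that each divider creates exactly one enclosure, so the result has +/pattern items. The name r is hypothetical.

```apl
r ← 3 0 0 1 0 0 2 0 0 ⊂ 'Partition'
≢r                  ⍝ 6 : the same as +/3 0 0 1 0 0 2 0 0
≢¨r                 ⍝ 0 0 3 3 0 3 : the empty enclosures come from the extra dividers
```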
10.12.1.3. Trailing Empty Enclosures
When the left argument pattern starts with an integer that is greater than one, the final result will have some leading empty enclosures.
If we want to get a result with trailing empty enclosures, we just need to make sure that the length of pattern is one greater than the length of the right argument:
2 0 0 1 0 0 1 0 0 1 ⊂ 'Partition'
We can use mix again, and we will understand how the trailing 1 creates an empty enclosure by inserting a divider right after the last item of the right argument:
↑(2 0 0 1 0 0 1 0 0 1) 'Partition'
10.12.1.4. Scalar Left Argument
So far, we have only seen how partitioned enclose works with a vector left argument. Now, we will see what happens if the left argument is a scalar.
First, take a look at this example:
1 0 0 0 0 0 0 0 0 ⊂ 'Partition'
We know we can omit trailing zeroes, so we might be tempted to rewrite the example above as:
1 ⊂ 'Partition'
However, when we do so, we get an unexpected result! That’s because the left argument is a scalar, and we can only omit trailing zeroes from vectors.
When the left argument is a scalar s, it gets extended to (≢array)⍴s.
Therefore, the example above is equivalent to
(9⍴1) ⊂ 'Partition'
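The contrast between a scalar 1 and a 1-item vector (,1) makes the rule concrete: the scalar is extended to every position, while the vector is padded with trailing zeroes.

```apl
≢1 ⊂ 'Partition'                          ⍝ 9 : one enclosure per character
(1 ⊂ 'Partition') ≡ (9⍴1) ⊂ 'Partition'   ⍝ 1
(,1) ⊂ 'Partition'                        ⍝ a 1-item vector holding 'Partition'
```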
10.12.1.5. Partitioned Enclose with Axis
When we first introduced partitioned enclose, we mentioned that it can also accept an axis specification, as such: pattern ⊂[axis] array.
Obviously, when array is a vector, axis is irrelevant because we can only have axis ← 1.
For the axis specification to be relevant, array needs to be of rank two or higher.
First, we want to know what the default value for axis is, and we can find that out with a quick test:
1 ⊂ 2 2⍴⍳4
When applied to a matrix with no axis specification, partitioned enclose created enclosures around the columns of the matrix, which shows that the default axis is ≢⍴array, i.e., the last axis.
If we want to create enclosures around the rows, we can specify axis ← 1:
1 ⊂[1] 2 2⍴⍳4
Notice that ⊂ returns a vector, while perhaps you expected the result to look like this:
⍪ 1 ⊂[1] 2 2⍴⍳4
This is how partitioned enclose works: it always returns a vector with the enclosures as items.
Here is another example, where we use a 3D array as the right argument:
⎕← cuboid ← 3 4 5⍴⎕A
By using partitioned enclose along the first axis, we can get a vector with enclosures around the planes that compose cuboid:
1 0 1 ⊂[1] cuboid
The things we learned about the behaviour of the left argument of partitioned enclose still apply when we have a higher-dimensional right argument and/or an axis specification; we just need to interpret the left argument from the point of view of the correct axis:
1 0 1 ⊂[2] cuboid
In this example, the left argument is 1 0 1 and the axis specified is the second one, whose length is:
2⊃⍴cuboid
So, if the left argument is 1 0 1 and the axis in question has length four, we are omitting a trailing zero:
1 0 1 0 ⊂[2] cuboid
If you find it hard to visualise why the result is as shown, you can try to reason about partitioned enclose with an axis specification as a series of enclosures around indexing operations.
First, we can line up the left argument with the valid indices for the axis in question:
↑(1 0 1 0)(⍳2⊃⍴cuboid)
This shows that we will have an enclosure around indices 1 2 and another one around indices 3 4.
Now, we just have to do the indexing along the correct axis.
Because cuboid is a 3D array and we are working with the second axis, the indexing will look like cuboid[;??;]:
(⊂cuboid[;1 2;]),(⊂cuboid[;3 4;])
(1 0 1 0⊂[2]cuboid) ≡ (⊂cuboid[;1 2;]),(⊂cuboid[;3 4;])
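Looking at the shapes of the enclosures for each possible axis summarises the behaviour: every enclosure keeps the shape of cuboid, except along the chosen axis, where the same left argument 1 0 1 is padded with trailing zeroes as needed.

```apl
cuboid ← 3 4 5⍴⎕A
⍴¨1 0 1 ⊂[1] cuboid     ⍝ (2 4 5)(1 4 5)
⍴¨1 0 1 ⊂[2] cuboid     ⍝ (3 2 5)(3 2 5)
⍴¨1 0 1 ⊂[3] cuboid     ⍝ (3 4 2)(3 4 3)
⍴¨1 0 1 ⊂ cuboid        ⍝ (3 4 2)(3 4 3) : the default is the last axis
```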
10.12.1.6. Wrap-up
Now that we have seen the various nuances associated with partitioned enclose, we can bundle them up together.
In the expression r ← pattern ⊂[axis] array, we have that:
array may be any array;
pattern may be a non-negative integer scalar or a simple numeric vector composed of non-negative integers;
if left unspecified, axis defaults to ≢⍴array, i.e., the last axis of array;
if pattern is a scalar s, it is extended to (axis⊃⍴array)⍴s;
if pattern is a vector, its maximum length is 1+axis⊃⍴array and, if the length of pattern is not the maximum, it is extended with trailing zeroes;
each non-zero element in pattern specifies how many dividers to insert before the corresponding position along the appropriate axis of array;
each enclosure has rank ≢⍴array and the same shape as array, except along the axis specified by axis;
the result r is a vector containing all the enclosures specified by pattern; and
the length of r is +⌿pattern (after extensions).
10.12.2. Partition
The partition function is the dyadic usage of ⊆, and is somewhat similar to the partitioned enclose function.
In r ← pattern ⊆ array, pattern must be a simple vector of non-negative integers, with the same length as the specified axis of the array to be partitioned.
It operates as follows:
the first enclosure starts with the first item of the array;
each enclosure ends when the next value of pattern is greater than the current one; and
the items which correspond to zeroes in pattern are removed.
10.12.2.1. Working on Vectors
We shall work with characters, but of course we could have worked with numbers just as well:
pattern ← 3 3 3 7 7 1 1 0 3 3 3 9 2 1 1 0
pattern ⊆ 'Once upon a time'
The four enclosures correspond to the beginning of the array, plus the three increments: 3 → 7, 0 → 3, and 3 → 9.
You will also notice that two characters have disappeared, because they corresponded to zeroes in the pattern.
This definition can be used to group the items of a vector according to a given vector of keys, provided that the keys are sorted in ascending order. For example:
area ← 22 22 41 41 41 41 57 63 63 63 85 85
cash ← 17 10 21 45 75 41 30 81 20 11 42 53
area ⊆ cash
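Once the values are grouped, each (¨) works naturally on the enclosures. For example, the total per area can be computed as shown below. This is a minimal sketch that reuses area and cash from above. The name groups is only for illustration.

```apl
area ← 22 22 41 41 41 41 57 63 63 63 85 85
cash ← 17 10 21 45 75 41 30 81 20 11 42 53
groups ← area ⊆ cash    ⍝ one enclosure per run of equal, ascending keys
≢groups                 ⍝ 5 groups
+/¨groups               ⍝ total cash per area: 27 182 30 112 95
```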
This definition is also extremely convenient to divide a character string into a vector of strings on the basis of a separator. For example, let us partition a vector at each of its blank characters:
phrase ← 'Panama is a canal between Atlantic and Pacific'
↑phrase(phrase≠' ')
(phrase≠' ')⊆phrase
The blanks have been removed, because they matched the zeroes, and a new enclosure starts at the beginning of each word, corresponding to the increment 0 → 1.
As you might imagine, this is extremely useful in many circumstances.
One can write a function to do it, with the separator passed as a left argument:
Cut ← {(~⍵∊⍺)⊆⍵}
↑' 'Cut phrase
In fact, we wrote the function to accept not just a single separator, but a list of separators, by replacing the perhaps more obvious (⍵≠⍺) by (~⍵∊⍺).
Now we can use it like this:
↑'mw' Cut phrase
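Consecutive zeroes in the mask never start a new enclosure, so Cut silently drops empty fields. The sketch below defines Cut exactly as above and applies it to two made-up strings. If empty fields matter, as they do in CSV data, a different approach would be needed.

```apl
Cut ← {(~⍵∊⍺)⊆⍵}
',' Cut 'a,b,,c'    ⍝ three fields: the empty field between the two commas disappears
',' Cut ',a,b,'     ⍝ two fields: leading and trailing separators leave no empty field either
```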
10.12.2.2. Working on Higher-Rank Arrays
Although partition is very simple, and clearly useful, when applied to vectors, the situation is more complex when it is applied to matrices or higher-rank arrays. This is in contrast to the definition of partitioned enclose, which works on arrays of any rank in a very straightforward way. We shall not study the more complex application of partition here; if you are interested, please refer to Section 10.16.3 at the end of this chapter.
10.13. Union & Intersection
In mathematics, one uses the two functions union and intersection to compare two sets of values. Dyalog APL provides the same functions, with the same symbols as the ones used in mathematics:
union, left ∪ right (typed with APL+v), returns a vector containing all the items of left, followed by the items of right which do not appear in left. Both left and right must be scalars or vectors. Equivalent to left,right~left.
intersection, left ∩ right (typed with APL+c), returns a vector containing the items of left that also appear in right. Both left and right must be scalars or vectors. Equivalent to (left∊right)/left.
15 76 43 80 ∪ 11 43 15 20 76 93
'we' 'are' 'so' 'happy' ∩ 'why' 'are' 'you' 'so' 'tired?'
Note that these functions do not remove duplicates (because, in mathematics, all the items of a set are supposedly distinct):
1 1 2 2 ∪ 1 1 3 3 5 5
'if' 'we' 'had' 'had' 'a' 'car' ∩ 'have' 'you' 'had' 'lunch' '?'
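The equivalences given above can be checked directly on the numeric example. In this sketch, x and y are just names for the two arguments.

```apl
x ← 15 76 43 80
y ← 11 43 15 20 76 93
x∪y                  ⍝ 15 76 43 80 11 20 93
x∩y                  ⍝ 15 76 43
(x∪y) ≡ x,y~x        ⍝ 1: union as left,right~left
(x∩y) ≡ (x∊y)/x      ⍝ 1: intersection as (left∊right)/left
```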
10.14. Enlist
Enlist is the monadic usage of epsilon ∊.
Enlist returns a vector of all the simple scalars contained in an array.
This could, at first sight, look very much like ravel, but it is not the same for nested arrays.
Ravel just arranges the top-level items of an array into a vector, while enlist removes all levels of nesting and returns a simple vector.
Let us compare the two functions:
⎕← test ← 2 2⍴'One' 'Two' 'Three' 'Four'
,test
∊test
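Counting the items makes the difference concrete. This sketch assumes the default ⎕ML ← 1, under which monadic ∊ is enlist.

```apl
test ← 2 2⍴'One' 'Two' 'Three' 'Four'
≢,test           ⍝ 4: ravel keeps the four enclosed words
≢∊test           ⍝ 15: enlist keeps only the 3+3+5+4 characters
∊1 (2 (3 4))     ⍝ 1 2 3 4: every level of nesting is removed
```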
10.15. Exercises
You are given two vectors. The first contains the reference codes for some items in a warehouse. Identical codes are grouped, but not necessarily in ascending order. The second vector contains the quantities of each item sold during the day or the week.
Write a dyadic function QuantitiesSold that accepts these two vectors as arguments and calculates how many items of each reference code have been sold. Preferably, use a partitioning function.
ref ← 47 47 83 83 83 83 83 29 36 36 36 50 50
qty ← 5 8 3 18 11 1 6 10 61 52 39 8 11
ref QuantitiesSold qty
13 39 10 152 19 ≡ ref QuantitiesSold qty
You are given two matrices with the same number of columns.
Let us call them big and small.
You are asked to find where the rows of small appear in big; that is, for each row of small, find the index of the same row in big.
For those rows of small which do not appear in big, you can return either the value 0 or 1+≢big.
Nowadays, index of (⍳) works on matrices, but this has not always been the case.
Can you solve this exercise without using index of on matrices?
(Using index of on vectors is still allowed!)
⎕← big ← 5 2⍴⍳10
⎕← small ← (2 2⍴⍳4)⍪8+2 2⍴⍳8
The result should be the same as that of:
big ⍳ small ⍝ 1 2 5 0 is also acceptable.
A partitioned enclose with a single zero as the left argument returns an empty vector. However, as you know already, not all empty vectors are the same. When working with empty vectors, we also work with prototypes, because an empty vector knows what it would contain if it were not empty.
For each of the following expressions, build the empty vector that matches the result of the empty partitioned enclose:
0⊂'Partition'
0⊂⍳10
0⊂⍬
0⊂3 2⍴⎕A
0⊂3 4 5⍴⍳60
0⊂(1 2 3)(4 5 6)(7 8 9)
0⊂('cat')('dog')(7 8 9)
0⊂(14 'cat' 8)('a' 2 'c' 4)(1 2 3)
The first one is already solved:
sol ← 0⍴⊂''
(0⊂'Partition')≡sol
Write a monadic function StartAndEnd that, given a word or a list of words, returns a Boolean vector where 1 indicates a word that starts and ends with the same letter. Each word will have at least one letter and will consist entirely of either uppercase (A–Z) or lowercase (a–z) letters. Words consisting of a single letter can be scalars and are considered to start and end with the same letter.
StartAndEnd 'area' 'banana' 'shoes'
StartAndEnd 'cape'
StartAndEnd 'z'
Write a dyadic function Extract that accepts a character vector left argument (let us call it text) and an integer vector right argument (let us call it start).
We would like to extract a part of text as a simple character vector.
The extract is defined as a number of sub-vectors, each being five characters long, and starting at the positions given by start.
text ← 'This boring text has been typed just for a little experiment.'
start ← 6 27 52
text Extract start
'borintypedxperi' ≡ text Extract start
This exercise is the same as the previous one, but instead of extracting five characters each time, you are asked to extract a variable number of characters, specified by the variable length. Write a dyadic function ExtractL whose right argument is the 2-item vector start length.
You can use the same example as above plus the additional variable length:
length ← 3 8 4
text ExtractL start length
'bortyped juxper' ≡ text ExtractL start length
10.16. The Specialist’s Section
If you are exploring APL for the first time, skip this section and go to the next chapter.
10.16.1. Compatibility and Migration Level
10.16.1.1. Migration Level
In the early 1980s, a number of “second-generation” APL systems evolved to support nested arrays. Dyalog APL entered the market just as these systems were starting to appear, and decided to adopt the APL2 specification that IBM had been presenting to the world. In the event, unfortunately, the APL2 specification changed very late in this process, after Dyalog had more or less released Dyalog APL (or so the story goes). As a result, there are some minor differences between the dialects.
Just to give you an idea of the (sometimes) subtle differences, let us take a look at the expression a b c[2], where a, b, and c are three vectors; for example:
a ← 1 2 3
b ← 4 5 6
c ← 7 8 9
The expression a b c[2] is ambiguous; it may be interpreted in two different ways:
does it mean “create a 3-item vector made of a, b, and the second item of c”; or
does it mean “create a 3-item vector made of a, b, and c, and then take the second item of it (that is to say, b enclosed)”?
IBM chose the first interpretation, and in an IBM-compatible implementation of APL the result would be (1 2 3) (4 5 6) 8.
In Dyalog APL, indexing is a function like any other function, in that it takes as its argument the entire vector on its left.
The result is therefore ⊂4 5 6 (the ⊂ appears because strand notation nested the items):
a b c[2]
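Explicit parentheses remove the ambiguity, whatever the binding rules of a given dialect. The sketch below writes both readings unambiguously, so it should behave the same way regardless of how a b c[2] itself is parsed.

```apl
a ← 1 2 3
b ← 4 5 6
c ← 7 8 9
(a b c)[2]      ⍝ index the whole strand: ⊂4 5 6
a b (c[2])      ⍝ index c only (the APL2 reading): (1 2 3)(4 5 6) 8
```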
As a minor player at the time, Dyalog wished to move the product in the direction of APL2. To help the people who needed to use both IBM’s APL2 and Dyalog APL, and to make it easier to migrate an application from APL2 to Dyalog, a compatibility feature was introduced into Dyalog APL via a special system variable named ⎕ML, where the letters “ML” stand for “migration level”.
The default value for ⎕ML is 1.
To use code written according to IBM’s conventions, it is possible to set ⎕ML to higher values (up to 3), and obtain an increasing (but not total) level of compatibility with IBM’s APL2.
Conversely, setting ⎕ML to 0 means “the Dyalog way”, which also shows that the default value for ⎕ML is a small compromise between Dyalog’s original specification and APL2.
Today, Dyalog has become a major player in the APL market.
Pressure on Dyalog users to move in the direction of APL2 has faded and many users prefer the Dyalog definitions.
The unfortunate result of the story is that, depending on the roots of an application, code may be written to use any one of the possible migration levels.
In this book we use the default value of ⎕ML ← 1, but we shall mention how some operations could be written in IBM’s notation.
It should be emphasised that when you select a non-zero value for ⎕ML, the “Dyalog way” of operation will no longer be available for the primitive functions that are sensitive to the selected value of ⎕ML.
Remark
⎕ML is a normal system variable.
It can be localised in a function header or in a dynamic function, so that its influence is restricted to that function.
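For example, a small dfn can switch to migration level 0 just to compute the type of its argument, without affecting the rest of the session. This is a sketch. TypeOf is a hypothetical name, and the example relies on ⎕ML being localised inside dfns, as described above. It also assumes that the session runs with the default ⎕ML ← 1.

```apl
TypeOf ← {⎕ML←0 ⋄ ∊⍵}   ⍝ hypothetical name: ⎕ML←0 applies only inside this dfn
TypeOf 5 'a' 7           ⍝ with ⎕ML←0, ∊ is type: 0 ' ' 0
⎕ML                      ⍝ still 1 outside the dfn
```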
10.16.1.2. A List of Differences
The following is not a complete list of language differences between IBM APL2 and Dyalog.
It only lists the features of Dyalog APL that can be made to function like those of APL2 by setting ⎕ML appropriately.
The first column contains the operation we are talking about; the second and third columns compare how you perform that operation the “Dyalog way” (with ⎕ML ← 0) or in APL2, respectively; and the fourth column contains additional comments.
| Operation | Dyalog’s implementation | IBM’s implementation | Comments |
|---|---|---|---|
| Mix | | | Same behaviour, different symbols. IBM’s definition requires |
| Split | | | Same behaviour, different symbols. IBM’s definition requires |
| Partition | | | Same syntax, but different behaviour. With |
| First | | | Same behaviour, different symbols. IBM’s definition requires |
| Enlist | n/a | | No Dyalog equivalent. Requires |
| Type | | | No special symbol in IBM’s definition. The IBM expression requires |
| Depth | | | If the items of |
| | Backspace, Linefeed, Newline | Backspace, Newline, Linefeed | The order of the contents of |
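Mix and first are an example of a case where the behaviour is the same but the symbols differ. Dyalog’s documentation of ⎕ML states that, from ⎕ML ← 2 onwards, the symbols for mix and first are swapped. The sketch below assumes that behaviour and localises the change inside dfns. The name words is only for illustration.

```apl
words ← 'abc' 'de'
{⎕ML←2 ⋄ ↑⍵} words    ⍝ with ⎕ML≥2, ↑ is first: 'abc'
{⎕ML←2 ⋄ ⊃⍵} words    ⍝ with ⎕ML≥2, ⊃ is mix: a 2×3 character matrix
↑words                ⍝ with the default ⎕ML←1, ↑ is mix: the same matrix
```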
10.16.2. Computing the Type and Prototype
In Section 10.9 we defined the type of an array and the prototype of an array, but we have not yet discussed how to compute them.
However, you may have noticed that the discussion about ⎕ML referenced a primitive called type.
10.16.2.1. Migration Level Zero
In Dyalog APL, when ⎕ML is set to 0 (the value that separates Dyalog APL from APL2 the most), the epsilon glyph stops representing the function enlist and becomes the function type.
In other words, the original versions of Dyalog APL included a function that computed the type of an array, and that function was ∊.
However, in the present day, the default value of ⎕ML ← 1 means this function is generally not available.
Of course we can set ⎕ML ← 0 and see it in action:
⎕ML ← 0
∊hogwash
We can even check that we determined the type of hogwash correctly before:
hogwashType≡∊hogwash
Similarly, if ⎕ML ← 0, we can easily compute the prototype of an array:
∊⊃hogwash
We can see that 0 really is the prototype of hogwash because, when we overtake with ↑, the fill items are 0s:
6↑hogwash
Now, before we forget, let us restore ⎕ML to its default value:
⎕ML ← 1
10.16.2.2. Migration Level Non-zero
When we are working with the default ⎕ML value (or any non-zero ⎕ML value, for that matter), we cannot use the primitive function type ∊, because that function is not available.
In those cases, we must resort to other techniques to determine the type or the prototype of an array.
To determine the prototype of an array, we can reshape the array to be empty, and then ask for its first element:
⊃0⍴hogwash
Similarly, to determine the type of an array arr, we can ask for the prototype of an array which has arr as the first item:
⊃0⍴⊂hogwash
It is understandable if you find these two ways of determining the prototype and the type unsatisfying. After all, we are defining them in terms of themselves.
An alternative follows: a dfn that computes the type of an array recursively:
]dinput
type ← {
    0=≡⍵: ⊃(⍵≡⍕⍵)⌽0' '   ⍝ Is ⍵ a simple scalar?
    ∇¨⍵                  ⍝ If not, recurse.
}
type hogwash
hogwashType ≡ type hogwash
After having defined type, prototype follows trivially:
prototype ← {type⊃⍵}
prototype hogwash
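The recursive definition should agree with the ⊃0⍴ idioms shown earlier. We can compare them on a small mixed array. In this sketch, type is written on a single line with a diamond, so it can be pasted without ]dinput. The array x is a made-up example used instead of hogwash.

```apl
type ← {0=≡⍵: ⊃(⍵≡⍕⍵)⌽0' ' ⋄ ∇¨⍵}   ⍝ same logic as the multi-line version
prototype ← {type⊃⍵}
x ← 'abc' 5 (2 2⍴⍳4)                ⍝ hypothetical example array
(type x) ≡ ⊃0⍴⊂x                    ⍝ 1: agrees with the 0⍴⊂ idiom
(prototype x) ≡ ⊃0⍴x                ⍝ 1: agrees with the 0⍴ idiom
prototype x                         ⍝ three blanks, the type of 'abc'
```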
10.16.3. High-rank Partition
We studied the function partition applied to vectors in Section 10.12.2; it appeared to be extremely useful.
Its use is much more complex when applied to arrays of arbitrary rank. Let us just try it on a matrix:
chemistry
1 1 2 2 2 ⊆[2] chemistry
As we can see, partition operates along the specified axis, but it also separates all the items along the other axis, as if the matrix were seen through a grid.
In other words, partition ⊆ preserves the rank of its argument array:
≢⍴chemistry
≢⍴1 1 2 2 2 ⊆[2] chemistry
This is unlike partitioned enclose, which always returns a vector where each item has the original rank:
⎕← r ← 1 0 1 0 0 ⊂[2] chemistry
Although visually similar, the result of applying partitioned enclose is a vector of length 2, and each of its items, in turn, is a sub-matrix of the original matrix:
⍴r
⍴¨r
For partition, it is the other way around: the result is still a matrix, and it is the items that become vectors:
⍴r ← 1 1 2 2 2 ⊆[2] chemistry
⍴¨r
Once more, be careful about the visual similarity of the results if ]box happens to be OFF:
]box off
⎕← 1 1 2 2 2 ⊆[2] chemistry
⎕← 1 0 1 0 0 ⊂[2] chemistry
]box on
Let us try using partition on a 3D array to see what the result looks like:
cuboid
1 2 3 3 ⊆[2] cuboid
Once more, we see that the rank of the original array is preserved and that each scalar of the new array contains a vector.
Rules
In r ← pattern ⊆[axis] array:
the result r is an array of the same rank as array;
the dimensions of the result and of the right argument array match, except possibly along the axis specified by axis; and
the length of the specified axis of the result is the number of partitions defined by pattern, that is, the number of positions where pattern increases (counting from an initial 0), which is +/2</0,pattern.
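The partition count can be computed directly from the rule that a new partition starts wherever pattern increases, treating the value before the first item as 0. The sketch below uses a hypothetical 3×5 matrix m instead of chemistry. It also reuses the pattern from Section 10.12.2.1, which is not monotonic.

```apl
m ← 3 5⍴⍳15                       ⍝ hypothetical stand-in for chemistry
p ← 1 1 2 2 2
⍴p⊆[2]m                           ⍝ 3 2: rank kept, axis 2 now holds 2 partitions
+/2</0,p                          ⍝ 2 partitions
q ← 3 3 3 7 7 1 1 0 3 3 3 9 2 1 1 0
+/2</0,q                          ⍝ 4: increases 0→3, 3→7, 0→3, 3→9
≢q⊆'Once upon a time'             ⍝ 4, as seen in Section 10.12.2.1
```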
10.16.4. Ambiguous Representation
Even with ]box ON, and with the ]display user command, there are times when the visual representations of arrays are ambiguous:
⎕← v ← 5 8 '7' 9
]display v
In this form, the dash which should tell us that the 7 is a character is indistinguishable from the dashes used to draw the box.
We just know that one (or more) of the four items is a character because the + symbol tells us that this array is mixed.
A convenient way to distinguish between numbers and letters is to look at the type of the array and compare it with 0 (for numbers) or ' ' (for letters):
' '=type v
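If type is not at hand, the same test can be made item by item with the ⊃0⍴ idiom from Section 10.16.2.2. This sketch only makes sense when every item of v is a simple scalar, as it is here.

```apl
v ← 5 8 '7' 9
{' '=⊃0⍴⍵}¨v     ⍝ 0 0 1 0: 1 where the item's prototype is a blank, i.e. a character
```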
10.16.5. Pick Inside a Scalar
Suppose that one item of a nested variable is a vector which has been enclosed twice, and we would like to select one value out of its contents.
For example, how can we select the letter 'P' in the following vector:
⎕← nv ← (3 5 2)(⊂'CARPACCIO')(6 8 1)
We might attempt to write
2 1 4 ⊃ nv
RANK ERROR
2 1 4⊃nv
∧
but that is incorrect because the second item of nv is an enclosed scalar.
The index 1 would have been appropriate for a one-item vector, but not for a scalar.
The correct answer is:
2 ⍬ 4 ⊃ nv
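The same path can be followed one step at a time, which may make it clearer why the middle index is empty. This sketch redefines nv as above.

```apl
nv ← (3 5 2)(⊂'CARPACCIO')(6 8 1)
2⊃nv          ⍝ the enclosed scalar ⊂'CARPACCIO'
⊃2⊃nv         ⍝ 'CARPACCIO': disclosing the scalar is the step that ⍬ performs
4⊃⊃2⊃nv       ⍝ 'P', the same result as 2 ⍬ 4 ⊃ nv
```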
10.17. Solutions
The solutions we propose below are not necessarily the “best” ones; perhaps you will find other solutions that we have never considered. APL is a very rich language, and due to the general nature of its primitive functions and operators there are always plenty of different ways to express a solution to a given problem. Which one is “the best” depends on many things, for example the level of experience of the programmer, the importance of system performance, the required behaviour in border cases, the requirement to meet certain programming standards and also personal preferences. This is one of the reasons why APL is so pleasant to teach and to learn!
We advise you to try and solve the exercises before reading the solutions!
Solution to Exercise 10.1
A reasonable first step is to figure out the depth, rank, and shape of the two arrays we are working with:
cm
nm
cm is a simple (character) matrix, hence has depth 1; because it is a matrix, its rank is 2 and its shape is 3 7; and
nm is a simple (numeric) matrix, hence has depth 1; because it is a matrix, its rank is 2 and its shape is 3 4.
We can start by verifying this:
DRS ← {(≡⍵)(≢⍴⍵)(⍴⍵)} ⍝ Depth, Rank, and Shape of an array.
DRS cm
DRS nm
For the expression (⊂cm)(⊂nm), we are enclosing both matrices into scalars. Then, strand notation will try to build a vector out of the different things it can find. Because it can find two things, ⊂cm and ⊂nm, the result will be a 2-item vector: rank 1, shape ,2.
We just have to determine the depth of the result.
To determine the depth of a vector, we first have to determine the depth of all of its items.
In this case, that will be ≡⊂cm and ≡⊂nm.
Both of these represent enclosures of simple matrices, so the depth of each enclosure is one plus the depth of the simple matrix, i.e. 2.
Thus, the result is a vector where all items have depth 2 and, therefore, the result has depth 3:
DRS (⊂cm)(⊂nm)
For the expression (⊂cm),(⊂nm), what changes is the usage of the catenate primitive to build the final result, instead of letting strand notation do its work. Again, the building blocks are ⊂cm and ⊂nm, but now catenate uses those as the items that build the final result. Because we are catenating two scalars, we get a 2-item vector: rank 1, shape ,2. However, this time the matrices themselves become the items of the vector, thus its depth is only 2:
(⊂cm),(⊂nm)
DRS (⊂cm),(⊂nm)
This contrasts with the previous expression, where the items of the vector were the enclosed scalars.
Finally, for the expression cm,⊂nm, we have something that won’t be homogeneous, and thus the result is going to have a negative depth. Notice that catenate is being used between a matrix (cm) and a scalar (⊂nm). As you’ve seen in Section 4.11.3, catenating a scalar to a matrix repeats the scalar along the rows of the matrix, extending the matrix by one column.
Because the matrix cm had 7 columns, the matrix cm,⊂nm will have 8 columns, and its number of rows will remain unchanged, meaning the final shape is 3 8.
Its rank will also remain unchanged.
What changes is the depth, because the result will no longer be a simple character matrix, but a nested matrix: some of the elements will be characters, others will be numeric matrices.
Thus, the result will have some elements of depth 0 and others of depth 1, which makes the final depth ¯2:
cm,⊂nm
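The actual cm and nm are not shown in this section, so the sketch below uses stand-ins with the same shapes (3×7 characters and 3×4 numbers) to check the three answers with DRS. Only depth, rank, and shape are compared, so the contents of the stand-ins should not matter.

```apl
DRS ← {(≡⍵)(≢⍴⍵)(⍴⍵)}
cm ← 3 7⍴⎕A           ⍝ hypothetical 3×7 character matrix
nm ← 3 4⍴⍳12          ⍝ hypothetical 3×4 numeric matrix
DRS (⊂cm)(⊂nm)        ⍝ depth 3, rank 1, shape ,2
DRS (⊂cm),(⊂nm)       ⍝ depth 2, rank 1, shape ,2
DRS cm,⊂nm            ⍝ depth ¯2, rank 2, shape 3 8
```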
Solution to Exercise 10.2
Let’s take the three vectors we need and work with them:
a ← 1 2 3
b ← 4 5 6
c ← 7 8 9
a b c × 1 2 3
a b c is a 3-item nested vector, and all the sub-vectors have depth 1, so a b c has depth 2.
Multiplying with 1 2 3 doesn’t change the structure, only the contents, so a b c × 1 2 3 is a 3-item vector of rank 1 and depth 2:
DRS a b c × 1 2 3
(10 20),a
a is a 3-item simple vector and (10 20) is a 2-item simple vector, so their catenation yields a 5-item simple vector, thus its depth is 1, its rank is 1, and its shape is ,5:
DRS (10 20),a
Note that the parentheses are superfluous and can be removed:
DRS 10 20,a
The same is true for the next expression:
(10 20),a b
Now we are catenating (10 20), which is a simple numeric vector, with a b, which is a 2-item nested vector. a b has depth 2, rank 1, and shape ,2.
When we catenate the two vectors, we get a heterogeneous 4-item vector, thus its depth will be ¯2, its rank will be 1, and its shape will be ,4.
DRS (10 20),a b
As mentioned before, removing the parentheses doesn’t change the result:
DRS 10 20,a b
a b 2 × c[2]
c[2] is a simple scalar, thus multiplying it with a b 2 won’t change the structure of the array.
Now, a b 2 is a 3-item vector that is not homogeneous, because a and b are simple vectors while 2 is a simple scalar.
a and b have depth 1 and 2 has depth 0, thus the final array will have depth ¯2, rank 1, and shape ,3:
DRS a b 2
10×a 20×b
The strand a 20 creates a 2-item vector that is being multiplied by b, but b is a 3-item vector, thus we will get a LENGTH ERROR if we try to evaluate this expression:
10×a 20×b
LENGTH ERROR
10×a 20×b
∧
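The four valid expressions of this exercise can be checked in one go with DRS, which is redefined here so that the sketch is self-contained. Note the negative depths of the two heterogeneous vectors.

```apl
DRS ← {(≡⍵)(≢⍴⍵)(⍴⍵)}
a ← 1 2 3 ⋄ b ← 4 5 6 ⋄ c ← 7 8 9
DRS a b c × 1 2 3     ⍝ depth 2, rank 1, shape ,3
DRS 10 20,a           ⍝ depth 1, rank 1, shape ,5
DRS 10 20,a b         ⍝ depth ¯2, rank 1, shape ,4
DRS a b 2 × c[2]      ⍝ depth ¯2, rank 1, shape ,3: items of depth 1, 1 and 0
```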
Solution to Exercise 10.3
We continue our work:
+/a b c
The strand a b c builds a 3-item vector, and the +/ will reduce it to a single scalar containing the result of a+b+c so, in other words, +/a b c is the same as ⊂a+b+c.
a+b+c evaluates to a 3-item simple vector of depth 1, thus ⊂a+b+c has depth 2.
Because it is a scalar, it has rank 0 and shape ⍬:
DRS +/a b c
+/¨a b c
This expression is slightly different from the previous one in that the operator each ¨ was put next to the plus-reduction.
Because of the each, we will sum each of the three vectors, each of them producing a single scalar.
Now, it’s important to understand that f¨array doesn’t change the outer structure of array.
So, if a b c is a 3-item vector, f¨a b c will still be a 3-item vector.
In our case, because f is +/, we will get a 3-item simple numeric vector, meaning the final depth is 1, the rank is 1, and the shape is ,3:
DRS +/¨a b c
1 0 1/¨a b c
The use of ¨ again tells us that the result will be a 3-item vector.
Now we are left with examining the contents of the resulting vector.
Because the left argument to compress isn’t enclosed, compress each will do 1/a, 0/b, and 1/c.
1/a and 1/c don’t change the right argument arrays, but 0/b produces an empty vector.
Thus, the result is equivalent to a ⍬ c.
However, this doesn’t change any of the characteristics of a b c, and the result is still a 3-item vector with rank 1 and depth 2:
DRS 1 0 1/¨a b c
a ⍬ c≡1 0 1/¨a b c
(a b c)⍳4 5 6
This is a tricky question, because b ← 4 5 6 might lead you into thinking that the result is 2.
However, index of looks at its left argument and finds a vector (although a nested one), so it will look at the right argument as a collection of scalars, and it will check for the position of each scalar in the left argument vector.
Because none of the scalars 4, 5, and 6 are in the left argument vector, the final result is 4 4 4, which is a vector of depth 1, rank 1, and shape ,3:
(a b c)⍳4 5 6 ⍝ the parentheses are superfluous
DRS (a b c)⍳4 5 6
1 10 3 ∊ a
The result of a membership operation is an array with the same shape as the left argument, so it will be a 3-item vector.
Then, each scalar in the result is either a 0 or a 1, so the result will be a simple vector: depth 1, rank 1, and shape ,3:
DRS 1 10 3 ∊ a
(⊂1 0 1)/¨a b c
This is similar to one of the previous expressions, but now the left argument to /¨ is enclosed, meaning that 1 0 1 is the left argument used when compressing each of the items of the right argument a b c.
The operator each makes the final result a 3-item vector as well.
Each of its items is the result of applying 1 0 1/ to one of a, b, or c, which always yields a 2-item simple vector.
Therefore, the final result will be a nested vector of depth 2, rank 1, and shape ,3:
(⊂1 0 1)/¨a b c
DRS (⊂1 0 1)/¨a b c
1 10 3 ∊ a b c
As seen above, the structure of the result of membership ∊ depends on the left argument only, so we have again that the result has depth 1, rank 1, and shape ,3:
DRS 1 10 3 ∊ a b c
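For reference, here are the values (not just the structures) of the expressions in this exercise, gathered in a self-contained sketch.

```apl
a ← 1 2 3 ⋄ b ← 4 5 6 ⋄ c ← 7 8 9
+/a b c             ⍝ ⊂12 15 18
+/¨a b c            ⍝ 6 15 24
1 0 1/¨a b c        ⍝ a ⍬ c, i.e. (1 2 3)(⍬)(7 8 9)
(a b c)⍳4 5 6       ⍝ 4 4 4
1 10 3 ∊ a          ⍝ 1 0 1
(⊂1 0 1)/¨a b c     ⍝ (1 3)(4 6)(7 9)
1 10 3 ∊ a b c      ⍝ 0 0 0: no simple scalar matches a vector item
```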
Solution to Exercise 10.4
We want to know what +/na and ,/na evaluate to, given na:
⎕← na ← 1 2 (2 2⍴3 4 5 6)7 8
The simplest way to think about this is to write down the expression that the reduction is equivalent to:
⊂1 + 2 + (2 2⍴3 4 5 6) + 7 + 8
Notice the final enclose to guarantee that the result is a scalar: it’s there because reduce is supposed to reduce the rank of the argument. If we reduce a vector, the result will be a scalar, even if an enclosed one.
We have a series of scalar additions and, in the middle, addition with a matrix.
There are no shape mismatches, and thus we can just add all the scalars to all the positions in the matrix.
1 + 2 + 7 + 8 is 18, thus the final result is ⊂18+2 2⍴3 4 5 6, or ⊂2 2⍴21 22 23 24:
+/na
As for ,/na, we can do a similar exercise.
However, now we can’t shuffle things into the order that we prefer because , is not commutative.
Starting from the right, we first evaluate 7,8 to get 7 8, and then we evaluate (2 2⍴3 4 5 6),7 8.
Because the left argument to catenate is a matrix and the right argument is a vector, catenate will try to spread the vector across the rows of the matrix.
Because the number of rows matches the number of elements in the vector, this succeeds and the result is 2 3⍴3 4 7 5 6 8.
Then, we catenate two scalars (separately) to the left of the matrix, so those get replicated across the rows.
The final result is:
⊂2 5⍴1 2 3 4 7 1 2 5 6 8
,/na
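Both answers can be confirmed by matching them against the expressions derived above. This sketch should return 1 twice.

```apl
na ← 1 2 (2 2⍴3 4 5 6) 7 8
(+/na) ≡ ⊂2 2⍴21 22 23 24           ⍝ 1
(,/na) ≡ ⊂2 5⍴1 2 3 4 7 1 2 5 6 8   ⍝ 1
```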
Solution to Exercise 10.5
We have the two vectors here:
ref ← 47 47 83 83 83 83 83 29 36 36 36 50 50
qty ← 5 8 3 18 11 1 6 10 61 52 39 8 11
And we want to use a partitioning function to figure out how many items of each reference were sold.
In order to do that, we can try to create a Boolean vector that identifies whenever the vector ref reaches a new reference.
If we do a pairwise not-equals reduction, we get quite close:
2≠/ref
The only issue is that this fails to identify the initial reference, but we can fix it by catenating a 1 at the beginning:
1,2≠/ref
With that out of the way, we can use partitioned enclose to get a nested vector where each sub-vector contains the quantities sold for that reference:
↑ref qty
(1,2≠/ref)⊂qty
Finally, we can add those up with an each:
+/¨(1,2≠/ref)⊂qty
Alternatively, we can mix the results after partitioned enclose and then sum along the last axis. The partitioned enclose is likely to return sub-vectors that do not have the same length, which means that fill items are inserted when we mix:
↑(1,2≠/ref)⊂qty
However, that doesn’t change the final result because adding zeroes does nothing:
+/↑(1,2≠/ref)⊂qty ⍝ still correct
Let’s wrap it into a function:
QuantitiesSold ← {+/↑(1,2≠/⍺)⊂⍵}
ref QuantitiesSold qty
If we wanted to return the quantities and the respective references, we could use the same Boolean vector to compress the references and partition the quantities:
]dinput
QuantitiesSoldAndRefs ← {
    pat ← 1,2≠/⍺
    (pat/⍺),[.5](+/↑pat⊂⍵)
}
ref QuantitiesSoldAndRefs qty
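Partition ⊆ can be used as well, provided that the pattern increases at every change of reference. A running sum of the Boolean vector above does exactly that. QuantitiesSoldP is a hypothetical name for this variant.

```apl
ref ← 47 47 83 83 83 83 83 29 36 36 36 50 50
qty ← 5 8 3 18 11 1 6 10 61 52 39 8 11
+\1,2≠/ref                             ⍝ 1 1 2 2 2 2 2 3 4 4 4 5 5: one group number per run
QuantitiesSoldP ← {+/¨(+\1,2≠/⍺)⊆⍵}    ⍝ hypothetical name
ref QuantitiesSoldP qty                ⍝ 13 39 10 152 19
```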
Solution to Exercise 10.6
In order to look up the rows of one matrix among the rows of another matrix, we just need to use split to turn both matrices into vectors of rows:
⎕← big ← 5 2⍴⍳10
⎕← small ← (2 2⍴⍳4)⍪8+2 2⍴⍳8
(↓big)⍳↓small
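If 0 is preferred for the missing rows, the indices larger than ≢big can be zeroed afterwards. RowIndex0 is a hypothetical name for this sketch.

```apl
big ← 5 2⍴⍳10
small ← (2 2⍴⍳4)⍪8+2 2⍴⍳8
RowIndex0 ← {r←(↓⍺)⍳↓⍵ ⋄ r×r≤≢⍺}    ⍝ hypothetical name: "not found" (1+≢⍺) becomes 0
big RowIndex0 small                  ⍝ 1 2 5 0
```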
Solution to Exercise 10.7
In order to understand the results of the empty partitioned encloses, it is helpful to think about what the result would be if it were a “normal” partitioned enclose.
For example, for the solved case, here is a “normal” partitioned enclose of 'Partition', using an arbitrary left argument with a few 1s and some 0s:
1 0 0 1 0 1 0 0 0⊂'Partition'
The result is, thus, a vector of character vectors.
So, if the left argument is 0, we get “an empty vector of empty character vectors”.
Here is a vector of empty character vectors:
'' '' '' ''
But this vector has 4 elements. To make it empty, we need to reshape it:
0⍴'' '' '' ''
But it is a waste of typing effort to write four '' for nothing, when we can just let reshape take care of reusing data:
0⍴4⍴⊂''
Now, reshaping twice is redundant, so we can keep only the last reshape:
0⍴⊂''
This matches the empty result:
(0⍴⊂'')≡0⊂'Partition'
Let us follow a similar reasoning for the remaining expressions.
0⊂⍳10
This is very similar to the example above, except the data is numeric instead of textual, so the result is an empty vector of empty numeric vectors:
(0⍴⊂⍬)≡0⊂⍳10
0⊂⍬
This is an attempt at a tricky question, but remember that ⍬ is a simple numeric vector, although an empty one.
Thus, the solution is the same:
(0⍴⊂⍬)≡0⊂⍬
0⊂3 2⍴⎕A
The partitioned enclose of a character matrix is a vector of character matrices, so the result of the empty partitioned enclose is going to be an empty vector of empty character matrices. Now, the question is: what is the shape of those empty matrices?
Well, by modifying the left argument to partitioned enclose, we can vary the number of columns in the sub-matrices:
1 1⊂3 2⍴⎕A
1 0⊂3 2⍴⎕A
But we see that the sub-matrices always have three rows, because that is how many rows the original matrix has. Hence, the final result is an empty vector of empty character matrices with three rows:
(0⍴⊂3 0⍴'')≡0⊂3 2⍴⎕A
0⊂3 4 5⍴⍳60
The train of thought for this example is very similar to the previous one, except we are working with an array of rank three.
The basic premise is the same, though: modifying the pattern of the left argument of partitioned enclose alters the length of the last axis of each sub-result, but each sub-result will have a shape that starts with 3 4.
Hence, the final result is an empty vector of empty cuboids with shape 3 4 0:
(0⍴⊂3 4 0⍴⍬)≡0⊂3 4 5⍴⍳60
0⊂(1 2 3)(4 5 6)(7 8 9)
The right argument to partitioned enclose is a nested vector, so the result would generally be a vector, where each item would be a vector of triples of integers. So, in this case, the result is an empty vector of empty vectors of triples of integers:
(0⍴⊂0⍴⊂3⍴⍬)≡0⊂(1 2 3)(4 5 6)(7 8 9)
     ⍝ ↑ the triples of integers
  ⍝ ↑ the empty vectors of triples of integers
⍝↑ the empty vector of empty vectors of triples of integers
0⊂('cat')('dog')(7 8 9)
This example is very similar to the previous one, except now we have a mixed vector. However, when working with empty vectors, fill items, and prototypes, what matters is the first item of the vector, which is a 3-item character vector in this case. Thus, the result will be the same as before, except we have triples of characters instead of triples of integers:
(0⍴⊂0⍴⊂3⍴'')≡0⊂('cat')('dog')(7 8 9)
     ⍝ ↑ the triples of characters ...
0⊂(14 'cat' 8)('a' 2 'c' 4)(1 2 3)
Solving this final expression requires applying the same thought process as before. Because of strand notation, the first item is a 3-item vector made of an integer, a 3-character vector, and another integer, so that is exactly what we shall recreate (note the three blanks in the middle):
(0⍴⊂0⍴⊂0 '   ' 0)≡0⊂(14 'cat' 8)('a' 2 'c' 4)(1 2 3)
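Applied to an empty vector, first (⊃) returns its prototype. This lets us inspect the fill item directly and rebuild the empty result from it. Rebuild is a hypothetical helper. The sketch assumes the default ⎕ML ← 1, where monadic ⊃ is first.

```apl
⊃0⊂'Partition'           ⍝ '': an empty character vector
⊃0⊂⍳10                   ⍝ ⍬
⊃0⊂3 2⍴⎕A                ⍝ a 3×0 character matrix
Rebuild ← {0⍴⊂⊃⍵}        ⍝ hypothetical helper: empty vector with the same prototype
e ← 0⊂3 2⍴⎕A
e ≡ Rebuild e            ⍝ 1
```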
Solution to Exercise 10.8
To check if a word starts and ends with the same character, we can take a character from the front, one from the back, and compare those:
word ← 'area'
(1↑word)=¯1↑word
However, we can also make use of the primitive first, which we just learned about.
First ⊃ picks the first character of a character vector, so we are just left with picking the last character of the word.
An interesting way to look at it is by realising that the last element of a (character) vector is the first element of the reverse:
⊃⌽'last'
Thus, given a vector of words, we can use each to work on each word separately:
StartAndEnd ← { {(⊃⍵)=⊃⌽⍵}¨⍵ }
StartAndEnd 'area' 'banana' 'shoes' ⍝ 1 0 1
The function appears to be working, so let us test it on the other examples:
StartAndEnd 'cape' ⍝ 0
Clearly, the function is not working yet.
The issue is that the argument ⍵ is a 4-item vector and {(⊃⍵)=⊃⌽⍵}¨ is going to traverse each of the characters of that vector.
The fix for this is to preprocess the argument with the primitive function nest (monadic ⊆), which encloses its argument only if it is simple, so that a single word is treated as a single item:
StartAndEnd ← { {(⊃⍵)=⊃⌽⍵}¨⊆⍵ }
StartAndEnd 'area' 'banana' 'shoes' ⍝ 1 0 1
StartAndEnd 'cape' ⍝ 0
StartAndEnd 'z' ⍝ 1
Solution to Exercise 10.9
In order to extract the sub-vectors from the big character vector, we need to take the starting indices and count five indices starting from there:
start ← 6 27 52
start + ¯1+⊂⍳5
Now, the most pragmatic thing to do is to enlist all those indices and index directly into the character vector:
text ← 'This boring text has been typed just for a little experiment.'
text[∊start+¯1+⊂⍳5]
Putting this in a function gives:
Extract ← { ⍺[∊⍵+¯1+⊂⍳5] }
text Extract start
⍝ 'borintypedxperi'
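An alternative, sketched below under the hypothetical name Extract2, uses drop and take instead of computing indices. One difference is worth knowing: near the end of text, take pads with blanks instead of signalling an INDEX ERROR.

```apl
text ← 'This boring text has been typed just for a little experiment.'
start ← 6 27 52
Extract2 ← {∊(⍵-1){5↑⍺↓⍵}¨⊂⍺}   ⍝ hypothetical name: drop up to each start, keep 5
text Extract2 start             ⍝ 'borintypedxperi'
```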
Solution to Exercise 10.10
This exercise is very similar to the previous one, except that now we don’t extract sub-vectors of fixed length; instead, the lengths depend on the right argument. We can handle this by using iota each:
length ← 3 8 4
]dinput
ExtractL ← {
    (start length) ← ⍵
    ⍺[∊start+¯1+⍳¨length]
}
text ExtractL start length
⍝ 'bortyped juxper'
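The drop-and-take approach adapts just as easily: take each (↑¨) uses the individual lengths. This is a sketch, and ExtractL2 is a hypothetical name.

```apl
text ← 'This boring text has been typed just for a little experiment.'
start ← 6 27 52
length ← 3 8 4
ExtractL2 ← {(s l)←⍵ ⋄ ∊l↑¨(s-1)↓¨⊂⍺}   ⍝ hypothetical name: s starts, l lengths
text ExtractL2 start length             ⍝ 'bortyped juxper'
```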