The existing answers are helpful and together cover all aspects, but I thought I'd give a more focused summary.
The question conflates two aspects:
- initializing arrays in Awk in general
- doing so to fill a two-dimensional array in particular
Array initialization:
Awk has no array literal (initializer) syntax.
The simplest workaround is to:
- represent the array elements as a single string and
- use the split() function to split that string into the elements of an array.
$ awk 'BEGIN { n=split("Red Green Blue", arr); for (i=1;i<=n;++i) print arr[i] }'
Red
Green
Blue
This is what the OP did in their own helpful answer.
If the elements themselves contain whitespace, use a custom separator that's not part of the data, | in this example:
$ awk 'BEGIN { n=split("Red (1)|Green (2)", arr, "|"); for (i=1;i<=n;++i) print arr[i] }'
Red (1)
Green (2)
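Because all Awk arrays are associative, the same split() technique can also initialize an array with string keys. The following is only a sketch: the sample data and the names pairs, kv, and hex are made up for illustration, and it assumes that neither | nor = occurs in the data.

```shell
# Sketch: initialize a string-keyed array from "key=value" pairs.
# The data and the names pairs, kv, and hex are illustrative assumptions.
out=$(awk 'BEGIN {
  n = split("red=#f00|green=#0f0|blue=#00f", pairs, "|")
  for (i = 1; i <= n; ++i) {
    split(pairs[i], kv, "=")   # kv[1] is the key, kv[2] the value
    hex[kv[1]] = kv[2]
  }
  print hex["green"]
}')
printf '%s\n' "$out"
```

This prints #0f0; the outer split() separates the pairs and the inner one separates each key from its value.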
Initialization of a 2-dimensional array:
Per POSIX, Awk has no true multi-dimensional arrays, only an emulation of them: a one-dimensional array whose indices are implicitly concatenated with the value of the built-in variable SUBSEP to form a single key (index; note that all Awk arrays are associative).
arr[1, 2] is effectively the same as arr[1 SUBSEP 2], where 1 SUBSEP 2 is a string concatenation that builds the key value.
- Because there aren't truly multiple dimensions - only a flat array of compound keys - you cannot enumerate the (pseudo-)dimensions individually with for (i in ...), such as to get all sub-indices for primary (pseudo-)dimension 1 only.
- The default value of SUBSEP is implementation-defined per POSIX; in GNU Awk and other common implementations it is the "INFORMATION SEPARATOR FOUR" character (octal 034), a rarely used control character that's unlikely to appear in data; in ASCII and UTF-8 it is represented as the single byte 0x1c. If needed, you can change the value.
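To see the compound-key mechanics in action, here is a minimal sketch. It first prints whatever byte your Awk implementation uses as the default SUBSEP, then checks that the comma form and an explicitly concatenated key address the same element. The array name arr and the sample values are just placeholders.

```shell
# Print the byte(s) of the default SUBSEP in hex (od is a POSIX utility).
sep_hex=$(awk 'BEGIN { printf "%s", SUBSEP }' | od -An -tx1 | tr -d ' \n')
printf 'SUBSEP byte: %s\n' "$sep_hex"

out=$(awk 'BEGIN {
  arr[1, 2] = "x"
  # An explicitly concatenated key addresses the same element.
  if (arr[1 SUBSEP 2] == "x") print "same element"
  # Membership test using the parenthesized multi-index form.
  if ((1, 2) in arr) print "present"
  # A reassigned SUBSEP applies to keys built afterwards.
  SUBSEP = ":"
  arr[3, 4] = "y"
  if ("3:4" in arr) print "key 3:4 exists"
}')
printf '%s\n' "$out"
```

Note that changing SUBSEP does not rewrite keys that already exist, so it is best set once, before the array is populated.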
By contrast, GNU Awk, as a nonstandard extension, does have support for true multi-dimensional arrays.
- Important: You must then always specify the indices separately; e.g., instead of arr[1,2] you must use arr[1][2].
POSIX-compliant example (similar to TrueY's helpful answer):
awk 'BEGIN {
n=split("Red Green Blue", arrAux); for (i in arrAux) Colors[1,i] = arrAux[i]
n=split("Yellow Cyan Purple", arrAux); for (i in arrAux) Colors[2,i] = arrAux[i]
print Colors[1,2]
print "---"
# Enumerate all [2,*] values - see comments below.
for (i in Colors) { if (index(i, 2 SUBSEP)==1) print Colors[i] }
}'
Green
---
Yellow
Cyan
Purple
Note that the emulation of multi-dimensional arrays with a one-dimensional array using compound keys has the following inconvenient implications:
- Auxiliary array arrAux is needed, because you cannot directly populate a given (pseudo-)dimension of an array.
- You cannot enumerate just one (pseudo-)dimension with for (i in ...); you can only enumerate all indices, across (pseudo-)dimensions.
- for (i in Colors) { if (index(i, 2 SUBSEP)==1) print Colors[i] } above shows how to work around that by enumerating all keys and then matching only the ones whose first constituent index is 2, which means that the key value must start with 2, followed by SUBSEP.
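An alternative to prefix matching with index() is to split each compound key back into its constituent indices with split(key, parts, SUBSEP). The sketch below, which should work in any POSIX-compliant Awk, copies one (pseudo-)row into an ordinary one-dimensional array; the names idx and row2 are introduced here for illustration only.

```shell
out=$(awk 'BEGIN {
  Colors[1, 1] = "Red";    Colors[1, 2] = "Green"
  Colors[2, 1] = "Yellow"; Colors[2, 2] = "Cyan"
  for (key in Colors) {
    split(key, idx, SUBSEP)              # idx[1] = row, idx[2] = column
    if (idx[1] == 2) row2[idx[2]] = Colors[key]
  }
  # row2 is an ordinary one-dimensional array, so a counting loop works.
  for (j = 1; (j in row2); ++j) print j ": " row2[j]
}')
printf '%s\n' "$out"
```

Since for (key in ...) enumerates keys in an unspecified order, collecting the row first and then looping by number gives predictable output.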
GNU Awk example (similar to Steve's helpful answer, improved with Ed Morton's comment):
GNU Awk's (nonstandard) support for true multi-dimensional arrays makes the inconveniences of the POSIX-compliant solution (mostly) go away
(GNU Awk also doesn't have array initializers, however):
gawk 'BEGIN {
Colors[1][""]; split("Red Green Blue", Colors[1])
Colors[2][""]; split("Yellow Cyan Purple", Colors[2])
# NOTE: Always use *separate* indices: [1][2] instead of [1,2]
print Colors[1][2]
print "---"
# Enumerate all [2][*] values
for (i in Colors[2]) print Colors[2][i]
}'
Note:
- Important: As stated, to address a specific element in a multi-dimensional array, always use separate indices; e.g., [1][2] rather than [1,2].
  - If you use [1,2] you'll get the standard POSIX-mandated behavior, and you'll mistakenly create a new, single index (key) with (string-concatenated) value 1 SUBSEP 2.
- split() can conveniently be used to directly populate a sub-array.
  - As a prerequisite, however, the 2-dimensional target arrays must be initialized: Colors[1][""] and Colors[2][""] do just that.
  - Dummy index [""] is just there to create a 2-dimensional array; it is discarded when split() fills that dimension later.
- Enumerating a specific dimension with for (i in ...) is supported:
  - for (i in Colors[2]) ... conveniently enumerates only the sub-indices of Colors[2].
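The [1,2] pitfall can be made visible with a small sketch. It requires GNU Awk, so the snippet checks for gawk first, and the value "oops" is just a placeholder.

```shell
if command -v gawk >/dev/null 2>&1; then
  out=$(gawk 'BEGIN {
    Colors[1][""]; split("Red Green Blue", Colors[1])
    Colors[1, 2] = "oops"      # creates the flat key 1 SUBSEP 2 in Colors
    print Colors[1][2]         # the sub-array element is untouched
    print length(Colors)       # top-level keys: 1 and 1 SUBSEP 2
    print length(Colors[1])    # elements that split() created
  }')
  printf '%s\n' "$out"
else
  echo "gawk not found; this sketch requires GNU Awk" >&2
fi
```

With GNU Awk this should print Green, 2, and 3: the comma form adds a second top-level key alongside sub-array Colors[1] instead of touching Colors[1][2].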
awk "arrays" are associative maps rather than indexed arrays with sizes, and there is no notion of a multi-dimensional array (though you can fake it by using strings with, say, '_' as a separator for the index); possibly there is a better way...
...SUBSEP variable. Example: gawk 'BEGIN {x[1,1]=1; for (i in x) printf "%s\n", i}' | xxd -g1