Awk array iteration for multi-dimensional arrays

Question

Awk offers associative indexing for array processing. Elements of a one-dimensional array can be iterated:

e.g.

for(idx in arr1)
  print "arr1[" idx "]=" arr1[idx]

But how is this done for a two-dimensional array? Does the kind of syntax given below work?

for(index1 in arr2)
for(index2 in arr2)
   arr2[index1,index2]

gawk as of v4 supports arrays as elements, i.e. nested arrays, which are more flexible than multidimensional arrays: for (i in arr2) for (j in arr2[i]) print arr2[i][j]. See JJoao's answer.
– jthill, Commented Mar 6, 2017 at 21:29

Vitalizzare · Accepted Answer · 2024-03-02 20:39:47Z

AWK fakes multidimensional arrays by concatenating the indices with the character held in the SUBSEP variable (0x1c). You can iterate through a two-dimensional array using split like this (based on an example in the info gawk file):

awk 'BEGIN { OFS=","; array[1,2]=3; array[2,3]=5; array[3,4]=8; 
  for (comb in array) {split(comb,sep,SUBSEP);
    print sep[1], sep[2], array[sep[1],sep[2]]}}'

Output:

2,3,5
3,4,8
1,2,3

You can, however, iterate over a numerically indexed array using nested for loops:

for (i = 1; i <= width; i++)
    for (j = 1; j <= height; j++)
        print array[i, j]
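
A minimal runnable sketch of the counting-loop approach, assuming a small 2x3 grid filled in BEGIN. The names width, height and grid are illustrative only and not part of the original answer:

```shell
# Fill a 2x3 grid, then walk it in a fixed order with counting loops.
# width, height and grid are hypothetical names chosen for this sketch.
out=$(awk 'BEGIN {
    width = 2; height = 3
    for (i = 1; i <= width; i++)
        for (j = 1; j <= height; j++)
            grid[i, j] = i * j      # stored under the key i SUBSEP j
    for (i = 1; i <= width; i++)
        for (j = 1; j <= height; j++)
            print i, j, grid[i, j]
}')
printf '%s\n' "$out"
```

Unlike for (key in array), this visits the elements in a predictable order, but it only works when every index is a known integer range.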

Another noteworthy bit of information from the GAWK manual:

To test whether a particular index sequence exists in a multidimensional array, use the same operator (in) that is used for single dimensional arrays. Write the whole sequence of indices in parentheses, separated by commas, as the left operand:
if ((subscript1, subscript2, ...) in array)
   ...
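
As a small illustration of the quoted passage, the sketch below (with made-up data) checks one pair that exists and one that does not, and then counts the elements to show that the in test did not create anything:

```shell
# Membership tests on a comma-subscripted array, using only POSIX awk.
out=$(awk 'BEGIN {
    array[1, 2] = 3
    if ((1, 2) in array) print "1,2 present"
    if (!((2, 1) in array)) print "2,1 absent"
    # The tests above must not have added any elements.
    n = 0
    for (k in array) n++
    print "elements: " n
}')
printf '%s\n' "$out"
```

Testing with in is preferable to comparing array[i, j] against an empty string, because merely referencing an element creates it.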

Gawk 4 adds arrays of arrays. From that link:

for (i in array) {
    if (isarray(array[i])) {
        for (j in array[i]) {
            print array[i][j]
        }
    }
    else
        print array[i]
}

Also see Traversing Arrays of Arrays for information about the following function which walks an arbitrarily dimensioned array of arrays, including jagged ones:

function walk_array(arr, name,      i)
{
    for (i in arr) {
        if (isarray(arr[i]))
            walk_array(arr[i], (name "[" i "]"))
        else
            printf("%s[%s] = %s\n", name, i, arr[i])
    }
}

psmears · Accepted Answer · 2010-06-17 10:06:34Z

No, the syntax

for(index1 in arr2) for(index2 in arr2) {
    print arr2[index1][index2];
}

won't work. Awk doesn't truly support multi-dimensional arrays. What it does, if you do something like

x[1,2] = 5;

is to concatenate the two indexes (1 & 2) to make a string, separated by the value of the SUBSEP variable. If this is equal to "*", then you'd have the same effect as

x["1*2"] = 5;
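
This equivalence can be checked directly. The sketch below assumes SUBSEP is set to "*" purely for demonstration and then reads the element back through its plain string key:

```shell
# Show that x[1,2] and x["1*2"] are the same element when SUBSEP is "*".
out=$(awk 'BEGIN {
    SUBSEP = "*"            # override the default separator for this demo
    x[1, 2] = 5
    print x["1*2"]          # same element, addressed by its real key
    for (k in x) print k    # the stored key is the plain string 1*2
}')
printf '%s\n' "$out"
```

SUBSEP has to be assigned before the subscripts are evaluated, since the key is built at the moment x[1, 2] is used.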

The default value of SUBSEP is a non-printing character, corresponding to Ctrl+\. You can see this with the following script:

BEGIN {
    x[1,2]=5;
    x[2,4]=7;
    for (ix in x) {
        print ix;
    }
}

Running this gives:

% awk -f scriptfile | cat -v
1^\2
2^\4

So, in answer to your question - how to iterate a multi-dimensional array - just use a single for(a in b) loop, but you may need some extra work to split up a into its x and y parts.
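
One way to do that extra work, sketched here with made-up data, is to split each combined key once, remember the distinct values seen in each dimension, and then run ordinary nested loops over those values. The helper arrays rows, cols and part are names invented for this example:

```shell
# Emulate nested iteration over a comma-subscripted array in POSIX awk.
out=$(awk 'BEGIN {
    x[1, 2] = 5; x[2, 4] = 7; x[1, 4] = 9
    # Pass 1: split each combined key and record the distinct parts.
    for (key in x) {
        split(key, part, SUBSEP)
        rows[part[1]] = 1
        cols[part[2]] = 1
    }
    # Pass 2: nested loops over the parts, skipping pairs never stored.
    for (r in rows)
        for (c in cols)
            if ((r, c) in x)
                print r, c, x[r, c]
}' | sort)
printf '%s\n' "$out"
```

The output is piped through sort only because the traversal order of for (key in array) is unspecified.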

Merlin · Accepted Answer · 2016-12-28 07:21:11Z

I'll provide an example of how I use this in my work processing query data. Suppose you have an extract file full of transactions by product category and customer id:

customer_id  category  sales
1111         parts     100.01
1212         parts       5.20
2211         screws      1.33
...etc...

It's easy to use awk to count the total number of distinct customers with a purchase:

awk 'NR>1 {a[$1]++} END {for (i in a) total++; print "customers: " total}' \
datafile.txt

However, computing the number of distinct customers with a purchase in each category suggests a two dimensional array:

awk 'NR>1 {a[$2,$1]++} 
      END {for (i in a) {split(i,arr,SUBSEP); custs[arr[1]]++}
           for (k in custs) printf "category: %s customers:%d\n", k, custs[k]}' \
datafile.txt

The increment of custs[arr[1]]++ works because each category/customer_id pair is unique as an index to the associative array used by awk.
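
To try this without a data file, the sketch below feeds the same logic a few inline rows. The fourth data row is a made-up repeat purchase by customer 1111, added to show that a repeated category/customer pair is counted only once:

```shell
# Distinct customers per category, using inline sample data.
# The 7.50 row is hypothetical and exists only to exercise deduplication.
out=$(printf '%s\n' \
    'customer_id  category  sales' \
    '1111         parts     100.01' \
    '1212         parts       5.20' \
    '1111         parts       7.50' \
    '2211         screws      1.33' |
awk 'NR > 1 { seen[$2, $1]++ }     # one key per category/customer pair
     END {
         for (key in seen) { split(key, part, SUBSEP); custs[part[1]]++ }
         for (c in custs)
             printf "category: %s customers:%d\n", c, custs[c]
     }' | sort)
printf '%s\n' "$out"
```

The final sort is there only to make the order of the report stable, since awk does not define the order in which for (c in custs) visits keys.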

In truth, I use GNU awk, which is faster and can do array[i][j] as D. Williamson mentioned. But I wanted to be sure I could do this in standard awk.

JJoao · Accepted Answer · 2016-04-09 22:09:27Z

Current versions of gawk (GNU awk, the default awk on many Linux distributions and installable almost anywhere) have real multidimensional arrays.

for(b in a)
   for(c in a[b])
      print a[b][c], c , b

Bob Makowski · Accepted Answer · 2017-12-29 16:11:42Z

awk(1) was originally designed, in part, to be a teaching tool for the C language, and multi-dimensional arrays have been in both C and awk(1) pretty much forever. As such, POSIX IEEE 1003.2 standardized them.

To explore the syntax and semantics, if you create the following file called "test.awk":

BEGIN {
  KEY["a"]="a";
  KEY["b"]="b";
  KEY["c"]="c";
  MULTI["a"]["test_a"]="date a";
  MULTI["b"]["test_b"]="dbte b";
  MULTI["c"]["test_c"]="dcte c";
}
END {
  for(k in KEY) {
    kk="test_" k ;
    print MULTI[k][kk]
  }
  for(q in MULTI) {
    print q
  }
  for(p in MULTI) {
    for( pp in MULTI[p] ) {
      print MULTI[p][pp]
    }
  }
}

and run it with this command:

awk -f test.awk /dev/null

you will get the following output:

date a
dbte b
dcte c
a
b
c
date a
dbte b
dcte c

at least on Linux Mint 18 Cinnamon 64-bit 4.4.0-21-generic #37-Ubuntu SMP

Comments

Adam Katz Over a year ago

You are using nonstandard GNU extensions in your tests. These do not work in mawk, which explicitly "conforms to the Posix 1003.2 (draft 11.3) definition of the AWK language" (this refers to the second part of POSIX before 1997 and is confusingly obsoleted by IEEE Std 1003.1-2017, aka POSIX.1-2017). The current POSIX spec for awk still lacks references to your syntax.

Bob Makowski Over a year ago

Considering I helped write the awk(1) standard for POSIX IEEE 1003.2, I'm happy to point to that work and rely on it. As per the above documentation, it works on the awk that came installed with Linux Mint 18 Cinnamon 64-bit 4.4.0-21-generic #37-Ubuntu SMP.

Adam Katz Over a year ago

Thanks for your work on the spec! Could you link to the spec you wrote and point to where it mandates multi-dimensional array support? I couldn't find it. (Also, why is it missing from POSIX.1-2017?) Which implementation of awk comes with Mint 18 (ls -l /etc/alternatives/awk) and at what version? For me, gawk 'BEGIN { a[2]["c"] = 4 }' works fine (gawk 4.2.1) but mawk 'BEGIN { a[2]["c"] = 4 }' gives me a syntax error (mawk 1.3.3).

Adam Katz Over a year ago

The question is tagged awk and not gawk. This answer does not discuss the POSIX multi-dimensional array approximation, which uses commas to indicate SUBSEP delimiters for a second dimension. That format is a bit unwieldy since it's so hard to tease out and it can't facilitate a third dimension. More importantly, as I noted, mawk won't accept a[2]["c"] since it's not in any spec (beyond the gawk man/info pages). Many systems use the fully POSIX-compliant mawk or nawk (rather than gawk) as /usr/bin/awk. A gawk-only answer won't work for users of such systems.

Adam Katz Over a year ago

man awk is tied to the manual for whatever you have installed as your awk binary; gawk for you and mawk for me. There are two instances of ][ in the latest awk spec and neither refer to true multidimensional arrays like a[2]["c"]. All I see are lines like “Because awk arrays are really one-dimensional, such a <comma>-separated list shall be converted to a single string by concatenating the string values of the separate expressions, each separated from the other by the value of the SUBSEP variable.”
