
Awk offers associative indexing for array processing. The elements of a one-dimensional array can be iterated over:

e.g.

for (idx in arr1)    # "index" is a built-in function name, so it cannot be used as a variable
  print "arr1[" idx "]=" arr1[idx]

But how is this done for a two-dimensional array? Does the kind of syntax given below work?

for(index1 in arr2)
for(index2 in arr2)
   arr2[index1,index2]     
    gawk, as of version 4, supports arrays as elements (i.e. nested arrays), which are more flexible than multidimensional arrays: for (i in arr2) for (j in arr2[i]) print arr2[i][j]. See JJoao's answer. Commented Mar 6, 2017 at 21:29

5 Answers


AWK fakes multidimensional arrays by concatenating the indices with the character held in the SUBSEP variable (0x1c). You can iterate through a two-dimensional array using split like this (based on an example in the info gawk file):

awk 'BEGIN { OFS=","; array[1,2]=3; array[2,3]=5; array[3,4]=8; 
  for (comb in array) {split(comb,sep,SUBSEP);
    print sep[1], sep[2], array[sep[1],sep[2]]}}'

Output:

2,3,5
3,4,8
1,2,3

You can, however, iterate over a numerically indexed array using nested for loops:

for (i = 1; i <= width; i++)
    for (j = 1; j <= height; j++)
        print array[i, j]

Another noteworthy bit of information from the GAWK manual:

To test whether a particular index sequence exists in a multidimensional array, use the same operator (in) that is used for single dimensional arrays. Write the whole sequence of indices in parentheses, separated by commas, as the left operand:

if ((subscript1, subscript2, ...) in array)
   ...
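
As a rough sketch of that test in use (the function name show_grid and the sample data are made up for illustration), the loop below walks a 2x2 range and uses (i, j) in array so that missing cells are reported without being created as a side effect:

```shell
# Hypothetical helper: print every cell of a sparse 2x2 grid.
show_grid() {
    awk 'BEGIN {
        array[1,1] = "a"; array[2,2] = "d"      # only two cells are set
        for (i = 1; i <= 2; i++)
            for (j = 1; j <= 2; j++)
                if ((i, j) in array)            # membership test, does not create array[i,j]
                    print i, j, array[i, j]
                else
                    print i, j, "(unset)"
    }'
}
show_grid
```

Referencing array[i, j] directly in a condition would create the element; the in operator avoids that, which matters when the array is later traversed with for (key in array).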

Gawk 4 adds arrays of arrays. From that link:

for (i in array) {
    if (isarray(array[i])) {
        for (j in array[i]) {
            print array[i][j]
        }
    }
    else
        print array[i]
}

Also see Traversing Arrays of Arrays for information about the following function which walks an arbitrarily dimensioned array of arrays, including jagged ones:

function walk_array(arr, name,      i)
{
    for (i in arr) {
        if (isarray(arr[i]))
            walk_array(arr[i], (name "[" i "]"))
        else
            printf("%s[%s] = %s\n", name, i, arr[i])
    }
} 

No, the syntax

for(index1 in arr2) for(index2 in arr2) {
    print arr2[index1][index2];
}

won't work. Awk doesn't truly support multi-dimensional arrays. What it does, if you do something like

x[1,2] = 5;

is to concatenate the two indexes (1 & 2) to make a string, separated by the value of the SUBSEP variable. If this is equal to "*", then you'd have the same effect as

x["1*2"] = 5;
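
A small sketch of that equivalence, assuming SUBSEP is overridden purely for demonstration (the function name subsep_demo is hypothetical):

```shell
# Hypothetical helper: show that x[1,2] and x["1*2"] name the same element
# once SUBSEP has been set to "*".
subsep_demo() {
    awk 'BEGIN {
        SUBSEP = "*"            # demonstration only; the default is a non-printing character
        x[1,2] = 5
        print x["1*2"]          # reads the element stored via x[1,2]
        for (k in x) print k    # the stored key is the joined string
    }'
}
subsep_demo
```

With a printable separator the joined key becomes visible, but it can also collide with real data that contains "*", which is presumably why the default is a character unlikely to appear in input.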

The default value of SUBSEP is a non-printing character, corresponding to Ctrl+\. You can see this with the following script:

BEGIN {
    x[1,2]=5;
    x[2,4]=7;
    for (ix in x) {
        print ix;
    }
}

Running this gives:

% awk -f scriptfile | cat -v
1^\2
2^\4

So, in answer to your question - how to iterate a multi-dimensional array - just use a single for(a in b) loop, but you may need some extra work to split up a into its x and y parts.
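
One possible way to do that extra work, sketched here with made-up names (iterate_2d, rows, cols, part), is to collect the distinct first and second subscripts and then run the two nested loops the question was aiming for:

```shell
# Hypothetical helper: emulate nested iteration over a SUBSEP-joined 2-D array.
iterate_2d() {
    awk 'BEGIN {
        x[1,2] = 5; x[2,4] = 7
        # Collect the distinct first and second subscripts.
        for (key in x) {
            split(key, part, SUBSEP)
            rows[part[1]] = 1
            cols[part[2]] = 1
        }
        # Nested loops over both index sets; skip pairs that were never stored.
        for (r in rows)
            for (c in cols)
                if ((r, c) in x)
                    print r, c, x[r, c]
    }' | sort       # for-in order is unspecified, so sort for stable output
}
iterate_2d
```

This works in any POSIX awk, at the cost of visiting every row/column combination, so it suits small or dense arrays better than large sparse ones.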


I'll provide an example of how I use this in my work processing query data. Suppose you have an extract file full of transactions by product category and customer id:

customer_id  category  sales
1111         parts     100.01
1212         parts       5.20
2211         screws      1.33
...etc...

It's easy to use awk to count the total number of distinct customers with a purchase:

awk 'NR>1 {a[$1]++} END {for (i in a) total++; print "customers: " total}' \
datafile.txt

However, computing the number of distinct customers with a purchase in each category suggests a two dimensional array:

awk 'NR>1 {a[$2,$1]++} 
      END {for (i in a) {split(i,arr,SUBSEP); custs[arr[1]]++}
           for (k in custs) printf "category: %s customers:%d\n", k, custs[k]}' \
datafile.txt

The increment of custs[arr[1]]++ works because each category/customer_id pair is unique as an index to the associative array used by awk.
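
To illustrate that point, here is a sketch of the same idea wrapped in a shell function (customers_per_category is a hypothetical name, and the second 1111 row is invented sample data) where one customer buys from the same category twice and is still counted once:

```shell
# Hypothetical helper: count distinct customers per category from stdin.
customers_per_category() {
    awk 'NR > 1 { seen[$2, $1]++ }          # one key per (category, customer) pair
         END {
             for (key in seen) {
                 split(key, part, SUBSEP)
                 custs[part[1]]++            # each pair contributes exactly once
             }
             for (c in custs)
                 printf "category: %s customers: %d\n", c, custs[c]
         }' | sort                           # for-in order is unspecified
}

customers_per_category <<'EOF'
customer_id  category  sales
1111         parts     100.01
1111         parts      20.00
1212         parts       5.20
2211         screws      1.33
EOF
```

The repeated purchase only raises seen["parts", "1111"] from 1 to 2; it does not add a new key, so parts is reported with 2 customers and screws with 1.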

In truth, I use GNU awk, which is faster and supports array[i][j], as D. Williamson mentioned. But I wanted to be sure I could do this in standard awk.


Current versions of gawk (GNU awk, the default on many Linux distributions and installable almost anywhere) have real multidimensional arrays.

for(b in a)
   for(c in a[b])
      print a[b][c], c , b

See also the isarray() function.


awk(1) was originally designed -- in part -- to be a teaching tool for the C language, and multi-dimensional arrays have been in both C and awk(1) pretty much forever. As such, POSIX IEEE 1003.2 standardized them.

To explore the syntax and semantics, if you create the following file called "test.awk":

BEGIN {
  KEY["a"]="a";
  KEY["b"]="b";
  KEY["c"]="c";
  MULTI["a"]["test_a"]="date a";
  MULTI["b"]["test_b"]="dbte b";
  MULTI["c"]["test_c"]="dcte c";
}
END {
  for(k in KEY) {
    kk="test_" k ;
    print MULTI[k][kk]
  }
  for(q in MULTI) {
    print q
  }
  for(p in MULTI) {
    for( pp in MULTI[p] ) {
      print MULTI[p][pp]
    }
  }
}

and run it with this command:

awk -f test.awk /dev/null

you will get the following output:

date a
dbte b
dcte c
a
b
c
date a
dbte b
dcte c

at least on Linux Mint 18 Cinnamon 64-bit 4.4.0-21-generic #37-Ubuntu SMP

36 Comments

You are using nonstandard GNU extensions in your tests. These do not work in mawk, which explicitly "conforms to the Posix 1003.2 (draft 11.3) definition of the AWK language" (this refers to the second part of POSIX before 1997 and is confusingly obsoleted by IEEE Std 1003.1-2017, aka POSIX.1-2017). The current POSIX spec for awk still lacks references to your syntax.
Considering I helped write the awk(1) standard for POSIX IEEE 1003.2, I'm happy to point to that work and rely on it. As per the above documentation, it works on the awk that came installed with Linux Mint 18 Cinnamon 64-bit 4.4.0-21-generic #37-Ubuntu SMP.
Thanks for your work on the spec! Could you link to the spec you wrote and point to where it mandates multi-dimensional array support? I couldn't find it. (Also, why is it missing from POSIX.1-2017?) Which implementation of awk comes with Mint 18 (ls -l /etc/alternatives/awk) and at what version? For me, gawk 'BEGIN { a[2]["c"] = 4 }' works fine (gawk 4.2.1) but mawk 'BEGIN { a[2]["c"] = 4 }' gives me a syntax error (mawk 1.3.3).
The question is tagged awk and not gawk. This answer does not discuss the POSIX multi-dimensional array approximation, which uses commas to indicate SUBSEP delimiters for a second dimension. That format is a bit unwieldy since it's so hard to tease out and it can't facilitate a third dimension. More importantly, as I noted, mawk won't accept a[2]["c"] since it's not in any spec (beyond the gawk man/info pages). Many systems use the fully POSIX-compliant mawk or nawk (rather than gawk) as /usr/bin/awk. A gawk-only answer won't work for users of such systems.
man awk is tied to the manual for whatever you have installed as your awk binary; gawk for you and mawk for me. There are two instances of ][ in the latest awk spec and neither refer to true multidimensional arrays like a[2]["c"]. All I see are lines like “Because awk arrays are really one-dimensional, such a <comma>-separated list shall be converted to a single string by concatenating the string values of the separate expressions, each separated from the other by the value of the SUBSEP variable.”
