Using multiple arrays in Awk without duplication of code

Question

I have this working code:

BEGIN { FS=";"; }   # field separator
{ 
    if (match($2, /[0-9]+/)) {           # matching `ID` value
        m=substr($2, RSTART, RLENGTH);
        a[m]++;                          # accumulating number of lines for each `ID`
        print > m"_count.txt";    # writing lines pertaining to certain `ID` into respective file
    } 
}
END {
    for(i in a) { 
        print "mv "i"_count.txt "i"_"a[i]".txt"  # renaming files with actual counts
    }
}

Now I need to change it to do something like the following. I have three arrays of IDs, and each array corresponds to a separate folder in which to save the results.

BEGIN { FS=";"; }   # field separator
{
    array1=(125 258 698 874)
    array2=(956 887 4455 22)
    array3=(111 444 558 966 332)
    if ($1 == $2) {varR=$3} else {varR=$2}
    if (match(varR, /[0-9]+/)) {           # matching `ID` value
        if ( varR in array1 ) {
            FolderName = "folder1/"
            m1=substr(varR, RSTART, RLENGTH);
            a1[m1]++;                          # accumulating number of lines for each `ID`
            print > (FolderName m1)"_count.txt";    # writing lines pertaining to certain `ID` into respective file
        }
        if ( varR in array2 ) {
            FolderName = "folder2/"
            m2=substr(varR, RSTART, RLENGTH);
            a2[m2]++;                          # accumulating number of lines for each `ID`
            print > (FolderName m2)"_count.txt";    # writing lines pertaining to certain `ID` into respective file
        }
        if ( varR in array3 ) {
            FolderName = "folder3/"
            m3=substr(varR, RSTART, RLENGTH);
            a3[m3]++;                          # accumulating number of lines for each `ID`
            print > (FolderName m3)"_count.txt";    # writing lines pertaining to certain `ID` into respective file
        }
    } 
}
END {
    for(i in a1) { 
        print "mv "i"_count.txt "i"_"a1[i]".txt"  # renaming files with actual counts
    }
    for(i in a2) { 
        print "mv "i"_count.txt "i"_"a2[i]".txt"  # renaming files with actual counts
    }
    for(i in a3) { 
        print "mv "i"_count.txt "i"_"a3[i]".txt"  # renaming files with actual counts
    }
}

I need to save the lines for each matching ID to a text file and put that file in the corresponding folder. What if I have 100 arrays? Do I need to duplicate the code for each one?

Without having read your code in detail: to avoid duplicating the code, you can build an awk user function that you can call for each array. See this example using functions: stackoverflow.com/questions/42415826/…
– George Vasiliou, Commented Mar 31, 2017 at 12:08

Community · Accepted Answer · 2017-05-23 12:25:26Z

Using GNU Awk's multi-dimensional array support, here's a simplified solution that demonstrates the techniques you need:

$ gawk '
  BEGIN {
      FS=";"   # field separator
      # Initialize the sub-arrays of the multi-dimensional array once, not per record.
      array[1][""]; split("125;258;698;874", aux, ";"); for (i in aux) array[1][aux[i]]
      array[2][""]; split("956;887;4455;22", aux, ";"); for (i in aux) array[2][aux[i]]
      array[3][""]; split("111;444;558;966;332", aux, ";"); for (i in aux) array[3][aux[i]]
      n = length(array) # The count of sub-arrays
  }
  {
      if ($1 == $2) {varR=$3} else {varR=$2}
      if (match(varR, /[0-9]+/)) {           # matching `ID` value
        for (i=1;i<=n;++i) {                 # loop over all arrays
          if (varR in array[i]) {            # look for the ID among the array keys
            print "folder" i
            break
          }
        }
      }
  }
' <<<'1;1;4455'
folder2

See this answer of mine for an explanation of the array-initialization and multi-dimensional array techniques used in this command.
Note that the array initialization stores the numbers in the keys of arrays array[<n>], because that's what's needed to look up values with <value> in array[<n>].
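The same key-lookup idea can also be applied to the full task from the question (writing the per-ID files and printing the mv commands) without arrays of arrays. The following is only a sketch under assumptions, not the answer's method: it assumes each ID belongs to exactly one list, looks up the extracted digits rather than the whole field, and uses the hypothetical names route_ids, add_ids, folder_of and outdir, none of which appear in the original code. It sticks to POSIX awk features, so it should not require gawk.

```shell
# Sketch only: route_ids, add_ids, folder_of and outdir are illustrative names.
route_ids() {
  outdir=$1
  mkdir -p "$outdir/folder1" "$outdir/folder2" "$outdir/folder3"
  awk -v outdir="$outdir" '
    # Register every ID of a space-separated list under one folder name.
    function add_ids(list, folder,    aux, k, n) {
        n = split(list, aux, " ")
        for (k = 1; k <= n; k++) folder_of[aux[k]] = folder
    }
    BEGIN {
        FS = ";"
        add_ids("125 258 698 874", "folder1")
        add_ids("956 887 4455 22", "folder2")
        add_ids("111 444 558 966 332", "folder3")
    }
    {
        varR = ($1 == $2) ? $3 : $2
        if (match(varR, /[0-9]+/)) {
            id = substr(varR, RSTART, RLENGTH)
            if (id in folder_of) {          # one lookup replaces the per-array if blocks
                print > (outdir "/" folder_of[id] "/" id "_count.txt")
                count[id]++                 # number of lines written for this ID
            }
        }
    }
    END {
        for (id in count) {                 # iteration order is unspecified
            base = outdir "/" folder_of[id] "/" id
            print "mv " base "_count.txt " base "_" count[id] ".txt"
        }
    }
  '
}
```

It could be called as `route_ids . < input.txt`, where input.txt stands for your data file. Adding a hundredth list then means adding one more add_ids call (and one more folder) rather than another copy of the if block. If an ID appeared in two lists, the last add_ids call would win, so this sketch does not cover overlapping lists.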

What you tried:

Awk has no array-initializer syntax; what array1=(125 258 698 874) in your code creates is a single string: "125258698874":
- The surrounding () have no effect here (they're just for precedence).
- Placing tokens - whether numeric or string - right next to each other in Awk performs string concatenation.
- Perhaps you mistakenly think that Bash's array-initializer syntax works in Awk too.
( varR in array1 ) looks for varR among the indices (keys) of array1, but had your array initialization worked the way it does in Bash, you'd have to check the values instead.
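The points above can be checked with a small experiment. This is a minimal sketch; demo_concat, aux and ids are illustrative names that are not part of the original code.

```shell
# Sketch only: demo_concat, aux and ids are illustrative names.
demo_concat() {
  awk 'BEGIN {
      array1 = (125 258 698 874)               # no array: the tokens are concatenated
      print array1                             # prints one scalar string
      n = split("125 258 698 874", aux, " ")   # the values land in aux[1] .. aux[n]
      for (k = 1; k <= n; k++) ids[aux[k]] = 1 # copy the values into the keys of ids
      # "in" tests keys only: 258 is a key of ids, 1 is not, but 1 is a key of aux
      printf "%d %d %d\n", (258 in ids), (1 in ids), (1 in aux)
  }'
}
demo_concat
```

This prints 125258698874 on the first line and 1 0 1 on the second, which is why the values have to be moved into the keys before `in` can be used to look them up.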

Michael Vehrs · Accepted Answer · 2017-03-31 13:05:52Z

Do you need to use different arrays, or could you do something like this:

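BEGIN {
    a[1","1] = "abc";    # the keys are the concatenated strings "1,1", "1,2" and "2,2"
    a[1","2] = "xyz";
    a[2","2] = "123";
    folders[1] = "folder1";
    folders[2] = "folder2";
    var = "1";
    for (f in folders) {
        if ((var","f) in a) {
            # prints a description of what would be appended to which file
            print a[var","f] " >> " folders[f] "/file_" var;
        }
    }
}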
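Awk also has a built-in notation for such composite keys: a[x, y] joins the subscripts with the SUBSEP variable, and (x, y) in a tests for that combined key. The following sketch restates the snippet above in that notation; it is a variation on the idea rather than the answerer's own code, the function name composite_keys is hypothetical, and a numeric loop is used so that the output order is fixed.

```shell
# Sketch only: composite_keys is an illustrative name.
composite_keys() {
  awk 'BEGIN {
      a[1, 1] = "abc"; a[1, 2] = "xyz"; a[2, 2] = "123"   # keys are joined with SUBSEP
      folders[1] = "folder1"; folders[2] = "folder2"
      var = 1
      for (f = 1; f <= 2; f++)
          if ((var, f) in a)                              # membership test on the joined key
              print a[var, f] " -> " folders[f] "/file_" var
  }'
}
composite_keys
```

This prints abc -> folder1/file_1 followed by xyz -> folder2/file_1.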

Senior Pomidor Over a year ago

My arrays contain only numbers. I need them only to match IDs and, depending on the match, to put the result into a folder whose name depends on the array number.
