
If I have a file example1.txt containing multiple strings

str1
str2
str3
...

I can read them into a bash array by using

mapfile -t mystrings < example1.txt
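For instance, a self-contained equivalent of that read, with the file contents supplied inline via a here-string instead of example1.txt:

```shell
# Same read as above, but self-contained via a here-string
mapfile -t mystrings <<< $'str1\nstr2\nstr3'
echo "${#mystrings[@]}"   # number of lines read: 3
echo "${mystrings[1]}"    # second element: str2
```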

Now say my file example2.txt is formatted as a table

str11 str12 str13
str21 str22 str23
str31 str32 str33
...   ...   ...

and I want to read each column into a different array. I know I can use other tools such as awk to separate each line into fields. Is there some way to combine this functionality with mapfile? I'm looking for something like

mapfile -t firstcol < $(cat example2.txt | awk '//{printf $1"\n"}')
mapfile -t secondcol < $(cat example2.txt | awk '//{printf $2"\n"}')

(which doesn't work).

Any other suggestion on how to handle a table in bash is also welcome.


3 Answers


Reading each row is simple, so let's build off that. I'll assume you have a proper matrix (i.e., each row has the same number of columns). This will be much easier since you are using bash 4.3.

while read -r -a row; do
    c=0
    for value in "${row[@]}"; do
        # Re-point the nameref at column_0, column_1, ... for each field
        declare -n column=column_$(( c++ ))
        column+=( "$value" )    # append this row's value to that column's array
    done
done < table.txt

There! Now, did it work?

$ echo "${column_0[@]}"
str11 str21 str31 
$ echo "${column_1[@]}"
str12 str22 str32

I think so!

declare -n makes column a nameref to an array (implicitly created by the += on the next line), using a counter that increments as we iterate over each row's fields. Then we simply append the current field's value to the array behind the current nameref.
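If namerefs are new to you, here is a minimal sketch of the mechanics in isolation (the variable names arr_a, arr_b, and ref are just for illustration):

```shell
#!/usr/bin/env bash
# Requires bash 4.3+ for declare -n (namerefs)
declare -n ref=arr_a   # ref is now an alias for arr_a
ref+=( "first" )       # appends to arr_a (implicitly created as an array)

declare -n ref=arr_b   # declare -n on an existing nameref re-points it
ref+=( "second" )      # appends to arr_b

echo "${arr_a[@]}"     # first
echo "${arr_b[@]}"     # second
```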


1 Comment

This is almost the approach I went with myself (I didn't yet have an answer on which version to target when I chose something different).

You should be using process substitution like this:

mapfile -t firstcol < <(awk '{print $1}' example2.txt)

mapfile -t secondcol < <(awk '{print $2}' example2.txt)

mapfile -t thirdcol < <(awk '{print $3}' example2.txt)
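The redirection form matters here: piping into mapfile would populate the array in a subshell, and the array would vanish when that subshell exits. A quick self-contained illustration of the difference:

```shell
# Piping into mapfile fills the array in a subshell, which then exits:
printf '%s\n' a b c | mapfile -t lost
echo "${#lost[@]}"    # 0 -- the array is gone in the parent shell

# Process substitution keeps mapfile running in the current shell:
mapfile -t kept < <(printf '%s\n' a b c)
echo "${#kept[@]}"    # 3
```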

6 Comments

Seems quite inefficient to be reading the file once per column rather than doing a single pass.
@mbrandalero: process substitution acts like a file and avoids creating temporary files. @Charles: Yes, if there is a large number of columns then it will involve running awk that many times. My answer was focusing on fixing the problem in the OP's attempt to combine mapfile with awk.
@mbrandalero, if you did a pipe from awk to mapfile, the data would be lost when the subshell running mapfile on the right-hand side of the pipeline exited. Thus, using this idiom is necessary.
Unless the file is huge, this is probably a good tradeoff for versions prior to 4.3. Getting my approach to work without namerefs is a bit tedious.
Thanks for the useful explanation!

Hmm. Something like this, perhaps?

readarrays() {
  declare -a values
  declare idx line=0
  while read -r -a values; do
    for idx in "${!values[@]}"; do
      # Stop if this row has more fields than array names were passed in
      [[ ${@:idx+1:1} ]] || break
      # Assign field idx to element $line of the array named in argument idx+1
      declare -g "${@:idx+1:1}[$line]=${values[@]:idx:1}"
    done
    (( ++line ))
  done
}

Tested as:

bash4-4.3$ (readarrays one two three <<<$'a b c\nd e f'; declare -p one two three)
declare -a one='([0]="a" [1]="d")'
declare -a two='([0]="b" [1]="e")'
declare -a three='([0]="c" [1]="f")'

