I have the following shell script that reads data from a file named on the command line. The file is a matrix of numbers, and I need to separate the file by columns and then sort each column. Right now I can read the file and output the individual columns, but I am getting lost on how to sort them. I have added a sort statement, but it only sorts the first column.

EDIT: I have decided to take another route and actually transpose the matrix, turning the columns into rows. I later have to calculate the mean and median, and I have already done this row-wise earlier in the script, so it was suggested that I "spin" the matrix, if you will, to turn the columns into rows.

Here is my UPDATED code:

    declare -a col=( )
    read -a line < "$1"
    numCols=${#line[@]}                          # save number of columns

    index=0
    while read -a line ; do
        for (( colCount=0; colCount<${#line[@]}; colCount++ )); do
            col[$index]=${line[$colCount]}
            ((index++))
        done
    done < "$1"

    for (( width = 0; width < numCols; width++ )); do
        for (( colCount = width; colCount < ${#col[@]}; colCount += numCols )); do
            printf "%s\t" ${col[$colCount]}
        done
        printf "\n"
    done

This gives me the following output:

    1 9 6 3 3 6
    1 3 7 6 4 4
    1 4 8 8 2 4
    1 5 9 9 1 7
    1 5 7 1 4 7

Though I'm now looking for:

    1 3 3 6 6 9
    1 3 4 4 6 7
    1 2 4 4 8 8
    1 1 5 7 9 9
    1 1 4 5 7 7

To try and sort the data, I have tried the following:

    sortCol=${col[$colCount]}
    eval col[$colCount]='($(sort <<<"${'$sortCol'[*]}"))'

Also: (which is how I sorted the row after reading in from line)

    sortCol=( $(printf '%s\t' "${col[$colCount]}" | sort -n) )

If you could provide any insight on this, it would be greatly appreciated!

  • try looking in the sort man page or alternatively search for this in google and look at one of the 100 relevant results. Commented Apr 22, 2015 at 7:05
  • I have searched in google and the 100 relevant posts you are talking about actually don't help in my situation. I have been stuck on this portion of my code for about 2 days now, and the stuff I have found on google wasn't helpful. I don't know the amount of columns that will be in each file, so I'm not sure how to sort using -k option with an unknown amount. I have also tried wc -w and -l and things of that sort and that hasn't helped me either. Commented Apr 22, 2015 at 7:12
  • I'm sorry, what are you actually trying to do, can you post the expected output ? Commented Apr 22, 2015 at 7:16
  • If I read it correctly, he wants to read/sort/display each column independently of the others. There are only 2 approaches I can think of: (1) read each column into a separate array and sort, or (2) read/sort each column sequentially, dropping the first column from each successive sort. Neither will look entirely pretty in bash. A nested awk may work as well. Commented Apr 22, 2015 at 7:19
  • Post it in the question. Commented Apr 22, 2015 at 7:20

4 Answers


Note: as mentioned in the comments, a pure bash solution isn't pretty. There are a number of ways to do it, but this is probably the most straightforward. It reads all values from every line into a single array and saves the matrix stride (the number of values on the first line), so that the values of each column can be gathered into a row and sorted. All sorted columns are appended to a new array, a2. Printing a2 transposed yields your original matrix with each column sorted.

Note this will work for any number of columns in your file.

#!/bin/bash

test -z "$1" && {           ## validate number of input
    printf "insufficient input. usage:  %s <filename>\n" "${0//*\//}"
    exit 1;
}

test -r "$1" || {           ## validate file was readable
    printf "error: file not readable '%s'. usage:  %s <filename>\n" "$1" "${0//*\//}"
    exit 1;
}

## function: my sort integer array - accepts array and prints sorted array
## Usage: array=( $(msia "${array[@]}") )
msia() {
    local a=( "$@" )
    local sz=${#a[@]}
    local i j _tmp              ## keep loop counters out of the caller's scope
    ## warn on stderr so the message is not captured by $(msia ...)
    [[ $sz -lt 1 ]] && { echo "Warning: array not passed to fxn 'msia'" >&2; return 1; }
    for((i=0;i<sz;i++)); do
        for((j=sz-1;j>i;j--)); do
            [[ ${a[$i]} -gt ${a[$j]} ]] && {
                _tmp=${a[$i]}
                a[$i]=${a[$j]}
                a[$j]=$_tmp
            }
        done
    done
    echo "${a[@]}"
    return 0
}

declare -a a1               ## declare arrays and matrix variables
declare -a a2
declare -i cnt=0
declare -i stride=0
declare -i sz=0

while read line; do         ## read all lines into array
    a1+=( $line );
    (( cnt == 0 )) && stride=${#a1[@]}  ## calculate matrix stride
    (( cnt++ ))
done < "$1"

sz=${#a1[@]}                ## calculate matrix size
                            ## print original array
printf "\noriginal array:\n\n"
for ((i = 0; i < sz; i += stride)); do
    for ((j = 0; j < stride; j++)); do
        printf " %s" ${a1[i+j]}
    done
    printf "\n"
done

                            ## sort columns from stride array
for ((j = 0; j < stride; j++)); do
    for ((i = 0; i < sz; i += stride)); do
        arow+=( ${a1[i+j]} )
    done
    a2+=( $(msia ${arow[@]}) )  ## create sorted array
    unset arow
done
                            ## print the sorted array
printf "\nsorted array:\n\n"
for ((j = 0; j < cnt; j++)); do
    for ((i = 0; i < sz; i += cnt)); do
        printf " %s" ${a2[i+j]}
    done
    printf "\n"
done

exit 0

Output

$ bash sort_cols2.sh dat/matrix.txt

original array:

 1 1 1 1 1
 9 3 4 5 5
 6 7 8 9 7
 3 6 8 9 1
 3 4 2 1 4
 6 4 4 7 7

sorted array:

 1 1 1 1 1
 3 3 2 1 1
 3 4 4 5 4
 6 4 4 7 5
 6 6 8 9 7
 9 7 8 9 7
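
As a possible variation on the msia function above, the sorting of one column could be handed to sort(1) instead of a hand-written loop. This is only a sketch: the name sort_ints is hypothetical, it assumes bash 4 or later (for mapfile), and it assumes the values are integers. The key detail is that the values are printed one per line, because sort compares lines, not fields.

```shell
#!/bin/bash
# sort_ints: hypothetical helper, not part of the answer's script.
# Prints its arguments in ascending numeric order on a single line.
sort_ints() {
    local -a sorted
    # One value per line, so that sort(1) treats each number as a record.
    mapfile -t sorted < <(printf '%s\n' "$@" | sort -n)
    # "${sorted[*]}" joins the elements with a space (the default IFS).
    echo "${sorted[*]}"
}

sort_ints 9 3 6 3 1
```

Under these assumptions it could be dropped in where msia is called, for example a2+=( $(sort_ints "${arow[@]}") ).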

4 Comments

OP said in a comment that the number of columns is unknown: "I don't know the amount of columns that will be in each file".
Missed that, that is another reason I said it wasn't pretty. Where in the problem statement does it say the columns are of unknown number? I'll give a rework for unknown columns a shot.
It just says it in a comment, not the actual question.
Nevermind, I found it in the comment. It will take reading all values into a single matrix and setting a stride based on the number of values read in the first line. We'll look at that in the morning.

Awk script (asort is a GNU awk extension, so this needs gawk)

awk '
{for(i=1;i<=NF;i++)a[i]=a[i]" "$i}      #Add to column array
END{
        for(i=1;i<=NF;i++){
                split(a[i],b)          #Split column
                x=asort(b)             #sort column
                for(j=1;j<=x;j++){     #loop through sort
                        d[j]=d[j](d[j]~/./?" ":"")b[j]  #Recreate lines
                }
        }
for(i=1;i<=NR;i++)print d[i]          #Print lines
}' file

Output

1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7

2 Comments

My hat's off to the awk solution. One issue you can help me with. Could you provide a brief explanation of the nesting of the loops? Particularly how the d designation is filled/initialized? Great solution, I'm just trying to digest it.
@DavidC.Rankin for(i=1;i<=NF;i++) loops through the columns/fields. I then split a[i] to create a new array b, which holds all the rows/records for that field. Then I loop through b, appending b[j] (the field for that row) to d[j] (the reconstructed record). This happens inside for(i=1;i<=NF;i++), so each column in turn just adds its values to d. HTH
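
The way d is filled in the explanation above can be illustrated in bash as well. This is a minimal sketch rather than a translation of the awk script: append_column is a hypothetical helper, and the two columns passed to it are made-up values that are assumed to be sorted already. Each call appends value j of a column to output row d[j], adding a separating space only when the row already has content, which is what the (d[j]~/./?" ":"") test does in the awk version.

```shell
#!/bin/bash
declare -a d=()                     # d[j] holds output row j as a string

# append_column: hypothetical helper; arguments are one sorted column.
append_column() {
    local j=0 value
    for value in "$@"; do
        # ${d[j]:+ } expands to a space only if d[j] is already non-empty.
        d[j]+="${d[j]:+ }$value"
        j=$(( j + 1 ))
    done
}

append_column 3 6 9                 # first column
append_column 1 4 7                 # second column
printf '%s\n' "${d[@]}"             # one reconstructed row per line
```

After both calls, d[0] holds the first value of every column, d[1] the second, and so on.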

Here's my entry in this little exercise. Should handle an arbitrary number of columns. I assume they're space-separated:

#!/bin/bash

linenumber=0
while read line; do
        i=0
        # Create an array for each column.
        for number in $line; do
                [ $linenumber == 0 ] && eval "array$i=()"
                eval "array$i+=($number)"
                (( i++ ))
        done    
        (( linenumber++ ))
done < "$1"
IFS=$'\n'
# Sort each column numerically (i holds the column count after the read loop)
for j in $(seq 0 $(( i - 1 ))); do
        thisarray=array$j
        eval array$j='($(sort -n <<<"${'$thisarray'[*]}"))'
done
# Print each array's 0'th entry, then 1, then 2, etc...
for k in $(seq 0 $(( ${#array0[@]} - 1 ))); do
        for j in $(seq 0 $(( i - 1 ))); do
                eval 'printf "%s " "${array'$j'['$k']}"'
        done
        echo ""
done
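
The dynamically named arrays above need eval. One way to avoid it, sketched here under stated assumptions, is to keep each column as a newline-separated string in a single indexed array. The function name sort_columns is hypothetical, and the sketch assumes whitespace-separated integers with the same number of fields on every line.

```shell
#!/bin/bash
# sort_columns: hypothetical function; prints the matrix stored in file "$1"
# with every column sorted numerically.
sort_columns() {
    local -a fields cols rows
    local i k value

    # The || test keeps a final line that lacks a trailing newline.
    while read -r -a fields || (( ${#fields[@]} > 0 )); do
        for (( i = 0; i < ${#fields[@]}; i++ )); do
            cols[i]+="${fields[i]}"$'\n'    # column i, one value per line
        done
    done < "$1"

    for (( i = 0; i < ${#cols[@]}; i++ )); do
        k=0
        while read -r value; do
            rows[k]+="${rows[k]:+ }$value"  # append sorted value to row k
            k=$(( k + 1 ))
        done < <(printf '%s' "${cols[i]}" | sort -n)
    done

    if (( ${#rows[@]} > 0 )); then
        printf '%s\n' "${rows[@]}"
    fi
}

# Run only when a readable file name is given on the command line.
if [[ -r ${1:-} ]]; then
    sort_columns "$1"
fi
```

Because no variable names are built at run time, nothing from the input file is ever passed to eval.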


Not bash, but I think this Python code is worth a look: it shows how the task can be achieved using built-in functions.

From the interpreter:

$ cat matrix.txt 
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7

$ python
Python 2.7.3 (default, Jun 19 2012, 17:11:17) 
[GCC 4.4.3] on hp-ux11
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> f = open('./matrix.txt')
>>> for row in zip(*[sorted(list(a))
...                for a in zip(*[a.split() for a in f.readlines()])]):
...    print ' '.join(row)
... 
1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7
