Bash: Reading a CSV file and sorting column based on a condition

Question

I am trying to read a CSV text file and print all entries of one column (sorted), based on a condition.

The input sample is as below:

Computer ID,User ID,M
Computer1,User3,5
Computer2,User5,8
computer3,User4,9
computer4,User10,3
computer5,User9,0
computer6,User1,11

The user ID (second column) needs to be printed if the hours value (third column) is greater than zero. The printed user IDs should also be sorted.

I have written the following script:

while IFS=, read -r col1 col2 col3 col4 col5 col6 col7 || [[ -n $col1 ]]
do
    if [ $col3 -gt 0 ] 
    then
        echo "$col2" > login.txt
    fi
done < <(tail -n+2 user-list.txt)

The output of this script is:

User3
User5
User4
User10
User1

I am expecting the following output:

User1
User3
User4
User5
User10

Any help would be appreciated. TIA

What do you mean by sort the input? login.txt is what needs to be sorted.
– jordanm, Commented Nov 20, 2020 at 19:56
Do those user IDs actually match reality? Are there only 9? If there are more than 9, are the names padded with 0, e.g. User001, User002, ... User789 ... ?
– tink, Commented Nov 20, 2020 at 19:59
@SaadUrRehman: in comments to the answers you've mentioned user10 ... consider updating the question with a more detailed example of inputs and desired outputs
– markp-fuso, Commented Nov 20, 2020 at 20:17
@markp-fuso thanks for pointing that out. I have updated the question with a detailed example of current and desired outputs.
– saadurr, Commented Nov 20, 2020 at 20:26

Raman Sailopal · Accepted Answer · 2020-11-20 20:53:57Z

2

awk -F, 'NR == 1 { next } $3 > 0 { match($2,/[[:digit:]]+/);map[$2]=substr($2,RSTART) } END { PROCINFO["sorted_in"]="@val_num_asc";for (i in map) { print i } }' user-list.txt > login.txt

Set the field delimiter to a comma with -F, and skip the header with NR == 1 { next }. When the third field is greater than 0, store the user as an index of the array map; the value is the numeric part of the user field (found with the match function). In the END block, set the traversal order to @val_num_asc (by value, numeric, ascending) and loop through the map array, printing each index. Note that PROCINFO["sorted_in"] is a GNU awk (gawk) feature.
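For awk implementations without PROCINFO["sorted_in"], roughly the same idea can be sketched as a decorate, sort, undecorate pipeline. This is not the answer's method: the function name users_by_number is hypothetical, and the sketch assumes every user ID contains a run of digits to sort on.

```shell
# Hypothetical helper (not from the answer): list user IDs with hours > 0,
# ordered by the first run of digits found in the ID.
users_by_number() {
    awk -F, '
        NR == 1 { next }                     # skip the header line
        $3 > 0 && match($2, /[0-9]+/) {      # hours > 0 and the ID contains digits
            # decorate: numeric part, a tab, then the full user ID
            print substr($2, RSTART, RLENGTH) "\t" $2
        }' "$1" |
    sort -n -k1,1 |                          # order by the numeric part
    cut -f2                                  # undecorate: keep only the user ID
}
```

It could be called as users_by_number user-list.txt > login.txt. Unlike the array version, duplicate user IDs are printed once per matching row.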

edited Nov 20, 2020 at 20:53

answered Nov 20, 2020 at 19:59

Raman Sailopal


6 Comments

saadurr Over a year ago

thanks, it works but user10 is printed right after user1 and before user2.

tink Over a year ago

Which brings us to the question I posed in the comments on your own question (which you hadn't answered) ...

Raman Sailopal Over a year ago

I've modified the solution. Now adding the number portion of User as a value to the array.

tink Over a year ago

Nice awk solution; one comment: you don't use num anywhere, no point in capturing it ...

Raman Sailopal Over a year ago

@tink Duly noted and taken out. Thanks


tink · Accepted Answer · 2020-11-20 20:13:18Z

1

The problem with your script (and, I presume, the reason the sorting isn't working) is the place where you redirect (and may have tried to sort): echo "$col2" > login.txt inside the loop overwrites the file on every iteration. The following variant of your own script does the job:

#!/bin/bash
while IFS=, read -r col1 col2 col3 col4 col5 col6 col7 || [[ -n $col1 ]]
do
    if [ "$col3" -gt 0 ]
    then
        echo "$col2"
    fi
done < <(tail -n+2 user-list.txt) | sort > login.txt

Edit 1: Match new requirement

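Sure, we can fix the sorting: replace the last stage with sort -k1.5,1.7n > login.txt, which sorts numerically on characters 5 to 7 of the first field.

Of course, that, too, will only work if your user IDs all consist of 4 letters followed by digits ...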
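Putting both points together (redirect once, after the loop, and sort on the digits after the 4-letter prefix), a minimal sketch might look like this. The function name active_users and the variable names are illustrative, the open-ended key -k1.5n is used instead of -k1.5,1.7n so the digit count is not limited to three, and rows whose hours field is not a plain non-negative integer are skipped as an added assumption.

```shell
# Hypothetical wrapper; assumes user IDs are 4 letters followed by digits.
active_users() {
    tail -n +2 "$1" |                        # drop the header line
    while IFS=, read -r computer user hours rest || [ -n "$computer" ]; do
        # Skip rows whose hours field is empty or not a non-negative integer.
        case $hours in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$hours" -gt 0 ]; then
            printf '%s\n' "$user"
        fi
    done |
    sort -k1.5n                              # numeric sort from the 5th character on
}
```

The output is redirected once by the caller, for example active_users user-list.txt > login.txt, so the file is written a single time instead of being truncated on each loop iteration.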

edited Nov 20, 2020 at 20:13

answered Nov 20, 2020 at 20:05

tink


1 Comment

saadurr Over a year ago

This works but prints in the order user1 user10 user2.... Can we do anything about that?

Timur Shtatland · Accepted Answer · 2020-11-20 22:14:46Z

1

Sort ASCIIbetically:

tail -n +2 user-list.txt | perl -F',' -lane 'print if $F[2] > 0;' | sort -t, -k2,2 
computer6,User1,11
computer4,User10,3
Computer1,User3,5
computer3,User4,9
Computer2,User5,8

Or sort by the user number, using the version sort (V) of GNU sort on the second field:

tail -n +2 user-list.txt | perl -F',' -lane 'print if $F[2] > 0;' | sort -t, -k2,2V
computer6,User1,11
Computer1,User3,5
computer3,User4,9
Computer2,User5,8
computer4,User10,3
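The pipelines above print whole rows, while the question asks only for the user ID. One possible way to finish the job is to add a cut stage. In this sketch the filter is written in awk rather than perl, the function name users_version_sorted is hypothetical, and sort -V is assumed to be available (it is a GNU sort option).

```shell
# Hypothetical helper; assumes GNU sort for the V (version sort) key option.
users_version_sorted() {
    tail -n +2 "$1" |          # drop the header
    awk -F, '$3 > 0' |         # keep whole rows with hours > 0
    sort -t, -k2,2V |          # version-sort on the second field only
    cut -d, -f2                # print just the user ID
}
```

For example, users_version_sorted user-list.txt > login.txt would write the sorted IDs to login.txt.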

answered Nov 20, 2020 at 22:14

Timur Shtatland


James Brown · Accepted Answer · 2020-11-22 16:50:03Z

Using awk for condition handling and sort for ordering:

$ awk -F, '                       # comma delimiter
FNR>1 && $3 {                     # skip header and accept only non-zero hours
    a[$2]++                       # count instances for duplicates
}
END {
    for(i in a)                   # all stored usernames
        for(j=1;j<=a[i];j++)      # remove this if there are no duplicates
            print i | "sort -V"   # send output to sort -V
}' file

Output:

User1
User3
User4
User5
User10

If there are no duplicated usernames, you can replace a[$2]++ with just a[$2] and remove the inner for loop. Also, there is no real need for sort to run inside the awk program; you could just as well pipe the data from awk to sort, like this:

$ awk -F, 'FNR>1&&$3{a[$2]++}END{for(i in a)print i}' file | sort -V

FNR>1 && $3 skips the header and processes records where the hours column is non-zero. If your data has records with negative hours and you only want positive hours, change it to FNR>1 && $3>0.
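The difference between the two conditions only shows up when a row has negative hours. A small sketch makes it visible; the function names are hypothetical, and these versions print one line per matching row instead of collecting names in an array.

```shell
# Hypothetical helpers comparing the two conditions; assume GNU sort for -V.
nonzero_users() {
    # $3 alone is true for any non-zero number, including negative ones
    awk -F, 'FNR>1 && $3 { print $2 }' "$1" | sort -V
}

positive_users() {
    # $3>0 keeps only strictly positive hours
    awk -F, 'FNR>1 && $3>0 { print $2 }' "$1" | sort -V
}
```

With a row such as computer7,User2,-4 added to the sample (an invented row, for illustration only), nonzero_users would list User2 while positive_users would not.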

Or you could use grep with PCRE and sort:

$ grep -Po "(?<=,).*(?=,[1-9])" file | sort -V
