1

I am trying read a CSV text file and print all entries of one column (sorted), based on a condition.

The input sample is as below:

Computer ID,User ID,M
Computer1,User3,5
Computer2,User5,8
computer3,User4,9
computer4,User10,3
computer5,User9,0
computer6,User1,11

The user-ID (2nd column) needs to be printed if the hours (third column) is greater than zero. However, the printed data should be sorted based on the user-id.

I have written the following script:

while IFS=, read -r col1 col2 col3 col4 col5 col6 col7 || [[ -n $col1 ]]
do
    if [ $col3 -gt 0 ] 
    then
        echo "$col2" > login.txt
    fi
done < <(tail -n+2 user-list.txt)

The output of this script is:

User3
User5
User4
User10
User1

I am expecting the following output:

User1
User3
User4
User5
User10

Any help would be appreciated. TIA

4
  • What do you mean by sort the input? login.txt is what needs to be sorted. Commented Nov 20, 2020 at 19:56
  • Do those userID's actually match reality? Are there only 9? If there are more than 9, are the names padded with 0, e.g. User001, User002, ... User789 ... ? Commented Nov 20, 2020 at 19:59
  • @SaadUrRehman: in comments to the answers you've mentioned user10 ... consider updating the question with a more detailed example of inputs and desired outputs Commented Nov 20, 2020 at 20:17
  • @markp-fuso thanks for pointing out. I have updated the question with a detailed example of current and desired outputs. Commented Nov 20, 2020 at 20:26

4 Answers 4

2
awk -F, 'NR == 1 { next } $3 > 0 { match($2,/[[:digit:]]+/);map[$2]=substr($2,RSTART) } END { PROCINFO["sorted_in"]="@val_num_asc";for (i in map) { print i } }' user-list.txt > login.txt

Set the field delimiter to commas with -F, Ignore the header with NR == 1 { next } Set the index of an array (map) to the user when the 3rd delimited field is greater than 0. The value is set the number part of the User field (found with the match function) In the end block, set the sort order to value, number, ascending and loop through the map array created.

Sign up to request clarification or add additional context in comments.

6 Comments

thanks, it works but user10 is printed right after user1 and before user2.
Which brings us to the question I posed in the comments on your own question (which you hadn't answered) ...
I've modified the solution. Now adding the number portion of User as a value to the array.
Nice awk solution; one comment: you don't use num anywhere, no point in capturing it ...
@tink Duly noted and taken out. Thanks
|
1

The problem with your script (and I presume with the "sorting isn't working") is the place where you redirect (and may have tried to sort) - the following variant of your own script does the job:

#!/bin/bash
while IFS=, read -r col1 col2 col3 col4 col5 col6 col7 || [[ -n $col1 ]]
do
    if [ $col3 -gt 0 ] 
    then
        echo "$col2"
    fi
done < <(tail -n+2 user-list.txt) | sort > login.txt

Edit 1: Match new requirement

Sure we can fix the sorting; sort -k1.5,1.7n > login.txt

Of course, that, too, will only work if your user IDs are all 4 alphas and n digits ...

1 Comment

This prints works but prints in the order user1 user10 user2.... Can we do anything about that?
1

Sort ASCIIbetically:

tail -n +2 user-list.txt | perl -F',' -lane 'print if $F[2] > 0;' | sort -t, -k2,2 
computer6,User1,11
computer4,User10,3
Computer1,User3,5
computer3,User4,9
Computer2,User5,8

Or sort numerically by the user number:

tail -n +2 user-list.txt | perl -F',' -lane 'print if $F[2] > 0;' | sort -t, -k2,2V
computer6,User1,11
Computer1,User3,5
computer3,User4,9
Computer2,User5,8
computer4,User10,3

Comments

1

Using awk for condition handling and sort for ordering:

$ awk -F, '                       # comma delimiter
FNR>1 && $3 {                     # skip header and accept only non-zero hours
    a[$2]++                       # count instances for duplicates
}
END {
    for(i in a)                   # all stored usernames
        for(j=1;j<=a[i];j++)      # remove this if there are no duplicates
            print i | "sort -V"   # send output to sort -V
}' file

Output:

User1
User3
User4
User5
User10

If there are no duplicated usernames, you can replace a[$2]++ with just a[$2] and remove the latter for. Also, no real need for sort to be inside awk program, you could just as well pipe data from awk to sort, like:

$ awk -F, 'FNR>1&&$3{a[$2]++}END{for(i in a)print i}' file | sort -V 

FNR>1 && $3 skips the header and processes records where hours column is not null. If your data has records with negative hours and you only want positive hours, change it to FNR>1 && $3>0.

Or you could use grep with PCRE andsort:

$ grep -Po "(?<=,).*(?=,[1-9])" file | sort -V

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.