
I have a problem writing a bash script and hope that someone can help me with it. I have written a few smaller scripts in bash before, so I'm not totally new, but there's still lots of room for improvement.

So, I have a file that only contains two columns of decimal numbers, like:

0.46    0.68
0.92    1.36
1.38    2.04
1.84    2.72
 2.3    3.4
2.76    4.08
3.22    4.76
3.68    5.44
4.14    6.12
...

What I want to do is compare every number in the first column with every number in the second column, check whether any two numbers are equal, and print that number to the screen or a file.

I found an answer for how to do this in an Excel table, but I would be really interested in how to do this in bash or maybe with awk.

The first problem for me is that I don't even know how I would compare the first number to all the others in the second column. I guess I would have to do this via arrays. I could read the two numbers with a 'while read var_1 var_2' loop, append var_1 of each line to an array_1 and var_2 to an array_2, and then somehow compare all the elements with each other.

But I don't know how. I hope someone can help me.

  • Are you looking for string or numeric equality? I mean if "2.4" appeared in column 1 and "2.40" in column 2 - are those equal or not? Commented Jan 15, 2014 at 17:12
  • @EdMorton: I'm looking for numerical equality, so 2.4 = 2.40. Commented Jan 16, 2014 at 12:47
  • Then the script you selected as the correct answer won't work for you, as it is testing for string equality, since in awk all array indices are strings. Commented Jan 16, 2014 at 17:27
  • You might be right; I'm not an expert with awk and just assumed the same would apply to the answer I chose. Nevertheless, it still works for me, because a case like 2.4 vs. 2.40 will never appear in my lists. Commented Jan 20, 2014 at 12:52
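To get numeric rather than string matching, one option (a sketch, with a made-up sample file) is to normalize each value with +0 before using it as an array key, so "2.40" and "2.4" collapse to the same index:

```shell
# Sample where 2.40 in column 1 numerically equals 2.4 in column 2
printf '2.40 0.68\n0.92 2.4\n' > pairs.txt

# +0 forces awk to treat the field as a number before it becomes an index
awk 'FNR==NR {a[$1+0]; next} ($2+0) in a {print $2}' pairs.txt pairs.txt
# → 2.4
```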

6 Answers


Using awk

awk 'FNR==NR {a[$1]++;next} ($2 in a) {print $2}' file file
4.08
1.38

Read the file and store column #1 in array a, then compare column #2 with array a
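For readers new to the FNR==NR idiom, the same one-liner spelled out with comments (a few sample rows recreated inline; the matching pair is 4.08 / 1.38):

```shell
# A few rows of the sample data
printf '1.38 2.04\n2.76 4.08\n4.08 1.38\n' > file

awk '
    FNR == NR { a[$1]++; next }   # first pass over file: remember column 1
    ($2 in a) { print $2 }        # second pass: print column-2 values also in column 1
' file file
# → 4.08
#   1.38
```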

cat file
0.46    0.68
0.92    1.36
1.38    2.04
1.84    2.72
 2.3    3.4
2.76    4.08
3.22    4.76
3.68    5.44
4.14    6.12
4.08    1.38



this line should work:

 awk '{a[$1]=1;b[$2]}END{for(x in b){a[x]++;if(a[x]>1)print x}}' file

Note that the loop and check in END are for excluding duplicated numbers in the same column. If each column has unique numbers, that part could be simplified.

With fedorqui's example, the output is:

4.08
1.38


cat file
0.46    0.68
0.92    1.36
1.38    2.04
1.84    2.72
 2.3    3.4
2.76    4.08
3.22    4.76
3.68    5.44
4.14    6.12
4.08    1.38
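If each column is indeed duplicate-free, the simplified variant mentioned in the answer might look like this (a sketch; note that for (x in b) iterates in no guaranteed order):

```shell
printf '2.76 4.08\n4.08 1.38\n1.38 2.04\n' > file

# No duplicate handling needed: just print column-2 values that also occur in column 1
awk '{a[$1]; b[$2]} END {for (x in b) if (x in a) print x}' file
```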

7 Comments

Hey, thanks for your answer. It works, but the order of the numbers in the output is messed up, so it's difficult to find the lowest value.
@user3197817 why do this answer and Jotne's give the same output, yet one works "perfectly" and one is "messed up"?
The output is the same, that's true, but the order is different. Jotne's output is sorted with the smallest number at the top, which is more convenient for me.
@user3197817 I pressed F5 and checked his answer; it's not sorted. Also, in awk, if you use for(x in array), there is NO sorting happening. Both answers are the same. If you want sorted output, you could either do the sort in awk or just pipe the output to sort.
Hm, I just copied his awk line into my terminal, redirected the input to it, and it gives me the numbers starting with the smallest and ending with the highest: 15, 31, 46, 62, and so on.
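Since for (x in array) iterates in no guaranteed order, the simplest way to get ascending output from either answer is to pipe it through sort -n (sample input recreated inline):

```shell
printf '2.76 4.08\n4.08 1.38\n1.38 2.04\n' > file

# Same one-liner as in this answer, with the output sorted numerically
awk '{a[$1]=1;b[$2]}END{for(x in b){a[x]++;if(a[x]>1)print x}}' file | sort -n
# → 1.38
#   4.08
```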

One line: convert to one column, sort, and use uniq to print only duplicates:

(awk '{print $1}' test_input|sort|uniq  ; awk '{print $2}' test_input|sort|uniq)|sort|uniq -d

4 Comments

Converting to one column, sorting, and using uniq is not a good idea. What if there are duplicates in the same column? Apart from the awk;awk;sort;uniq chain...
@Kent thx for the comment; I've added uniq to remove duplicates within one column, but you are right that there are many utilities in the chain.
I saw the uniq; it removes the dups in the combination of col1 and col2. What I meant was, e.g., in col1 there are four foos, but in col2 there is no foo at all, so foo should not be in the output.
@Kent sort|uniq can also be used to remove duplicates from one column, but it is now crazy complicated, I agree.

A bash solution that works the way you described:

#!/bin/bash

# Read the two columns into parallel arrays
while read -r c1 c2; do
    c1a+=("$c1")
    c2a+=("$c2")
done < numbers.txt

# Compare every column-1 value against every column-2 value
for c1 in "${c1a[@]}"; do
    for c2 in "${c2a[@]}"; do
        [[ $c1 == "$c2" ]] && echo "$c1"
    done
done



Using awk without reading the file two times:

awk '{a[$1];b[$2];for (i in b) if (i in a) {print i;delete a[i];delete b[i]}}' file
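Spelled out, the idea is to keep checking for matches as lines arrive and delete each match so it prints only once (same logic, annotated, with a tiny sample):

```shell
printf '2.76 4.08\n4.08 1.38\n' > file

awk '{
    a[$1]; b[$2]                      # record both columns as lines arrive
    for (i in b)                      # after each line, look for new matches
        if (i in a) {
            print i
            delete a[i]; delete b[i]  # forget the match so it is printed once
        }
}' file
# → 4.08
```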


awk '{ a[$1]; b[$2] }
END{
    for (x in a) {
        for (y in b) {
            if (x+0 == y+0) {   # +0 on both sides forces a numeric, not string, comparison
                print x
                break
            }
        }
    }
}' file

