
I have a problem writing a bash script and hope that someone can help me with it. I have written a few smaller scripts in bash before, so I'm not totally new, but there's still lots of room for improvement.

So, I have a file that only contains two columns of decimal numbers, like:

0.46    0.68
0.92    1.36
1.38    2.04
1.84    2.72
 2.3    3.4
2.76    4.08
3.22    4.76
3.68    5.44
4.14    6.12
...

What I want to do is compare every number in the first column with every number in the second column, check whether any two numbers are equal, and print that number to the screen or a file.

I found an answer for how to do this in an Excel table, but I would be really interested in how to do this in bash or maybe with awk.

The first problem for me is that I don't even know how I would compare the first number to all the others in the second column. I guess I would have to do this via arrays. I could read the two numbers with a 'while read var_1 var_2' loop, append var_1 of each line to an array_1 and var_2 to an array_2, and then somehow compare all the elements with each other.

But I don't know how. I hope someone can help me.

  • Are you looking for string or numeric equality? I mean if "2.4" appeared in column 1 and "2.40" in column 2 - are those equal or not? Commented Jan 15, 2014 at 17:12
  • @EdMorton: I'm looking for numerical equality, so 2.4 = 2.40. Commented Jan 16, 2014 at 12:47
  • Then the script you selected as the correct answer won't work for you, as it is testing for string equality, since in awk all array indices are strings. Commented Jan 16, 2014 at 17:27
  • You might be right; I'm not an expert with awk and just assumed the same would apply to the answer I chose. Nevertheless, it still works for me, because a case like 2.4 vs. 2.40 will never appear in my lists. Commented Jan 20, 2014 at 12:52
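To get numeric rather than string matching, one option (a sketch, with a made-up sample file) is to normalize each value with +0 before using it as an array key, so "2.40" and "2.4" collapse to the same index:

```shell
# Sample where 2.40 in column 1 numerically equals 2.4 in column 2
printf '2.40 0.68\n0.92 2.4\n' > pairs.txt

# +0 forces awk to treat the field as a number before it becomes an index
awk 'FNR==NR {a[$1+0]; next} ($2+0) in a {print $2}' pairs.txt pairs.txt
# → 2.4
```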

6 Answers


Using awk

awk 'FNR==NR {a[$1]++;next} ($2 in a) {print $2}' file file
4.08
1.38

Read the file and store column #1 in array a, then compare column #2 with array a
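For readers new to the FNR==NR idiom, the same one-liner spelled out with comments (a few sample rows recreated inline; the matching pair is 4.08 / 1.38):

```shell
# A few rows of the sample data
printf '1.38 2.04\n2.76 4.08\n4.08 1.38\n' > file

awk '
    FNR == NR { a[$1]++; next }   # first pass over file: remember column 1
    ($2 in a) { print $2 }        # second pass: print column-2 values also in column 1
' file file
# → 4.08
#   1.38
```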

cat file
0.46    0.68
0.92    1.36
1.38    2.04
1.84    2.72
 2.3    3.4
2.76    4.08
3.22    4.76
3.68    5.44
4.14    6.12
4.08    1.38



this line should work:

 awk '{a[$1]=1;b[$2]}END{for(x in b){a[x]++;if(a[x]>1)print x}}' file

Note that the loop and check in END are for excluding duplicated numbers in the same column. If each column has unique numbers, that part could be simplified.

With fedorqui's example, the output is:

4.08
1.38


cat file
0.46    0.68
0.92    1.36
1.38    2.04
1.84    2.72
 2.3    3.4
2.76    4.08
3.22    4.76
3.68    5.44
4.14    6.12
4.08    1.38
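If each column is indeed duplicate-free, the simplified variant mentioned in the answer might look like this (a sketch; note that for (x in b) iterates in no guaranteed order):

```shell
printf '2.76 4.08\n4.08 1.38\n1.38 2.04\n' > file

# No duplicate handling needed: just print column-2 values that also occur in column 1
awk '{a[$1]; b[$2]} END {for (x in b) if (x in a) print x}' file
```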

7 Comments

Hey, thanks for your answer. It works, but the order of the numbers in the output is messed up, so it's difficult to find the lowest value.
@user3197817 why do this answer and Jotne's give the same output, yet one works "perfectly" and one is "messed up"?
The output is the same, that's true, but the order is different. Jotne's output is sorted with the smallest number at the top, which is more convenient for me.
@user3197817 I pressed F5 and checked his answer; it's not sorted. Also, in awk, if you use for(x in array), there is NO sorting happening. Both answers are the same. If you want sorted output, you could either do the sort in awk or just pipe the output to sort.
Hm, I just copied his awk line into my terminal, redirected the input to it, and it gives me the numbers starting with the smallest and ending with the highest: 15, 31, 46, 62, and so on.
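Since for (x in array) iterates in no guaranteed order, the simplest way to get ascending output from either answer is to pipe it through sort -n (sample input recreated inline):

```shell
printf '2.76 4.08\n4.08 1.38\n1.38 2.04\n' > file

# Same one-liner as in this answer, with the output sorted numerically
awk '{a[$1]=1;b[$2]}END{for(x in b){a[x]++;if(a[x]>1)print x}}' file | sort -n
# → 1.38
#   4.08
```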

One line: convert to one column, sort, and use uniq to print only duplicates:

(awk '{print $1}' test_input|sort|uniq  ; awk '{print $2}' test_input|sort|uniq)|sort|uniq -d

4 Comments

Converting to one column, sorting, and using uniq is not a good idea. What if there are duplicates in the same column? Apart from the awk;awk;sort;uniq chain...
@Kent thx for the comment; I've added uniq to remove duplicates within one column, but you are right that there are many utilities in the chain.
I saw the uniq; it removes the dups in the combination of col1 and col2. What I meant was, e.g., in col1 there are four foos, but in col2 there is no foo at all, so foo should not be in the output.
@Kent sort|uniq can also be used to remove duplicates from one column, but it is now crazy complicated, I agree.

A bash solution that works the way you described:

#!/bin/bash

# Read the two columns into parallel arrays
while read -r c1 c2; do
    c1a+=("$c1")
    c2a+=("$c2")
done < numbers.txt

# Compare every column-1 value against every column-2 value
for c1 in "${c1a[@]}"; do
    for c2 in "${c2a[@]}"; do
        [[ $c1 == "$c2" ]] && echo "$c1"
    done
done



Using awk without reading the file two times:

awk '{a[$1];b[$2];for (i in b) if (i in a) {print i;delete a[i];delete b[i]}}' file
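Spelled out, the idea is to keep checking for matches as lines arrive and delete each match so it prints only once (same logic, annotated, with a tiny sample):

```shell
printf '2.76 4.08\n4.08 1.38\n' > file

awk '{
    a[$1]; b[$2]                      # record both columns as lines arrive
    for (i in b)                      # after each line, look for new matches
        if (i in a) {
            print i
            delete a[i]; delete b[i]  # forget the match so it is printed once
        }
}' file
# → 4.08
```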


awk '{ a[$1]; b[$2] }
END{
    for (x in a) {
        for (y in b) {
            if (x+0 == y+0) {   # +0 on both sides forces a numeric, not string, comparison
                print x
                break
            }
        }
    }
}' file

