Comparing a word in a string with another in another string

Question

I have a file with strings like the ones below:

ABCEF
RFGTH
ABCEF_ABCT
DRFRF_ABCT
LOIKH
LOIKH_DEFT

I need to extract the lines whose first word matches the first word of another line, even when one of them has a suffix such as _ABCT at the end.

while IFS= read -r line 
do
    if [ $line == $line ];
    then 
    echo "$line"
    fi  
done < "$file"

The output I want is:

ABCEF
ABCEF_ABCT
LOIKH
LOIKH_DEFT

I know the condition in the if statement is wrong, but I have run out of ideas and don't know how to get the output I need.

if [ $line == $line ] -- is it a typo or is it your real code? And if you want to match lines starting with a prefix, why don't you use grep? – Andrea Corbellini, Commented Feb 11, 2016 at 14:12
I just edited the question to make it clearer. I found no grep option that handles repeated words; with grep you have to specify the word you want to find. – luuke, Commented Feb 11, 2016 at 14:20

Tom Fenech · Accepted Answer · 2016-02-11 14:25:47Z

1

I would use awk to solve this problem:

awk -F_ '{ ++count[$1]; line[NR] = $0 } 
END { for (i = 1; i <= NR; ++i) { split(line[i], a); if (count[a[1]] > 1) print line[i] } }' file

A count is kept of the first field of each line. Each line is saved to an array. Once the file is processed, any lines whose first part has a count greater than one are printed.
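A variant of the same idea (a sketch, not part of the original answer) avoids storing every line in an array by reading the file twice: on the first pass NR == FNR holds, so only the prefixes are counted; on the second pass each line is printed if its prefix was seen more than once. The function name dup_prefix_lines is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print lines whose part before the first "_" occurs
# on more than one line. The file is passed twice, so awk reads it twice.
dup_prefix_lines() {
    awk -F_ '
        NR == FNR { ++count[$1]; next }   # first pass: count each prefix
        count[$1] > 1                     # second pass: print lines with a repeated prefix
    ' "$1" "$1"
}
```

Usage: dup_prefix_lines file. Lines come out in their original file order, just like in the array-based version.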

edited Feb 11, 2016 at 14:25

answered Feb 11, 2016 at 14:14

Tom Fenech

75.1k reputation, 13 gold badges, 119 silver badges, 154 bronze badges


user unknown · Accepted Answer · 2016-02-11 14:35:34Z

0

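for w in $(for wrd in $(grep -o "^[A-Z]*" abc.dat)
    do
      # count the lines whose first word is exactly $wrd
      n=$(grep -cE "^${wrd}(_|\$)" abc.dat)
      if (( n > 1 ))
      then
        echo "$wrd"
      fi
     done | sort -u)
do
  grep -E "^${w}(_|\$)" abc.dat
done

With grep -o, extract the tokens matching "^[A-Z]*": the run of A-Z characters at the beginning of each line (^), which stops at the _. Each token is searched for again in the same file, anchored at the start of the line and followed by _ or the end of the line, and the matching lines are counted (grep -c); tokens with a count > 1 are collected. sort -u keeps each token only once (plain uniq would only remove adjacent duplicates), and then each token is searched for in the file again, so all of its lines are printed exactly once.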
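The repeated words can also be found without a grep -c call per word. In this sketch (not part of the original answer), cut, sort and uniq -d list each first word that occurs more than once, sed turns each into an anchored pattern, and grep runs once over the file. The function name print_repeated_prefix_lines is hypothetical, reading patterns from standard input with grep -f - is GNU grep behavior, and the words are assumed to contain no regex metacharacters (true for the uppercase words in the question).

```shell
#!/bin/bash
# Hypothetical helper: print every line whose part before the first "_"
# also starts at least one other line. Assumes GNU grep (-f - reads
# patterns from stdin) and words without regex metacharacters.
print_repeated_prefix_lines() {
    local file=$1
    cut -d_ -f1 "$file" |        # first word of every line
        sort | uniq -d |         # words that occur more than once
        sed 's/.*/^&(_|$)/' |    # each word becomes the ERE ^WORD(_|$)
        grep -E -f - "$file"     # print matching lines in file order
}
```

Usage: print_repeated_prefix_lines abc.dat. This also answers the concern that grep needs the words up front: the pipeline derives them from the file itself.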

answered Feb 11, 2016 at 14:35

user unknown

36.4k reputation, 12 gold badges, 77 silver badges, 123 bronze badges

1 Comment

user unknown Over a year ago

@luuke: You shouldn't use comments to say thanks; use the up/down vote arrows instead. The best answer should get the accepted mark. Also, don't write 'thanks in advance' or other greetings in the question: no intro, no greetings, just the core question.

Andrea Corbellini · Accepted Answer · 2016-02-11 15:18:36Z

0

Here's a pure Bash solution using arrays and associative arrays:

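#!/bin/bash

# file to process, given as the first argument
file=${1:?usage: $0 file}

# IFS=_ makes read split each line on underscores, and makes "${tokens[*]}"
# join the fields back together with underscores.
IFS=_
declare -A seen first

while read -r -a tokens
do
    # ${tokens[0]} contains the first word before the underscore.
    word="${tokens[0]}"
    [[ -z "$word" ]] && continue    # skip empty lines and lines starting with _

    if [[ "${seen[$word]}" ]]
    then
        # On the second occurrence, also print the line where the word first appeared.
        [[ "${seen[$word]}" -eq 1 ]] && echo "${first[$word]}"
        echo "${tokens[*]}"
        (( seen["$word"]++ ))
    else
        seen["$word"]=1
        first["$word"]="${tokens[*]}"
    fi
done < "$file"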

Output:

ABCEF
ABCEF_ABCT
LOIKH
LOIKH_DEFT
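The loop above prints a word's lines as soon as a repeat is found, so the output is grouped by word. If the lines should instead come out in their original file order (as in the awk answer), a two-pass pure Bash sketch (not part of the original answer) can load the file with mapfile, count the prefixes, and then print. The function name print_repeated_prefix_lines_bash is hypothetical; it needs Bash 4 or later for mapfile and associative arrays.

```shell
#!/bin/bash
# Hypothetical two-pass helper (Bash 4+): print, in file order, every line
# whose part before the first "_" occurs on more than one line.
print_repeated_prefix_lines_bash() {
    local -a lines
    local -A count=()
    local line word
    mapfile -t lines < "$1"

    # first pass: count how often each prefix occurs
    for line in "${lines[@]}"; do
        word=${line%%_*}
        [[ -n $word ]] || continue
        count[$word]=$(( ${count[$word]:-0} + 1 ))
    done

    # second pass: print the lines whose prefix was seen more than once
    for line in "${lines[@]}"; do
        word=${line%%_*}
        [[ -n $word ]] || continue
        if (( ${count[$word]:-0} > 1 )); then
            printf '%s\n' "$line"
        fi
    done
}
```

Usage: print_repeated_prefix_lines_bash "$file". Because all prefixes are counted before anything is printed, it does not matter whether the bare word or the suffixed line comes first.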

answered Feb 11, 2016 at 15:18

Andrea Corbellini

17.9k reputation, 3 gold badges, 58 silver badges, 71 bronze badges

Varun · Accepted Answer · 2016-02-12 01:14:51Z

0

One more answer, using sed:

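#!/bin/bash
counter=1
while IFS= read -r line; do
    ((counter = counter + 1))
    # print every later line (from line $counter to the end) matching $line as a regex
    var=$(sed -n -e "$counter,\$ s/$line/$line/p" file.txt)
    if [ -n "$var" ]
    then
        echo "$line"
        echo "$var"    # quoted so several matches stay on separate lines
    fi
done < file.txt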

answered Feb 12, 2016 at 1:14

Varun

477 reputation, 4 silver badges, 9 bronze badges
