1

I have a file with strings, like below:

ABCEF
RFGTH
ABCEF_ABCT
DRFRF_ABCT
LOIKH
LOIKH_DEFT

I need to extract the lines whose leading word also appears on another line, even if that other line has a suffix such as _ABCT at the end.

while IFS= read -r line 
do
    if [ $line == $line ];
    then 
    echo "$line"
    fi  
done < "$file"

The output I want is:

ABCEF
ABCEF_ABCT
LOIKH
LOIKH_DEFT

I know the if condition is wrong, but I have run out of ideas and don't know how to get the output I need.

3
  • if [ $line == $line ] -- is it a typo or is it your real code? And if you want to match lines starting with a prefix, why don't you use grep? Commented Feb 11, 2016 at 14:12
  • I just edited the question to be clearer. I found no grep option that handles repeated words; with grep you have to specify the word you are looking for. Commented Feb 11, 2016 at 14:20
  • If is not a loop, but a branch. Commented Feb 11, 2016 at 14:24

4 Answers

1

I would use awk to solve this problem:

awk -F_ '{ ++count[$1]; line[NR] = $0 } 
END { for (i = 1; i <= NR; ++i) { split(line[i], a); if (count[a[1]] > 1) print line[i] } }' file

A count is kept of the first field of each line. Each line is saved to an array. Once the file is processed, any lines whose first part has a count greater than one are printed.
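
If holding every line in memory is a concern, the same counting idea can be sketched as a two-pass variant that reads the file twice instead of saving the lines. This is only an illustration: the temporary file and the variable names file and result are assumptions made so the example runs on its own with the sample data from the question.

```shell
#!/bin/bash
# Sample input from the question, written to a temporary file (hypothetical setup).
file=$(mktemp)
printf '%s\n' ABCEF RFGTH ABCEF_ABCT DRFRF_ABCT LOIKH LOIKH_DEFT > "$file"

# Pass 1 (NR == FNR is only true while reading the first file argument):
#   count each first field and skip to the next line.
# Pass 2: print the lines whose first field was counted more than once.
result=$(awk -F_ 'NR == FNR { count[$1]++; next } count[$1] > 1' "$file" "$file")

echo "$result"
rm -f "$file"
```

The file name is given twice on purpose; awk then sees the same data once for counting and once for printing, so nothing but the counts is kept in memory.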


0
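for w in $(for wrd in $(grep -o "^[A-Z]*" abc.dat)
    do
      # count the lines whose first word is exactly $wrd
      n=$(grep -cE "^${wrd}(_|\$)" abc.dat)
      if (( n > 1 ))
      then
        echo "$wrd"
      fi
     done | sort -u)
do
  grep -E "^${w}(_|\$)" abc.dat
done

grep -o extracts the token matching "^[A-Z]*" from the beginning of each line (^), that is, only the letters A-Z and not the _. Each token is then counted in the same file (grep -c), anchored so that it only matches a whole first word, and collected if the count is greater than 1. sort -u keeps each collected word once (plain uniq only drops adjacent duplicates), and the outer loop greps the file for each word to print all of its lines. Because of the sort, the groups come out in alphabetical order of the word rather than in file order.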
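
The nested loops run grep once per word, which gets slow on large files. As a rough sketch of the same idea in a single pipeline, the repeated words can be found with sort and uniq -d and then handed to grep as a pattern list. The temporary file and the names file and result are assumptions for the sake of a runnable example, and reading patterns from standard input with -f - is assumed to be supported by the grep in use.

```shell
#!/bin/bash
# Sample input from the question, written to a temporary file (hypothetical setup).
file=$(mktemp)
printf '%s\n' ABCEF RFGTH ABCEF_ABCT DRFRF_ABCT LOIKH LOIKH_DEFT > "$file"

# 1. cut:            keep the part of each line before the first underscore
# 2. sort | uniq -d: list each word that occurs more than once
# 3. sed:            turn each word into an anchored pattern such as ^ABCEF(_|$)
# 4. grep -E -f -:   print the lines of the file matching any of those patterns
result=$(cut -d_ -f1 "$file" | sort | uniq -d |
         sed 's/.*/^&(_|$)/' | grep -E -f - "$file")

echo "$result"
rm -f "$file"
```

Since the final grep scans the file once, the matching lines come out in their original file order.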

1 Comment

@luuke: You shouldn't use comments for thanks, but up/downvote arrows. The best answer shall get the accepted mark. And you shouldn't say 'thanks in advance' and such stuff or other greetings in the question. No intro, no greetings, just the core question.
0

Here's a pure Bash solution using arrays and associative arrays:

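#!/bin/bash

IFS=_               # split on underscores in read, and rejoin with them in "${tokens[*]}"
declare -A seen     # first word -> number of lines seen so far with that word

while read -r -a tokens
do
    # ${tokens[0]} contains the first word before the underscore.
    word="${tokens[0]}"
    [[ -z "$word" ]] && continue    # skip empty lines (an empty key is not allowed)

    if [[ "${seen[$word]}" ]]
    then
        # On the second occurrence, also print the first one. This assumes the
        # first occurrence was the bare word, as in the sample input.
        [[ "${seen[$word]}" -eq 1 ]] && echo "$word"
        echo "${tokens[*]}"
        seen[$word]=$(( ${seen[$word]} + 1 ))
    else
        seen[$word]=1
    fi
done < "$file"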

Output:

ABCEF
ABCEF_ABCT
LOIKH
LOIKH_DEFT
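
If the first occurrence of a word may itself carry a suffix, a two-pass variant in pure Bash can print the original lines unchanged. The following is a minimal sketch, assuming Bash 4 or later and input small enough to hold in an array; the names lines, count and result are hypothetical, and the sample data is inlined so the example runs on its own.

```shell
#!/bin/bash
# Sample input from the question, inlined for illustration.
lines=(ABCEF RFGTH ABCEF_ABCT DRFRF_ABCT LOIKH LOIKH_DEFT)

declare -A count    # first word -> number of lines starting with that word

# Pass 1: count the first word of every line.
for line in "${lines[@]}"
do
    word=${line%%_*}    # strip everything from the first underscore onward
    count[$word]=$(( ${count[$word]:-0} + 1 ))
done

# Pass 2: keep the lines whose first word was counted more than once.
result=()
for line in "${lines[@]}"
do
    word=${line%%_*}
    if [[ ${count[$word]} -gt 1 ]]
    then
        result+=("$line")
    fi
done

printf '%s\n' "${result[@]}"
```

To read from a file instead of the inlined array, mapfile -t lines < "$file" could fill the array first.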


0

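Here is one more answer, using sed:

    #!/bin/bash
    #set -x
    counter=1
    while IFS= read -r line ; do
            # $counter is the number of the line after the current one
            ((counter=counter+1))
            # print every later line (from line $counter to the end) that starts with $line
            var=$(sed -n -e "$counter,\$ { /^$line/p; }" file.txt)
            if [ -n "$var" ]
            then
                    echo "$line"
                    echo "$var"
            fi
    done < file.txt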
