
I have this input file:

gb|KY798440.1|
gb|KY842329.1|
MG082893.1
MG173246.1

and I want to get all the characters that are between the "|", or the full line if there is no "|". That is, the desired output looks like

KY798440.1
KY842329.1
MG082893.1
MG173246.1

I wrote:

while IFS= read -r line; do
    if [[ $line == *\|* ]] ; then
        sed 's/.*\|\(.*\)\|.*/\1/' <<< $line >> output_file
    else echo $line >> output_file
    fi
done < input_file

Which gives me

empty line
empty line
MG082893.1
MG173246.1

(note: "empty line" means an actual empty line; it doesn't actually write "empty line")

The sed command works on a single example (i.e. sed 's/.*\|\(.*\)\|.*/\1/' <<< "gb|KY842329.1|" outputs KY842329.1), but within the loop it just prints an empty line. The else echo $line >> output_file branch seems to work.

    Why use the while/read loop instead of just letting sed process the file? Commented Oct 21, 2020 at 22:05

2 Answers


Bare sed:

$ sed 's/^[^|]*|\||[^|]*$//g' file

Output:

KY798440.1
KY842329.1
MG082893.1
MG173246.1
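
For anyone puzzled by the escaping: in GNU sed's default (basic) regex syntax, a bare | is a literal pipe and \| means "or", so the command above deletes either the leading "prefix|" or the trailing "|suffix". Other sed implementations may not treat \| that way. Below is a minimal sketch of the same idea using -E (extended regex), where the escaping is flipped. The function name strip_pipes is just a hypothetical wrapper for illustration, and the sample lines are taken from the question.

```shell
# strip_pipes: hypothetical helper name; reads lines on stdin, prints the bare ID.
strip_pipes() {
  # With -E, \| is a literal pipe and a bare | separates the two alternatives:
  #   ^[^|]*\|   everything up to and including the first pipe
  #   \|[^|]*$   the last pipe and everything after it
  # Lines without any pipe match neither alternative and pass through unchanged.
  sed -E 's/^[^|]*\||\|[^|]*$//g'
}

# Two sample lines from the question, one with pipes and one without.
printf '%s\n' 'gb|KY798440.1|' 'MG082893.1' | strip_pipes
```

To process the whole file, strip_pipes < input_file > output_file should do it.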

1 Comment

Oh that's embarrassing. I thought I tried that... Thanks though!!

You could do

sed '/|/s/[^|]*|\([^|]*\)|.*/\1/' input

or

awk 'NF>1 {print $2} NF < 2 { print $1}' FS=\| input

or

sed -e 's/[^|]*|//' -e 's/|.*//' input
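
If you would rather keep the while/read loop, here is a sketch that avoids sed entirely and uses shell parameter expansion. The blank lines in the question most likely come from GNU sed reading \| as alternation, so .* matches the whole line and \1 ends up empty. I have not verified that on every sed implementation. The function name extract_ids is hypothetical, and the sketch assumes each line has at most one pipe-delimited field of interest.

```shell
# extract_ids: hypothetical helper; prints the text between the first two pipes,
# or the whole line when it contains no pipe.
extract_ids() {
  local line rest
  while IFS= read -r line; do
    rest=${line#*'|'}               # drop everything through the first pipe, if any
    printf '%s\n' "${rest%%'|'*}"   # drop everything from the next pipe onward
  done
}

# Two sample lines from the question, one with pipes and one without.
printf '%s\n' 'gb|KY842329.1|' 'MG173246.1' | extract_ids
```

Neither expansion changes a line that has no pipe, so the if/else from the question is not needed. Usage on the real data would be extract_ids < input_file > output_file.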
