I have a file called file.txt that contains the following:

123
223
Lane,id,s_id_sample_id
1,3_range.single_try,N76
2,44_range.single_try,N77
3,92_out_range.double_try,N79

I'd like to loop through this file and do the following:

Starting from the line after 'Lane', split each line on commas and take the second column (id). In that column, replace all dots and underscores with 'X' EXCEPT THE LAST TWO separators, which should become underscores. For example, '3_range.single_try' should turn into '3Xrange_single_try', and the underscore in 'double_try' must not be replaced.

So I would like to end up with:

123
223
Lane,id,s_id_sample_id
1,3Xrange_single_try,N76
2,44Xrange_single_try,N77
3,92XoutXrange_double_try,N79

This is what I have done:

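file=file.txt
while IFS=',' read -r f1 f2; do
  # Replaces *every* underscore and dot in the rest of the line,
  # including the last two, which is what I'm trying to avoid
  f2=$(sed -e 's/_/X/g;s/\./X/g' <<< "$f2")
  echo "$f1,$f2"
done < "$file" > output
mv output "$file"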

The problem is: how do I tell sed to leave the last two underscores alone?

1 Answer

This works by first replacing the last two dots or underscores with '@', then replacing the remaining dots and underscores with 'X', and finally, replacing all the '@' characters with underscores:

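while IFS=',' read -r f1 f2 f3; do
  # Protect the last two separators with '@', turn the remaining dots and
  # underscores into 'X', then change the '@' markers back into underscores
  f2=$(sed 's/[._]\([^._]\{1,\}\)[._]\([^._]\{1,\}\)$/@\1@\2/;s/[._]/X/g;s/@/_/g' <<< "$f2")
  out=$f1
  [[ -n $f2 ]] && out+=",$f2"   # only re-add columns that exist,
  [[ -n $f3 ]] && out+=",$f3"   # so lines like '123' stay unchanged
  printf '%s\n' "$out"
done < "$file" > output
mv output "$file"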
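To make the three substitutions easier to follow, here is the same sed script split into separate steps and applied to the last sample id. The values in the comments are what each step produces for this input. The variable names are illustrative only, and the POSIX interval `\{1,\}` is used in place of GNU sed's `\+` so the expression also works with BSD/macOS sed.

```shell
#!/usr/bin/env bash
id='92_out_range.double_try'

# Step 1: mark the last two separators ('.' or '_') with '@'
step1=$(sed 's/[._]\([^._]\{1,\}\)[._]\([^._]\{1,\}\)$/@\1@\2/' <<< "$id")
echo "$step1"   # 92_out_range@double@try

# Step 2: every separator that is still left becomes 'X'
step2=$(sed 's/[._]/X/g' <<< "$step1")
echo "$step2"   # 92XoutXrange@double@try

# Step 3: turn the '@' markers back into underscores
step3=$(sed 's/@/_/g' <<< "$step2")
echo "$step3"   # 92XoutXrange_double_try
```

Because `[^._]` cannot match a separator, the pattern in step 1 can only match the final two separators at the end of the line, which is why the earlier ones are left for step 2.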

If '@' is likely to occur in your input data, you may want to use a different character. Anything that you can be reasonably sure won't occur in your input will do.
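If you would rather not depend on a sentinel character at all, bash's own regex matching can split off the last two separators directly. This is an alternative sketch rather than a refinement of the sed approach: it assumes bash 3.2 or later (for `[[ =~ ]]` and `BASH_REMATCH`), and `transform_id` and `process` are hypothetical helper names. Following the question's spec, it only rewrites the id column on lines after the 'Lane' header.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn every '.' or '_' in an id into 'X',
# except the last two separators, which become '_'.
transform_id() {
  local id=$1 head
  local re='^(.*)[._]([^._]+)[._]([^._]+)$'
  if [[ $id =~ $re ]]; then
    head=${BASH_REMATCH[1]}
    head=${head//[._]/X}   # replace every separator before the last two
    printf '%s_%s_%s\n' "$head" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    # Fewer than two separators: there is nothing to protect
    printf '%s\n' "${id//[._]/X}"
  fi
}

# Hypothetical driver: copy lines through unchanged up to and including the
# 'Lane' header, then rewrite only the second (id) column of later lines.
process() {
  local line f1 tail f2 rest seen_header=0
  while IFS= read -r line || [[ -n $line ]]; do
    if (( seen_header )) && [[ $line == *,* ]]; then
      f1=${line%%,*}       # first column (Lane)
      tail=${line#*,}      # everything after the first comma
      f2=${tail%%,*}       # second column (id)
      rest=""
      [[ $tail == *,* ]] && rest=",${tail#*,}"   # remaining columns, if any
      printf '%s,%s%s\n' "$f1" "$(transform_id "$f2")" "$rest"
    else
      printf '%s\n' "$line"
      if [[ $line == Lane,* ]]; then
        seen_header=1
      fi
    fi
  done
}
```

Usage would be along the lines of `process < file.txt > output && mv output file.txt`. Since the split happens in the regex itself, there is no placeholder character that could collide with the input data.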
