I have a file that looks like this:

1   10000   10400   GI.STMC.GAST-EnhA
1   10000   10400   SKIN.PEN.FRSK.FIB.02-EnhA 
1   10000   10400   BRN.DL.PRFRNTL.CRTX-EnhA
1   10000   10400   BRN.ANT.CAUD-EnhA
1   10000   10400   HRT.ATR.R-EnhA 
1   10200   10400   ESDR.H1.MSC-EnhA
1   10200   10400   GI.ESO-EnhA
1   10200   10400   GI.DUO.SM.MUS-EnhA
1   10200   10400   LNG-EnhA
1   14800   15200   MUS.TRNK.FET-EnhA

I want to split the file based on the annotations in the 4th column. I can extract the unique annotations with the following code:

sort -u file.list > annotation.list # file.list file with the different annotations

And I can store the annotations in an array with:

mapfile -t myARRAY < annotation.list

However, I don't know how to split the file into separate files, each containing only one annotation. For example, the file for the annotation "ADRL.GLND.FET-TssA" would contain:

1   713800  714800  ADRL.GLND.FET-TssA
1   762000  763200  ADRL.GLND.FET-TssA 
1   948600  948800  ADRL.GLND.FET-TssA
1   1166800 1167400 ADRL.GLND.FET-TssA
1   1208600 1208800 ADRL.GLND.FET-TssA
1   1243400 1243800 ADRL.GLND.FET-TssA
1   1244000 1244200 ADRL.GLND.FET-TssA
1   1284000 1284400 ADRL.GLND.FET-TssA
1   1310200 1310400 ADRL.GLND.FET-TssA
1   1310800 1311200 ADRL.GLND.FET-TssA

I could grep each unique annotation and write the output to a file, but I am sure there must be a more elegant way.

Thanks

EDIT: so far I have this:

mapfile -t myARRAY < annotation.list;
for ann in ${myARRAY}; do
     grep ${ann} roadmap.core_active.bed > ${ann}.annotation
done

However, only one annotation file gets created.

  • Fix your loop as follows: for ann in "${myARRAY[@]}"; do ... Commented Sep 1, 2016 at 18:56
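A minimal sketch of the corrected loop, assuming bash 4 or later (for mapfile). The sample rows are hypothetical stand-ins for roadmap.core_active.bed, and the exact comparison on column 4 with awk is a substitution for the grep in the question, chosen so that the dots in annotation names are not interpreted as regex wildcards:

```shell
#!/usr/bin/env bash
# Hypothetical sample rows standing in for roadmap.core_active.bed.
printf '1\t10000\t10400\tLNG-EnhA\n'     >  roadmap.core_active.bed
printf '1\t10200\t10400\tGI.ESO-EnhA\n'  >> roadmap.core_active.bed
printf '1\t14800\t15200\tLNG-EnhA\n'     >> roadmap.core_active.bed

# Collect the unique annotations from column 4.
awk '{print $4}' roadmap.core_active.bed | sort -u > annotation.list

# Read one annotation per line into a bash array (mapfile needs bash 4+).
mapfile -t myARRAY < annotation.list

# "${myARRAY[@]}" expands to every element; ${myARRAY} is only the first one,
# which is why the original loop produced a single file.
for ann in "${myARRAY[@]}"; do
    # Compare column 4 exactly, so the dots in an annotation are not
    # treated as regex wildcards as they would be in a plain grep pattern.
    awk -v a="$ann" '$4 == a' roadmap.core_active.bed > "${ann}.annotation"
done
```

This rereads the input once per annotation, so it is fine for a handful of annotations but slow for many.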

1 Answer

This will create each of the annotation files that you ask for:

awk '{print > ($4 ".annotation")}' file.list

Awk implicitly reads through a file line-by-line. Here, we use a print statement with its output re-directed to a file whose name is made up of the fourth field with the suffix .annotation added.

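The above will work unless there are a very large number of different annotations. In that case, you may hit your system's limit on open files. To avoid that, close each file explicitly after writing to it. Because a file that is reopened with > after close() gets truncated, append with >> instead (and remove any old .annotation files before rerunning):

awk '{fname=$4".annotation"; print >> fname; close(fname)}' file.list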
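As an alternative sketch for the many-annotations case, the input can be sorted on column 4 first, so that each output file is written in one contiguous run and only one file is open at a time. The sample rows are hypothetical, and the -s (stable sort) option is assumed to be available, as it is in GNU and BSD sort:

```shell
# Hypothetical sample rows; file.list is the input name used in this answer.
printf '1\t10000\t10400\tLNG-EnhA\n'    >  file.list
printf '1\t10200\t10400\tGI.ESO-EnhA\n' >> file.list
printf '1\t14800\t15200\tLNG-EnhA\n'    >> file.list

# Stable sort (-s) on column 4 keeps the original row order within a group;
# -b ignores leading blanks in the key, LC_ALL=C compares raw bytes.
LC_ALL=C sort -s -b -k4,4 file.list |
awk '
    NF < 4 { next }              # skip blank or short lines
    $4 != prev {                 # first row of a new annotation group
        if (out != "") close(out)
        out  = $4 ".annotation"
        prev = $4
    }
    { print > out }              # ">" truncates only when the file is first opened
'
```

Since every annotation's rows are adjacent after sorting, each file is opened exactly once, so rerunning the command overwrites old output rather than appending to it.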