
Question: I am trying to find multiple specific lines in a file (species) and then print only the 5th line after each species name to a new file. I can do this fine for each species individually, but I am having trouble making a loop to go through each of the 1000 species I have in the document. For example:

awk 'c&&!--c;/species_1$/{c=5}' results.out > speciesnames

How can I make this command into a loop so that it does the following (iterates over every species in the file):

species 1, print 5th line to document titled speciesnames

species 2, print 5th line to document titled speciesnames

species n, print 5th line to document titled speciesnames

Any help would be appreciated. I have very little experience with loops. Thanks

Data structure example from results.out:

Query= species_1

length=341
Score
bits
Line 5, relevant info
description
description
description
description
description
description
description
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
nucleotides
data
data
data
data
data
data

Query= species_2

length=341

.......

Desired output into file speciesnames:

Line 5, relevant info for species 1
Line 5, relevant info for species 2
Line 5, relevant info for species n
  • What's the context of results.out? Do you have to print just the 5th line of results? Commented Feb 14, 2015 at 9:45
  • The context is that each species has about 50 lines of text associated with it, but that I only need the 5th line extracted. Commented Feb 14, 2015 at 10:04
  • The "tr" command can skip lines, and it has a looping capability. Commented Feb 14, 2015 at 11:59
  • printing the words description and nucleotides 20 times to try to describe your input is not nearly as useful as showing some actual data. Show at least 3 small blocks of representative data for sample input, one for each of 3 different species and each block being 5 or 6 lines. Also, show the output you want given that input. Commented Feb 14, 2015 at 15:26

3 Answers

Maybe something like this:

awk 'c&&!--c;/species_[0-9]+$/{c=5}' file

awk '/species_[0-9]+/{a[NR+5]} {b[NR]=$0} END {for (i in a) print b[i]}' file

This prints the line 5 lines after each species match.
Due to the nature of arrays in awk, the output order of the second command is not guaranteed.
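If the order matters, the array idea can be kept while printing in input order. A minimal sketch, assuming the header lines end in species_N as in the question; the file names sample.out and ordered.txt and the sample text are made up for illustration:

```shell
# Hypothetical sample input: two blocks, wanted line is 5 lines below each header.
printf '%s\n' \
  'Query= species_1' '' 'length=341' 'Score' 'bits' 'relevant info A' 'description' \
  'Query= species_2' '' 'length=341' 'Score' 'bits' 'relevant info B' 'description' \
  > sample.out

# On each header, remember the target line number (referencing the element
# creates it). A line is printed as soon as its number is a remembered one,
# so the output follows the input order.
awk '/species_[0-9]+$/ { want[NR + 5] } NR in want' sample.out > ordered.txt
cat ordered.txt
```

This avoids the for (i in a) loop entirely, and it does not need to hold the whole file in memory.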

Adjusting code after new input:

awk 'c&&!--c;/species [0-9]+$/{c=4}' file
Line 5, relevent info

Your data has a single space, not _, between species and the number.
You want the 4th line after the match, not the 5th.
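For readers new to the c&&!--c idiom, here is the same countdown written out long-hand. It is only a sketch of the logic, assuming the space-separated headers shown in this answer; spaced.txt and countdown.txt are made-up file names:

```shell
# Hypothetical sample input in the "species N" layout (no blank line after the header).
printf '%s\n' \
  'Query= species 1' 'length=341' 'Score' 'bits' 'Line 5, info A' 'description' \
  'Query= species 5' 'length=341' 'Score' 'bits' 'Line 5, info B' 'description' \
  > spaced.txt

awk '
  # While a countdown is running, decrement it; print the line on which it reaches zero.
  c > 0 { c--; if (c == 0) print }
  # Header seen: the wanted line is 4 lines further down.
  /species [0-9]+$/ { c = 4 }
' spaced.txt > countdown.txt
cat countdown.txt
```

The rule order matches the one-liner: the countdown is checked first, then the header test resets it.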


Example data:

cat file
Query= species 1
length=341
Score
bits
Line 5, relevent info
description
description
description
description
description
description
Query= species 5
length=341
Score
bits
Line 5, relevent info need this
description
description
description
description
description
Query= species 8
length=341
Score
bits
Line 5, relevent info more data
description
description
description
description
description
Query= species 6423
length=341
Score
bits
Line 5, relevent infom, yes here it is
description
description
description
description
description

awk 'c&&!--c {print i " --> " $0} /species [0-9]+$/{c=4;i=$2 FS $3}' file
species 1 --> Line 5, relevent info
species 5 --> Line 5, relevent info need this
species 8 --> Line 5, relevent info more data
species 6423 --> Line 5, relevent infom, yes here it is

Final solution:

awk 'c&&!--c;/species_/{c=5}' file
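No shell loop is needed here: awk already visits every line, so one run handles all species. Running one awk per species with > would also overwrite speciesnames on every run. A small sketch, where results_sample.out is a made-up stand-in for results.out:

```shell
# Hypothetical stand-in for results.out, with a blank line after each header.
printf '%s\n' \
  'Query= species_1' '' 'length=341' 'Score' 'bits' 'relevant info A' 'description' \
  'Query= species_2' '' 'length=341' 'Score' 'bits' 'relevant info B' 'description' \
  > results_sample.out

# One pass over the file; the redirection is applied once, so every
# matching line ends up in the same output file.
awk 'c&&!--c;/species_/{c=5}' results_sample.out > speciesnames
cat speciesnames
```

Note that /species_/ is unanchored, so this assumes the string species_ appears only on the header lines.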

11 Comments

I tried this, but only the first species was retrieved and printed :(
Can you give some example data and show what goes wrong? I see this fails if there are fewer than 5 lines between different species.
I have updated my question to show the data structure.
@user3237139 You need to use code tags {} to make your post readable. I still do not understand what you would like to do.
Thanks for adding code tags. For each species, I would like to extract the fifth line. So instead of doing: {awk 'c&&!--c;/species_1$/{c=5}' results.out > speciesnames} {awk 'c&&!--c;/species_2$/{c=5}' results.out > speciesnames} {awk 'c&&!--c;/species_3$/{c=5}' results.out > speciesnames} until I have the fifth line for all the species, I would just like to find a loop that would do this through iteration.

An approach using the getline function:

 awk '/^Query *= *species_[0-9]/{print $0":";for(i=1;i<=5;++i){if(getline>0 &&i==5){print}}}' file

Start a loop that reads the next 5 lines after each line matching /^Query *= *species_[0-9]/:

for(i=1;i<=5;++i)

Once the 5th line is reached, print it:

{if(getline>0 &&i==5){print}}}'

Example file:

Query= species_1

length=341
Score
bits
Line 5, relevant info
description
description
data
data
data
data
data
data

Query= species_2

length=341
Score
bits
Line 5, relevant info
description
description
data
data
data
data
data
data

Result:

Query= species_1:
Line 5, relevant info
Query= species_2:
Line 5, relevant info

1 Comment

Would you mind tweaking that script to show how to print every input line as it's read to stderr so we can see the script reading every line of the input file for debugging? One of the reasons to avoid using getline unnecessarily is that it breaks the natural flow of awk's implicit read loop and so makes what should be trivial things like that much harder to do, usually resulting in duplicated code or a complete rewrite. See awk.info/?tip/getline.
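To illustrate the point about debugging, here is a getline-free sketch that gives the same output as the answer and traces every input line to stderr with one extra rule. The file names getline_sample.txt, no_getline.txt and trace.txt are made up for the example:

```shell
# Hypothetical one-block sample in the layout used by this answer.
printf '%s\n' \
  'Query= species_1' '' 'length=341' 'Score' 'bits' 'Line 5, relevant info' 'description' \
  > getline_sample.txt

awk '
  # Trace every input line on stderr (piping to cat is the portable way).
  { print "read: " $0 | "cat 1>&2" }
  # Fifth line after the header: print it.
  c && !--c { print }
  # Header: print it with a trailing colon and start the countdown.
  /^Query *= *species_[0-9]/ { print $0 ":"; c = 5 }
' getline_sample.txt > no_getline.txt 2> trace.txt
cat no_getline.txt
```

Because awk's own read loop does all the reading, the trace rule sees every line without any duplicated code.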

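Could you do something like this?

linenr=0
species=unknown
# Read the file directly instead of piping it from cat.
while IFS= read -r line; do
   if [[ "${line}" = Query* ]]; then
      # New block: reset the counter and take the name after "=".
      linenr=0
      species=${line#*=}
      species=${species# }
   else
      (( linenr = linenr + 1 ))
      if [ "${linenr}" -eq 5 ]; then
         # Writes one file per species, e.g. species_1.out
         echo "${line}" > "${species}.out"
      fi
   fi
done < results.out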
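If a single output file is wanted, as in the question, the same counting loop can redirect once at the end instead of per species. This is only a sketch in plain POSIX sh; loop_sample.out and speciesnames_loop.txt are made-up names:

```shell
# Hypothetical sample input with a blank line after each header.
printf '%s\n' \
  'Query= species_1' '' 'length=341' 'Score' 'bits' 'relevant info A' 'description' \
  'Query= species_2' '' 'length=341' 'Score' 'bits' 'relevant info B' 'description' \
  > loop_sample.out

linenr=0
while IFS= read -r line; do
  case $line in
    Query*)
      # New block: restart the line count.
      linenr=0 ;;
    *)
      linenr=$((linenr + 1))
      # The fifth line after the header is the one to keep.
      if [ "$linenr" -eq 5 ]; then
        printf '%s\n' "$line"
      fi ;;
  esac
done < loop_sample.out > speciesnames_loop.txt
cat speciesnames_loop.txt
```

For a large file the awk one-liners will usually be much faster than a shell read loop.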
