Could you help me with a bash/awk script? I have several .dat files in a directory. Each of these files consists of a header followed by data:

c ROIysiz= 28
c column1= HJD
c RedNumDa= 18262
c column3= ERROR
c column2= FLUX
c end header ---------------------------------------------------------------------------------------
2.458375368952875026e+06 -8.420548421860798386e-04 7.020812100561693928e-03
2.458375579737625085e+06 -5.579159672996818198e-03 1.285380720081348528e-03
2.458376278315599542e+06 -7.634101850411220518e-03 2.481065693991901019e-03
2.458376347386624664e+06 7.223482191697593166e-04 2.319993894372075760e-03
2.458376416108166799e+06 5.238757879614985152e-03 1.389030320490110878e-03
2.458376485913363751e+06 6.777606553373448882e-03 8.887787066666734273e-04
2.458377048675692175e+06 1.950435173388009522e-02 3.242344477396308117e-03
2.458377185153110884e+06 1.885754079806525874e-02 2.090836971653367571e-03

File name: Old-file

For all of the files, I would like to:

1) save the name of the file to a variable

2) save some information from the header to variables, for instance the strings after "c column3= ", "c column2= " and "c ROIysiz= "

3) rename the file using the variables saved from the header, for instance "FLUX28"

4) create a new file

5) print information from the variables to the first row of the new file, for instance the name of the original file and the strings after "c column3= " and "c column2= "

6) print the data, that is, the part of the original file after the line starting with "c end header"

7) add # to the start of the first line

#!/bin/bash
for file in *.dat; do                   # loop in the directory
awk -v FILE=$FILE_NAME                  # save file name to variable FILE

/c end header/ { in_f_format=0; next }  # print file from c end header

{print $1, $2, $3}                      # print columns

BEGIN{printf("#")}1                     # adding hashtag before the first line
; done                                  # end of loop

Desired output:

Files with names like FLUX28 (another file will have a different number; the file name will consist of strings from the header). The files will contain:

#Old-file ERROR FLUX
2.458375368952875026e+06 -8.420548421860798386e-04 7.020812100561693928e-03
2.458375579737625085e+06 -5.579159672996818198e-03 1.285380720081348528e-03
2.458376278315599542e+06 -7.634101850411220518e-03 2.481065693991901019e-03
2.458376347386624664e+06 7.223482191697593166e-04 2.319993894372075760e-03
2.458376416108166799e+06 5.238757879614985152e-03 1.389030320490110878e-03
2.458376485913363751e+06 6.777606553373448882e-03 8.887787066666734273e-04
2.458377048675692175e+06 1.950435173388009522e-02 3.242344477396308117e-03
2.458377185153110884e+06 1.885754079806525874e-02 2.090836971653367571e-03

Code from the discussion:

awk '
/ROIysiz/{
second_out=$NF
}
/column 3/{
third_part=$NF
}
/column2/{
close(out_file)
found=count=""
out_file=$NF second_out third_part
next
}
/end header/{
found=1
next
}
found && out_file{
if(++count==1){
print "#" $0 > (out_file)
}
else{
print > (out_file)
}
}
' input
  • Thank you for showing your attempts, but please add more information about the expected output, as it is not clear as of now. Commented Feb 25, 2020 at 5:47
  • I am sorry, I have added it. Commented Feb 25, 2020 at 6:03

1 Answer

Could you please try the following; it has not been tested thoroughly.

awk '
/ROIysiz/{
  second_out=$NF
}
/column2/{
  close(out_file)
  found=count=""
  out_file=$NF second_out
  next
}
/end header/{
  found=1
  next
}
found && out_file{
  if(++count==1){
    print "#" $0 > (out_file)
  }
  else{
    print > (out_file)
  }
}
' Input_file

Explanation: a detailed, line-by-line explanation of the code above.

awk '                          ##Starting awk program from here.
/ROIysiz/{                     ##Checking condition if a line contains string ROIysiz then do following.
  second_out=$NF               ##Creating variable second_out for output file 2nd part.
}
/column2/{                     ##Checking condition if line contains column2 string in it.
  close(out_file)              ##Closing out_file to avoid "too many files opened" error.
  found=count=""               ##Nullifying variables found and count here.
  out_file=$NF second_out      ##Creating variable out_file which is having last field of current line and second_out variable value.
  next                         ##next will skip all further statements from here.
}
/end header/{                  ##Checking condition if string end header is found then do following.
  found=1                      ##Setting variable found to 1 here.
  next                         ##next will skip all further statements from here.
}
found && out_file{             ##Checking condition if found AND out_file is SET then do following.
  if(++count==1){              ##Incrementing count; if it is 1 (first data line) then do following, to add # at the start of the first line.
    print "#" $0 > (out_file)  ##Printing # and current line to out_file now.
  }
  else{                        ##Else if count is greater than 1 then do following.
    print > (out_file)         ##Printing current line to out_file here.
  }
}
' Input_file                   ##Mentioning Input_file name here.
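
The code above prefixes the first data row with #, whereas the desired output in the question shows a separate first line such as #Old-file ERROR FLUX. A minimal sketch of that variant follows, assuming every .dat file has the header layout shown in the question and that the original file name should appear without its .dat extension. The variable names (src, size, col2, col3, out, in_data) and the temporary sample file are illustrative only and do not come from the original posts.

```shell
#!/bin/bash
# Hypothetical sample input so the sketch runs on its own;
# with real data, drop this part and run the loop in the data directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > Old-file.dat <<'EOF'
c ROIysiz= 28
c column1= HJD
c RedNumDa= 18262
c column3= ERROR
c column2= FLUX
c end header ----------
2.458375368952875026e+06 -8.420548421860798386e-04 7.020812100561693928e-03
2.458375579737625085e+06 -5.579159672996818198e-03 1.285380720081348528e-03
EOF

for file in *.dat; do
  # src is the original file name without the .dat extension (an assumption).
  awk -v src="${file%.dat}" '
    /^c ROIysiz=/ { size = $NF }   # e.g. 28
    /^c column2=/ { col2 = $NF }   # e.g. FLUX
    /^c column3=/ { col3 = $NF }   # e.g. ERROR
    /^c end header/ {              # header complete: build the name, write the # line
      out = col2 size
      print "#" src, col3, col2 > out
      in_data = 1
      next
    }
    in_data { print > out }        # copy the data rows unchanged
  ' "$file"
done
```

Each input file is handled by its own awk process, so no close() call is needed here. The original .dat files are left untouched; removing or renaming them would be a separate step.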

7 Comments

Why is the name of the output file the same for out_file=$NF second_out third_part and for out_file=$NF second_out? And why can't I add second_out to the first line with print "#" second_out $0 > (out_file)?
@Alex, basically I am building your output file name from several different lines: one line has FLUX, another line has 28, and so on. That is why second_out holds 28 and out_file holds the whole file name.
I mean the code you sent me in the discussion. I added it to my question. It contains out_file=$NF second_out third_part. Does that mean a new file name composed of three words? Because I got two. The file name was the same as with out_file=$NF second_out from the code in your answer.
@Alex, that was for your question about how to add one more condition and add its value to the file name. It is simple: add a condition, create a variable when that condition is met, and put the variable in a suitable place in the out_file variable.
Yes. And this is in the code you provided me, isn't it? So I don't understand why I got the file name FLUX28; I expected the string 'ERROR' in the file name too. There are 3 conditions: /ROIysiz/ /column 3/ /column2/. Right?
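
One thing that can be checked directly against the sample header: the pattern /column 3/ contains a space, while the header line reads "c column3= ERROR", so that condition never matches and third_part stays empty. A minimal sketch of the check, using the header lines from the question; the shell variable names (header, with_space, no_space, name) are illustrative only.

```shell
#!/bin/bash
# Header lines copied from the sample file in the question.
header='c ROIysiz= 28
c column1= HJD
c RedNumDa= 18262
c column3= ERROR
c column2= FLUX'

# Pattern as written in the discussion code: count the lines it matches.
with_space=$(printf '%s\n' "$header" | awk '/column 3/ { n++ } END { print n + 0 }')

# Same pattern without the space.
no_space=$(printf '%s\n' "$header" | awk '/column3/ { n++ } END { print n + 0 }')

# File name built the way the discussion code builds it, with the pattern corrected.
name=$(printf '%s\n' "$header" | awk '
  /ROIysiz/ { second_out = $NF }
  /column3/ { third_part = $NF }
  /column2/ { print $NF second_out third_part }
')

echo "$with_space $no_space $name"   # prints: 0 1 FLUX28ERROR
```

This relies on the column3 line appearing before the column2 line, as it does in the sample header; if the order were reversed, third_part would still be empty when the file name is built.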