Awk - saving strings from file to variables

Question

Could you help me with a bash/awk script? I have several .dat files in a directory. Each of these files consists of a header followed by data:

c ROIysiz= 28
c column1= HJD
c RedNumDa= 18262
c column3= ERROR
c column2= FLUX
c end header ---------------------------------------------------------------------------------------
2.458375368952875026e+06 -8.420548421860798386e-04 7.020812100561693928e-03
2.458375579737625085e+06 -5.579159672996818198e-03 1.285380720081348528e-03
2.458376278315599542e+06 -7.634101850411220518e-03 2.481065693991901019e-03
2.458376347386624664e+06 7.223482191697593166e-04 2.319993894372075760e-03
2.458376416108166799e+06 5.238757879614985152e-03 1.389030320490110878e-03
2.458376485913363751e+06 6.777606553373448882e-03 8.887787066666734273e-04
2.458377048675692175e+06 1.950435173388009522e-02 3.242344477396308117e-03
2.458377185153110884e+06 1.885754079806525874e-02 2.090836971653367571e-03

The file name is Old-file. For each of the files, I would like:

1) to save the name of the file to a variable

2) to save some information from the header to variables, for instance the strings after "c column3= ", "c column2= " and "c ROIysiz= "

3) to rename the file using the variables with the saved header information, for instance to "FLUX28"

4) to create a new file

5) to print information from the variables to the first row of the new file, for instance the name of the original file and the information after "c column3= " and "c column2= "

6) to print the data, i.e. the part of the original file after the line starting with "c end header"

7) to add # to the start of the first line

My attempt so far:

#!/bin/bash
for file in *.dat; do                   # loop in the directory
awk -v FILE=$FILE_NAME                  # save file name to variable FILE

/c end header/ { in_f_format=0; next }  # print file from c end header

{print $1, $2, $3}                      # print columns

BEGIN{printf("#")}1                     # adding hashtag before the first line
; done                                  # end of loop

Desired output

Files with names such as FLUX28 (other files will get other numbers, since the file name consists of strings from the header). Each file will contain:

#Old-file ERROR FLUX
2.458375368952875026e+06 -8.420548421860798386e-04 7.020812100561693928e-03
2.458375579737625085e+06 -5.579159672996818198e-03 1.285380720081348528e-03
2.458376278315599542e+06 -7.634101850411220518e-03 2.481065693991901019e-03
2.458376347386624664e+06 7.223482191697593166e-04 2.319993894372075760e-03
2.458376416108166799e+06 5.238757879614985152e-03 1.389030320490110878e-03
2.458376485913363751e+06 6.777606553373448882e-03 8.887787066666734273e-04
2.458377048675692175e+06 1.950435173388009522e-02 3.242344477396308117e-03
2.458377185153110884e+06 1.885754079806525874e-02 2.090836971653367571e-03

Code from the discussion:

awk '
/ROIysiz/{
  second_out=$NF
}
/column 3/{
  third_part=$NF
}
/column2/{
  close(out_file)
  found=count=""
  out_file=$NF second_out third_part
  next
}
/end header/{
  found=1
  next
}
found && out_file{
  if(++count==1){
    print "#" $0 > (out_file)
  }
  else{
    print > (out_file)
  }
}
' input

Thank you for showing your attempts, but please add more information about the expected output, as it is not clear as of now.
– RavinderSingh13, Commented Feb 25, 2020 at 5:47

RavinderSingh13 · Accepted Answer · 2020-02-25 06:48:47Z

Could you please try the following? It has not been tested thoroughly.

awk '
/ROIysiz/{
  second_out=$NF
}
/column2/{
  close(out_file)
  found=count=""
  out_file=$NF second_out
  next
}
/end header/{
  found=1
  next
}
found && out_file{
  if(++count==1){
    print "#" $0 > (out_file)
  }
  else{
    print > (out_file)
  }
}
' Input_file

Explanation: a detailed, commented version of the code above.

awk '                          ##Start of the awk program.
/ROIysiz/{                     ##If the line contains the string ROIysiz, do the following.
  second_out=$NF               ##Save the last field (e.g. 28) as the 2nd part of the output file name.
}
/column2/{                     ##If the line contains the string column2, do the following.
  close(out_file)              ##Close the previous out_file to avoid a "too many open files" error.
  found=count=""               ##Reset the variables found and count.
  out_file=$NF second_out      ##Build out_file from the last field of the current line plus second_out (e.g. FLUX28).
  next                         ##Skip all further rules for this line.
}
/end header/{                  ##If the line contains the string end header, do the following.
  found=1                      ##Set found to 1: the data section starts after this line.
  next                         ##Skip all further rules for this line.
}
found && out_file{             ##If found is set AND out_file is set, do the following.
  if(++count==1){              ##For the first data line only (count becomes 1), prepend #.
    print "#" $0 > (out_file)  ##Print # followed by the current line to out_file.
  }
  else{                        ##For every later data line (count greater than 1):
    print > (out_file)         ##Print the current line to out_file unchanged.
  }
}
' Input_file                   ##Name of the input file.
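
The code above prepends # to the first data line, whereas the desired output in the question has a separate first row such as "#Old-file ERROR FLUX". A minimal sketch of one way to produce that row, assuming every file has the "c ROIysiz=", "c column2=" and "c column3=" lines before "c end header", could look like the following. The names Old-file.dat, size, col2, col3 and in_data, and the use of mktemp for a scratch directory, are assumptions made for illustration; they are not taken from the original answer.

```shell
#!/bin/bash
# Work in a scratch directory so that nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened copy of the sample file from the question (hypothetical name).
cat > Old-file.dat <<'EOF'
c ROIysiz= 28
c column1= HJD
c RedNumDa= 18262
c column3= ERROR
c column2= FLUX
c end header ---------------------------------------------------------------------------------------
2.458375368952875026e+06 -8.420548421860798386e-04 7.020812100561693928e-03
2.458375579737625085e+06 -5.579159672996818198e-03 1.285380720081348528e-03
EOF

awk '
FNR==1 {                                   # first line of each input file: reset state
  if (out_file != "") close(out_file)
  size = col2 = col3 = out_file = ""
  in_data = 0
}
!in_data && /^c ROIysiz=/ { size = $NF; next }   # e.g. 28
!in_data && /^c column2=/ { col2 = $NF; next }   # e.g. FLUX
!in_data && /^c column3=/ { col3 = $NF; next }   # e.g. ERROR
!in_data && /^c end header/ {
  in_data = 1
  out_file = col2 size                     # e.g. FLUX28
  name = FILENAME
  sub(/\.dat$/, "", name)                  # drop the .dat extension for the header row
  print "#" name, col3, col2 > (out_file)  # e.g. #Old-file ERROR FLUX
  next
}
in_data { print > (out_file) }             # copy the data lines unchanged
' *.dat

cat FLUX28
```

This sketch creates new files and leaves the original .dat files in place. If two input files yield the same name, the later one overwrites the earlier one.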

edited Feb 25, 2020 at 6:48

answered Feb 25, 2020 at 6:08

RavinderSingh13


7 Comments

Alex Over a year ago

Why is the name of the output file the same for out_file=$NF second_out third_part and out_file=$NF second_out? Why can't I add print "#" second_out $0 > (out_file) to this line second_out?

RavinderSingh13 Over a year ago

@Alex, basically I am building your output file name from different lines: one line has FLUX, another line has 28, and so on. That is why second_out holds 28 and out_file holds the whole file name.

Alex Over a year ago

I mean the code you sent me in the discussion. I added it to my question. There is out_file=$NF second_out third_part. Does it mean a new file name composed of three words? Because I got two. The file name was the same as for out_file=$NF second_out from the code in your answer.

RavinderSingh13 Over a year ago

@Alex, that was for when you asked how to add one more condition and include it in the file name. It is simple: add a condition, create a variable when that condition is met, and add the variable at a suitable place in the out_file variable.

Alex Over a year ago

Yes. And this is in the code you provided me, isn't it? So I don't understand why I got the file name FLUX28; I expected the string 'ERROR' in the file name too. There are 3 conditions: /ROIysiz/ /column 3/ /column2/. Right?
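
One likely explanation for the missing ERROR, offered as a sketch rather than a confirmed diagnosis: the pattern /column 3/ contains a space, while the header line reads "c column3= ERROR", so that rule never matches and third_part stays empty. The self-contained example below runs the code from the discussion with the pattern changed to /column3/. The file name sample.dat and the mktemp scratch directory are assumptions made for illustration. It also relies on the column3 line appearing before the column2 line, as it does in the sample header, because out_file is built when column2 is seen.

```shell
#!/bin/bash
# Work in a scratch directory so that nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened copy of the sample file from the question (hypothetical name).
cat > sample.dat <<'EOF'
c ROIysiz= 28
c column1= HJD
c RedNumDa= 18262
c column3= ERROR
c column2= FLUX
c end header ---------------------------------------------------------------------------------------
2.458375368952875026e+06 -8.420548421860798386e-04 7.020812100561693928e-03
2.458375579737625085e+06 -5.579159672996818198e-03 1.285380720081348528e-03
EOF

awk '
/ROIysiz/{ second_out=$NF }            # 28
/column3/{ third_part=$NF }            # ERROR (no space inside column3)
/column2/{
  close(out_file)
  found=count=""
  out_file=$NF second_out third_part   # FLUX, 28 and ERROR joined: FLUX28ERROR
  next
}
/end header/{ found=1; next }
found && out_file{
  if(++count==1){ print "#" $0 > (out_file) }
  else          { print > (out_file) }
}
' sample.dat

ls    # should list FLUX28ERROR next to sample.dat
```

With the space left in the pattern, the same run would produce FLUX28, which matches what was reported above.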
