I have a file like the one below, and I want to split it into multiple files based on a pattern. Each block holds the information for one job (a "Job Number =" line), and the first line of a group carries its parent information in the form %sj HOSTNAME#PARENT_UNIQUE_ID_xxxxxx.JOB_NAME.
I want to extract the lines from one %sj HOSTNAME#PARENT_UNIQUE_ID_xxxxxx.JOB_NAME line up to the next one, including that header line itself.
Here is what I'm doing. It splits the file as needed, producing files like these:
HOSTNAME#PARENT_UNIQUE_ID_000001.JOB_NAME_jobProperties.txt
HOSTNAME#PARENT_UNIQUE_ID_000002.JOB_NAME_jobProperties.txt
Code:
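while IFS= read -r line; do
  if [[ $line == "%sj "* ]]; then
    # Header line: keep everything after the first space as the object name
    job_prop_objct_name=$(printf '%s\n' "$line" | grep -o -P '(?<= ).*')
    printf '%s\n' "$line" > "${job_prop_objct_name}_jobProperties.txt"
  else
    # Any other line: append to the file started by the most recent header
    printf '%s\n' "$line" >> "${job_prop_objct_name}_jobProperties.txt"
  fi
done < "$1"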
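As a side note on the header parsing: in bash, the "everything after the first space" extraction that grep -o -P '(?<= ).*' performs can also be done with parameter expansion, which avoids starting a subshell and a grep process for every header line. A minimal sketch; the helper name header_name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the object name from a "%sj NAME" header line.
# ${line#* } strips the shortest prefix ending in a space, i.e. "%sj ".
header_name() {
  local line=$1
  printf '%s\n' "${line#* }"
}

header_name '%sj HOSTNAME#PARENT_UNIQUE_ID_000001.JOB_NAME'
# prints: HOSTNAME#PARENT_UNIQUE_ID_000001.JOB_NAME
```

Inside the loop this would simply be job_prop_objct_name=${line#* }.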
The problem is that a single parent block sometimes contains multiple jobs (more than one "Job Number =" line), as in the last two blocks of the sample below, and my code combines them into one file.
What I would like is to split those blocks into separate files as well, perhaps by adding the job number to the file name.
Sample input file:
%sj HOSTNAME#PARENT_UNIQUE_ID_000001.JOB_NAME
General Information
Job = JOB_NAME
Workstation = HOSTNAME
Scheduled Time = 01/06/2018 06:00 TZ CST
Runtime Information
Status = Successful
Job Number = 12345
Time Information
Maximum Duration =
Extra Information
-
%sj HOSTNAME#PARENT_UNIQUE_ID_000002.JOB_NAME
General Information
Job = JOB_NAME
Workstation = HOSTNAME
Scheduled Time = 01/06/2018 06:00 TZ CST
Runtime Information
Status = Successful
Job Number = 12346
Time Information
Maximum Duration =
Extra Information
-
%sj HOSTNAME#PARENT_UNIQUE_ID_000003.JOB_NAME
General Information
Job = JOB_NAME
Workstation = HOSTNAME
Scheduled Time = 01/06/2018 06:00 TZ CST
Runtime Information
Status = Successful
Job Number = 12347
Time Information
Maximum Duration =
Extra Information
-
General Information
Job = JOB_NAME
Workstation = HOSTNAME
Scheduled Time = 01/06/2018 06:00 TZ CST
Runtime Information
Status = Successful
Job Number = 12348
Time Information
Maximum Duration =
Extra Information
-
The resulting files currently look like this:
HOSTNAME#PARENT_UNIQUE_ID_000001.JOB_NAME.txt
%sj HOSTNAME#PARENT_UNIQUE_ID_000001.JOB_NAME
General Information
Job = JOB_NAME
Workstation = HOSTNAME
Scheduled Time = 01/06/2018 06:00 TZ CST
Runtime Information
Status = Successful
Job Number = 12345
Time Information
Maximum Duration =
Extra Information
-
HOSTNAME#PARENT_UNIQUE_ID_000002.JOB_NAME.txt
%sj HOSTNAME#PARENT_UNIQUE_ID_000002.JOB_NAME
General Information
Job = JOB_NAME
Workstation = HOSTNAME
Scheduled Time = 01/06/2018 06:00 TZ CST
Runtime Information
Status = Successful
Job Number = 12346
Time Information
Maximum Duration =
Extra Information
-
HOSTNAME#PARENT_UNIQUE_ID_000003.JOB_NAME.txt
%sj HOSTNAME#PARENT_UNIQUE_ID_000003.JOB_NAME
General Information
Job = JOB_NAME
Workstation = HOSTNAME
Scheduled Time = 01/06/2018 06:00 TZ CST
Runtime Information
Status = Successful
Job Number = 12347
Time Information
Maximum Duration =
Extra Information
-
General Information
Job = JOB_NAME
Workstation = HOSTNAME
Scheduled Time = 01/06/2018 06:00 TZ CST
Runtime Information
Status = Successful
Job Number = 12348
Time Information
Maximum Duration =
Extra Information
-
I want the file HOSTNAME#PARENT_UNIQUE_ID_000003.JOB_NAME.txt to be split into one file per job number, each starting with the %sj header line, like this:
HOSTNAME#PARENT_UNIQUE_ID_000003.JOB_NAME_12347.txt
%sj HOSTNAME#PARENT_UNIQUE_ID_000003.JOB_NAME
General Information
Job = JOB_NAME
Workstation = HOSTNAME
Scheduled Time = 01/06/2018 06:00 TZ CST
Runtime Information
Status = Successful
Job Number = 12347
Time Information
Maximum Duration =
Extra Information
-
HOSTNAME#PARENT_UNIQUE_ID_000003.JOB_NAME_12348.txt
%sj HOSTNAME#PARENT_UNIQUE_ID_000003.JOB_NAME
General Information
Job = JOB_NAME
Workstation = HOSTNAME
Scheduled Time = 01/06/2018 06:00 TZ CST
Runtime Information
Status = Successful
Job Number = 12348
Time Information
Maximum Duration =
Extra Information
-
UPDATE: Workaround, although not a complete solution.
This is the closest I could get, with one caveat, and I'm sure it is not the cleanest way:
split_JobPropsFile () {
  local line job_prop_objct_name counter=1
  while IFS= read -r line; do
    if [[ $line == "%sj "* ]]; then
      job_prop_objct_name=$(printf '%s\n' "$line" | grep -o -P '(?<= ).*')
      printf '%s\n' "$line" > "${job_prop_objct_name}_${counter}_jobProperties.txt"
    else
      printf '%s\n' "$line" >> "${job_prop_objct_name}_${counter}_jobProperties.txt"
      # A line that is exactly "-" ends a job block
      if [[ $line == "-" ]]; then
        ((counter++))
        # Seed the next file with the parent header (this is what leaves the extra file)
        printf '%%sj %s\n' "$job_prop_objct_name" >> "${job_prop_objct_name}_${counter}_jobProperties.txt"
      fi
    fi
  done < "$1"
}
The above code does what I expect, except that it also creates a file containing only the "%sj" header line after the last job of each parent (including one at the end of the input).
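One way to avoid the header-only files, sketched here in bash under the assumption that every job block ends with a line that is exactly "-", is to create each output file lazily: remember the current header, and only start a new file when the first body line of a block arrives. The function name split_JobPropsFile_lazy is hypothetical; file names follow the same <name>_<counter>_jobProperties.txt scheme as the workaround above.

```shell
#!/usr/bin/env bash
# Hypothetical variant of split_JobPropsFile that never writes a
# header-only file: the next output file is opened on demand.
split_JobPropsFile_lazy() {
  local line header="" name="" out="" counter=1
  while IFS= read -r line; do
    if [[ $line == "%sj "* ]]; then
      # New parent: remember its header, but do not create a file yet
      header=$line
      name=${line#* }
      out=""
      continue
    fi
    if [[ -z $out ]]; then
      # First body line of a block: start its file with the current header
      out="${name}_${counter}_jobProperties.txt"
      printf '%s\n' "$header" > "$out"
    fi
    printf '%s\n' "$line" >> "$out"
    if [[ $line == "-" ]]; then
      # End of a job block: the next body line opens a new file
      ((counter++))
      out=""
    fi
  done < "$1"
}
```

Usage: split_JobPropsFile_lazy input.txt. It still runs one shell iteration per line, so it does not address the speed concern on large inputs.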
Of course, my workaround is probably not a good way to do this: it is slow when the input file is large, and there may be other issues I'm not aware of, such as the number of open files.
Can this be done with awk, without the extra file my workaround creates?
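A minimal awk sketch of what is being asked, under these assumptions (not stated in the original post): every job block ends with a line that is exactly "-", a block without its own "%sj" line belongs to the most recent one, and the output name is <header name>_<job number>.txt as in the desired output above. Because a file is only written once a whole block has been read, no header-only file is created, and close() keeps at most one output file open at a time. The wrapper name split_by_job is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: split a job-properties dump into one file per job.
split_by_job() {
  awk '
    # Write the buffered block, prefixed by the current %sj header line.
    function flush(   out) {
      if (buf == "") return
      # Fall back to the input line number if the block had no Job Number
      out = name "_" (num != "" ? num : "nojob" NR) ".txt"
      printf "%s\n%s", hdr, buf > out
      close(out)                 # only one output file open at a time
      buf = ""; num = ""
    }
    /^%sj / { flush(); hdr = $0; name = substr($0, 5); next }
    { buf = buf $0 "\n" }        # buffer every body line of the block
    /^Job Number = / { num = $NF }
    $0 == "-" { flush() }        # end of a job block
    END { flush() }              # block left open at end of input
  ' "$1"
}
```

Usage: split_by_job input.txt. For the sample input this should produce four files, for example HOSTNAME#PARENT_UNIQUE_ID_000003.JOB_NAME_12347.txt and HOSTNAME#PARENT_UNIQUE_ID_000003.JOB_NAME_12348.txt, each starting with the %sj header line.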