Bash: split file into multiple files based on certain string and save new filenames into array

Question

I have a file containing the following content:

(Item)
(Values)
blabla
blabla
(StopValues)
(Item)
(Values)
hello
hello
(StopValues)

I'd like to split it into multiple files so that one file always has the content from (Item) to (StopValues) (including both of these tags). Also, as I have to further use those files and use mktemp, I'd like to save each filename in an array when creating it.

To split them I used an approach with awk:

  awk '/(StopValues)/{n++}{print >"out" n ".txt" }' mainfile.txt

First problem: when I provide just one set of data, I still get two new .txt files, one containing only the (StopValues) tag and the other containing everything except that tag.

Second problem, I'd like to create files with mktemp instead of naming them myself and I need them in an array, how would I dynamically make new ones in the awk loop and save their name into an array?

That's not defined. I call my script like this: cat *.txt | ./script, so all the content from cat ends up in one file. When I pipe the concatenated files like that, is there a way to see which content came from which file, so that I could split it accordingly and get the result directly as an array? If that's possible, everything I'm trying to do here is no longer needed.
– RandomDisplayName, Commented Feb 28, 2017 at 14:06

Kent · Accepted Answer · 2017-02-28 14:36:46Z

First of all, the command:

arr=($(awk 'BEGIN{cmd="mktemp -u"; cmd|getline tmp}
{print > tmp}/\(StopValues/{a[++i]=tmp;close(cmd);close(tmp); cmd|getline tmp;}END{for(i=1;i<=length(a);i++)print a[i]; }' inputFile ))

The awk part:

awk 'BEGIN{cmd="mktemp -u"; cmd|getline tmp}
     {print > tmp}
     /\(StopValues/{a[++i]=tmp
                    close(cmd)
                    close(tmp)
                    cmd|getline tmp} 
    END{for(i=1;i<=length(a);i++)print a[i]; }' inputFile

With this input file (f), to which I added a third block:

kent$  cat f
(Item)
(Values)
blabla
blabla
(StopValues)
(Item)
(Values)
hello
hello
(StopValues)
(Item)
(Values)
hello
hello
(StopValues)

The awk will output:

#The filenames can be different.
/tmp/tmp.DRaLMsXROR
/tmp/tmp.yUL6GO4xtv
/tmp/tmp.Kb0UxsHVno

So the output lists 3 temp files. Each file contains one block of the input file.

The output consists of the temp file names; wrapping the awk command in arr=($(...)) stores them in a bash array. Putting it all together, here is a test (I only check the first block/temp file):

kent$  arr=($(awk 'BEGIN{cmd="mktemp -u"; cmd|getline tmp}
    {print > tmp}/\(StopValues/{a[++i]=tmp;close(cmd);close(tmp); cmd|getline tmp;}END{for(i=1;i<=length(a);i++)print a[i]; }' f ))

kent$  echo ${arr[*]}
/tmp/tmp.fcf7ac0eVl /tmp/tmp.Rjru5psFQB /tmp/tmp.ldaBWCucNg

kent$  echo ${arr[0]}
/tmp/tmp.fcf7ac0eVl

kent$  cat "${arr[0]}"
(Item)
(Values)
blabla
blabla
(StopValues)
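
A variation on the command above, as a sketch rather than a drop-in replacement. It assumes two changes: plain mktemp is called at the start of every block (without -u, which only prints a name and does not create the file), and the names are read with mapfile, which needs bash 4 or later and avoids word splitting on the awk output. The sample input, the input variable and the awk variable names are made up for the demo.

```shell
#!/usr/bin/env bash
# Demo input (hypothetical): two blocks, written to a scratch file.
input=$(mktemp)
printf '%s\n' '(Item)' '(Values)' 'blabla' 'blabla' '(StopValues)' \
              '(Item)' '(Values)' 'hello' 'hello' '(StopValues)' > "$input"

# awk creates one temp file per block and prints the names at the end;
# mapfile stores each printed line as one array element.
mapfile -t arr < <(awk '
    /^\(Item\)$/ {                 # a block starts: ask mktemp for a new file
        cmd = "mktemp"
        cmd | getline tmp
        close(cmd)                 # so the next block runs mktemp again
        names[++n] = tmp
    }
    tmp != "" { print > tmp }      # copy lines only while inside a block
    /^\(StopValues\)$/ {           # a block ends: release the file
        close(tmp)
        tmp = ""
    }
    END { for (i = 1; i <= n; i++) print names[i] }
' "$input")

echo "${#arr[@]} files"            # number of blocks found
cat "${arr[0]}"                    # first block; bash arrays start at index 0
```

Because the files really exist once awk has finished, they should be removed when no longer needed, for example with rm -f "${arr[@]}".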

RavinderSingh13 · Accepted Answer · 2017-02-28 14:00:55Z

Try:

awk '/\(Item\)/{A=1;count++} A{VAL=VAL?VAL ORS $0:$0} /\(StopValues\)/{A="";print VAL > ("out" count ".txt");VAL=""}'   Input_file

For the sample input, this creates two files, out1.txt and out2.txt.

EDIT: Here is the same solution in multi-line form.

awk '/\(Item\)/{
                A=1;
                count++
             }
    A        {
                VAL=VAL?VAL ORS $0:$0
             }
    /\(StopValues\)/{
                        A="";
                        print VAL > ("out" count ".txt");
                        VAL=""
                  }
    '    Input_file
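
The question also asks for the file names in an array, which this answer does not cover. One possible way, sketched under the assumption that the outN.txt naming is kept, is to collect the files with a glob afterwards. The scratch directory, the prefix variable and the inblock/out names are hypothetical and only serve the demo.

```shell
#!/usr/bin/env bash
# Demo input (hypothetical) in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '(Item)' '(Values)' 'blabla' 'blabla' '(StopValues)' \
              '(Item)' '(Values)' 'hello' 'hello' '(StopValues)' > "$workdir/Input_file"

# Same numbering scheme as the answer, written under $workdir via a prefix variable.
awk -v prefix="$workdir/out" '
    /^\(Item\)$/       { inblock = 1; out = prefix (++count) ".txt" }
    inblock            { print > out }
    /^\(StopValues\)$/ { inblock = 0; close(out) }
' "$workdir/Input_file"

# Collect the generated names; nullglob yields an empty array if nothing matched.
shopt -s nullglob
outfiles=("$workdir"/out*.txt)

printf '%s\n' "${outfiles[@]##*/}"   # prints out1.txt and out2.txt, one per line
```

Glob results are sorted lexically, so with ten or more blocks out10.txt would come before out2.txt. If the order matters, zero-padding the counter in awk (for example with sprintf("%03d", count)) is one way around that.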

edited Feb 28, 2017 at 14:00

answered Feb 28, 2017 at 13:54