Split column into multiple based on match/delimiter using bash awk

Question

I have a dataset in a single column that I would like to split into any number of new columns whenever a certain string is found (in this case 'male_position').

>cat test.file

male_position
0.00
0.00
1.05
1.05
1.05
1.05
3.1
5.11
12.74
30.33
40.37
40.37
male_position
0.00
1.05 
2.2
4.0
4.0
8.2
25.2
30.1
male_position
1.0
5.0

I would like the script to start a new tab-separated column each time 'male_position' is encountered, and to add each line/data point below it to that column until the next occurrence of 'male_position':

script.awk test.file > output

0.00  0.00  1.0
0.00  1.05  5.0
1.05  2.2
1.05  4.0
1.05  4.0
1.05  8.2
3.1  25.2
5.11 30.1
12.74
30.33
40.37
40.37

Any ideas?

Update: I have tried to adapt code from this post (Linux split a column into two different columns in a same CSV file):

cat script.awk

BEGIN {
   line = 0; #Initialize at zero
}
/male_position/ { #every time we hit the delimiter
   line = 0; #reset line to zero
}
!/male_position/{ #otherwise
   a[line] = a[line]" "$0; # Add the new input line to the output line
   line++; # increase the counter by one
}
END {
   for (i in a )
      print a[i] # print the output
}

Results....

$ awk -f script.awk test.file
 1.05 2.2
 1.05 4.0
 1.05 4.0
 1.05 8.2
 3.1 25.2
 5.11 30.1
 12.74
 30.33
 40.37
 40.37
 0.00 0.00 1.0
 0.00 1.05  5.0

Update 2

I can reproduce the expected output for the test.file case: running script.awk (see above) on Linux with that test file seemed to work. However, that simple example only has a decreasing number of data points between successive delimiters (male_position). When a later group has more data points than an earlier one, the output breaks:

cat test.file2

male_position
0.00
0.00
1.05
1.05
1.05
1.05
3.1
5.11
12.74
male_position
0
5
10
male_position
0
1
2
3
5

awk -f script.awk test.file2

0.00 0 0
0.00 5 1
1.05 10 2
1.05 3
1.05 5
1.05 
3.1
5.11
12.74

There is no 'padding' of the lines after the last observation for a given column, so a column with more values than the preceding column has its extra values fall in line with the previous column (the 3 and the 5 are in column 2, when they should be in column 3).
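
A minimal sketch of one way to get that padding while keeping the row-by-row approach of script.awk: count how many separators each output row already holds and add empty fields until the value lands in the current column. The function name split_columns and the arrays out and seps are illustrative names, not part of the original script. The END block uses a numeric loop because the order of `for (i in a)` is unspecified in awk, which may explain rows appearing out of order.

```shell
#!/bin/sh
# Sketch only: split_columns is a hypothetical helper name.
# Reads the files given as arguments, or standard input if there are none.
split_columns() {
    awk -v OFS='\t' '
    /male_position/ { col++; row = 0; next }   # delimiter: start a new column
    col == 0        { next }                   # skip data before the first delimiter
    {
        row++
        if (row > nrows) nrows = row           # longest column seen so far
        # add separators until this row has col-1 of them,
        # so that the value lands in column number col
        while (seps[row] < col - 1) {
            out[row] = out[row] OFS
            seps[row]++
        }
        out[row] = out[row] $0
    }
    END {
        # numeric loop keeps the rows in input order
        for (r = 1; r <= nrows; r++)
            print out[r]
    }' "$@"
}
```

It could be called as `split_columns test.file2 > output`. Rows are padded on the left only, so a row that ends early has no trailing tabs.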

I have tried adapting some code from a previous post: stackoverflow.com/questions/14709360/…, but could not recover the expected output in different columns based on the delimiter used (male_position).
– user95146, Commented May 14, 2018 at 2:06
Yes, I have tried that, but it does not quite work... the first 2 lines after a 'male_position' are put at the end of the column, not at the beginning as I would expect.
– user95146, Commented May 14, 2018 at 2:26
Again, I have tried adapting the code from the 'accepted answer' exactly, but I don't get the expected result. Instead, the first two lines after the delimiter match go to the end of the 'group'. Perhaps the lines with '0.0' are throwing it off (the third 'group' with 1.0 and 5.0 seemed to work just fine, but the first two did not...).
– user95146, Commented May 14, 2018 at 2:44

Sundeep · Accepted Answer · 2018-05-14 03:57:43Z

Here's a csplit+paste solution

$ csplit --suppress-matched -zs test.file2 /male_position/ {*}
$ ls
test.file2  xx00  xx01  xx02
$ paste xx*
0.00    0   0
0.00    5   1
1.05    10  2
1.05        3
1.05        5
1.05        
3.1     
5.11        
12.74

From man csplit

csplit - split a file into sections determined by context lines

-z, --elide-empty-files remove empty output files

-s, --quiet, --silent do not print counts of output file sizes

--suppress-matched suppress the lines matching PATTERN

/male_position/ is the regex used to split the input file
{*} specifies to create as many splits as possible
use -f and -n options to change the default output file names
paste xx* to paste the files column wise, TAB is default separator
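
As a sketch of the -f and -n note above, the same csplit+paste idea can keep its pieces in a temporary directory so the working directory stays clean. The function name split_with_csplit and the col_ prefix are illustrative choices, and the sketch assumes GNU csplit (for --suppress-matched) and mktemp are available.

```shell
#!/bin/sh
# Sketch only: split_with_csplit is a hypothetical helper name.
# Usage: split_with_csplit FILE
split_with_csplit() {
    input=$1
    tmpdir=$(mktemp -d) || return 1
    # -f sets the output file prefix, -n the number of digits in the suffix,
    # so the pieces are named col_000, col_001, ... inside the temp directory
    csplit --suppress-matched -z -s -f "$tmpdir/col_" -n 3 \
        "$input" '/male_position/' '{*}'
    # the glob expands in sorted order, which matches the split order
    paste "$tmpdir"/col_*
    rm -rf "$tmpdir"
}
```

For example, `split_with_csplit test.file2 > output`. With three suffix digits the glob keeps the pieces in order for up to 1000 columns.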


RavinderSingh13 · Accepted Answer · 2018-05-14 05:29:03Z

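The following awk command may help.

awk '/male_position/{count++;max=val>max?val:max;val=1;next} {array[val++,count]=$0} END{max=val>max?val:max;for(i=1;i<max;i++){for(j=1;j<=count;j++){printf("%s%s",array[i,j],j==count?ORS:OFS)}}}' OFS="\t"   Input_file

Here is the same solution in multi-line form:

awk '
/male_position/{
  count++;                 # start a new column
  max=val>max?val:max;     # remember the longest column seen so far
  val=1;                   # restart the row index
  next}
{
  array[val++,count]=$0    # store the value at (row, column)
}
END{
  max=val>max?val:max;     # the last column has to be counted too
  for(i=1;i<max;i++){      # val ends one past the last row, so stop before max
      for(j=1;j<=count;j++){   printf("%s%s",array[i,j],j==count?ORS:OFS)   }}
}
' OFS="\t"   Input_file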
