I have a dataset in a single column that I would like to split into any number of new columns whenever a certain string is found (in this case 'male_position').

>cat test.file

male_position
0.00
0.00
1.05
1.05
1.05
1.05
3.1
5.11
12.74
30.33
40.37
40.37
male_position
0.00
1.05 
2.2
4.0
4.0
8.2
25.2
30.1
male_position
1.0
5.0

I would like the script to start a new tab-separated column each time 'male_position' is encountered, and to print each line/data point below it (added to that column) until the next occurrence of 'male_position':

script.awk test.file > output

0.00  0.00  1.0
0.00  1.05  5.0
1.05  2.2
1.05  4.0
1.05  4.0
1.05  8.2
3.1  25.2
5.11 30.1
12.74
30.33
40.37
40.37

Any ideas?

Update: I have tried to adapt code from this post (Linux split a column into two different columns in a same CSV file):

cat script.awk

BEGIN {
   line = 0; #Initialize at zero
}
/male_position/ { #every time we hit the delimiter
   line = 0; #reset line to zero
}
!/male_position/{ #otherwise
   a[line] = a[line]" "$0; # Add the new input line to the output line
   line++; # increase the counter by one
}
END {
   for (i in a )
      print a[i] # print the output
}

Results....

$ awk -f script.awk test.file
 1.05 2.2
 1.05 4.0
 1.05 4.0
 1.05 8.2
 3.1 25.2
 5.11 30.1
 12.74
 30.33
 40.37
 40.37
 0.00 0.00 1.0
 0.00 1.05  5.0

UPDATE 2 #######

I can reproduce the expected output with the test.file case: running the script (script.awk, see above) on Linux with that test file seemed to work. However, that simple example file only has a decreasing number of data points between the delimiters (male_position). When a later group has more data points than an earlier one, the output fails:

cat test.file2

male_position
0.00
0.00
1.05
1.05
1.05
1.05
3.1
5.11
12.74
male_position
0
5
10
male_position
0
1
2
3
5

awk -f script.awk test.file2

0.00 0 0
0.00 5 1
1.05 10 2
1.05 3
1.05 5
1.05 
3.1
5.11
12.74

There is no 'padding' of the lines after the last observation for a given column, so a column with more values than the preceding column has its extra values fall into the previous column (the 3 and the 5 are in column 2, when they should be in column 3).

  • I have tried adapting some code from a previous post: stackoverflow.com/questions/14709360/…, but could not get the expected output in different columns based on the delimiter used (male_position). Commented May 14, 2018 at 2:06
  • In this code replace ,, with male_position? Commented May 14, 2018 at 2:21
  • Yes, I have tried that, but it does not quite work. The first 2 lines after a 'male_position' are put at the end of the column, not the beginning as I would expect. Commented May 14, 2018 at 2:26
  • or try the accepted answer. Commented May 14, 2018 at 2:29
  • Again, I have tried adapting the code from the 'accepted answer' exactly, but don't get the expected result. Instead, the first two lines after the delimiter match go to the end of the 'group'. Perhaps the lines with '0.0' are throwing it off (the third 'group' with 1.0 and 5.0 seemed to work just fine, but the first two did not). Commented May 14, 2018 at 2:44

2 Answers

1

Here's a csplit+paste solution

$ csplit --suppress-matched -zs test.file2 /male_position/ {*}
$ ls
test.file2  xx00  xx01  xx02
$ paste xx*
0.00    0   0
0.00    5   1
1.05    10  2
1.05        3
1.05        5
1.05        
3.1     
5.11        
12.74       

From man csplit

csplit - split a file into sections determined by context lines

-z, --elide-empty-files remove empty output files

-s, --quiet, --silent do not print counts of output file sizes

--suppress-matched suppress the lines matching PATTERN

  • /male_position/ is the regex used to split the input file
  • {*} specifies to create as many splits as possible
  • use -f and -n options to change the default output file names
  • paste xx* to paste the files column wise, TAB is default separator

1

The following awk may help here.

awk '/male_position/{count++;max=val>max?val:max;val=1;next} {array[val++,count]=$0} END{max=val>max?val:max;for(i=1;i<max;i++){for(j=1;j<=count;j++){printf("%s%s",array[i,j],j==count?ORS:OFS)}}}' OFS="\t"   Input_file

Here is the same solution in non-one-liner form.

awk '
/male_position/{
  count++;               # a new column starts at every delimiter
  max=val>max?val:max;   # val is one past the length of the column just finished
  val=1;
  next}
{
  array[val++,count]=$0  # store the value at [row,column]
}
END{
  max=val>max?val:max;   # the last column has no delimiter after it
  for(i=1;i<max;i++){    # rows run from 1 to max-1
      for(j=1;j<=count;j++){   printf("%s%s",array[i,j],j==count?ORS:OFS)   }}
}
' OFS="\t"   Input_file
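The row/column idea used in this answer can also be written so that the longest column is tracked on every data line, which means the last group is counted without any special handling. This is a minimal sketch, not the answerer's code: the function name columns_by_marker and the variable names are hypothetical, and it assumes each marker line consists of exactly the marker text with no surrounding whitespace.

```shell
#!/bin/sh
# Sketch: columns_by_marker MARKER FILE
# Prints one tab-separated column per group of lines following MARKER.
columns_by_marker() {
    awk -v marker="$1" '
    $0 == marker {                # marker line: start a new column
        ncol++
        row = 0
        next
    }
    ncol {                        # data line (skipped before the first marker)
        cell[++row, ncol] = $0
        if (row > nrow) nrow = row    # remember the longest column so far
    }
    END {
        for (r = 1; r <= nrow; r++) {
            line = ""
            for (c = 1; c <= ncol; c++)
                # a missing cell is an empty string, which pads short columns
                line = line (c > 1 ? "\t" : "") cell[r, c]
            print line
        }
    }' "$2"
}
```

For example, columns_by_marker male_position test.file2 > output. Because short columns are padded with empty fields, later values stay in their own column even when an earlier column has already ended.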
