unix bash: separating a specific column into multiple columns

Question

I have a tab-delimited file with three columns. In each row, the 3rd column holds a string of five names separated by spaces (' '), but in some cases there is more than one space between the names. I'd like to use a unix bash command line to print column 1, column 2, name1, name2, name3, name4, name5, all separated by tabs.

My desired output would look like this:

avov2323[tab]rogoc232[tab]Roy[tab]Don[tab]Mike[tab]Ned[tab]Lee
cdso3432[tab]fokfd543[tab]Tom[tab]Gil[tab]Rose[tab]Dan[tab]Sam

Is there a way to store all of column 3 in a variable and then split that variable on spaces? Something like: a=awk -F "\t" '{print $3}' file.txt;awk -F " " '{print $1}' $a;

However, this command line doesn't work for me: all the names from column 3 get crammed together in $a.
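The variable idea can work if the split happens line by line instead of on the whole column at once. A minimal sketch in pure bash, assuming exactly three tab-separated columns and no empty fields (the function name `split_names` is hypothetical): `read` puts column 3 in a variable, and a second `read` with `IFS=' '` splits it, collapsing runs of spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads tab-delimited lines on stdin and prints
# column 1, column 2 and each space-separated name of column 3, tab-separated.
# Assumes three columns per line and no empty fields.
split_names() {
  local c1 c2 names joined
  local -a parts
  # "|| [[ -n $c1 ]]" also handles a last line without a trailing newline.
  while IFS=$'\t' read -r c1 c2 names || [[ -n $c1 ]]; do
    # With IFS=' ', read splits on runs of spaces and drops leading/trailing ones.
    IFS=' ' read -r -a parts <<< "$names"
    # "${parts[*]}" joins the array with the first character of IFS (a tab here).
    joined=$(IFS=$'\t'; printf '%s' "${parts[*]}")
    printf '%s\t%s\t%s\n' "$c1" "$c2" "$joined"
  done
}
```

Usage would be `split_names < file.txt`. A bash read loop is much slower than awk or tr on large files, so the answers below are the better choice for big inputs.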

Please show input samples, with and without insignificant spaces! – F. Hauri - Give Up GitHub, commented Oct 8, 2014 at 18:07

Krzysztof Jabłoński · Accepted Answer · 2014-10-08 18:19:06Z

3

Use tr to translate:

tr <inputFile " " "\t" | tr -s "\t" >outputFile

Edit: As Glenn Jackman pointed out, it would be better to first squeeze spaces, then change remaining spaces to tabs.

tr <inputFile -s " " | tr " " "\t" >outputFile

It is still vulnerable to spaces in the first two columns, though.
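To see why the order of the two tr calls matters, here is a small sketch on a hypothetical row whose second column is empty. Squeezing tabs (first version) merges the two tabs around the empty column; squeezing spaces first (revised version) never touches existing tabs.

```shell
#!/usr/bin/env bash
# Hypothetical sample row: column 2 is empty, column 3 has two spaces
# between "b" and "c".
line=$'a\t\tb  c'

# First version: spaces -> tabs, then squeeze tabs.
# The squeeze also swallows the empty column 2.
old=$(printf '%s\n' "$line" | tr ' ' '\t' | tr -s '\t')

# Revised version: squeeze spaces, then turn each remaining space into a tab.
# Existing tabs are left alone, so the empty column survives.
new=$(printf '%s\n' "$line" | tr -s ' ' | tr ' ' '\t')

printf '%s\n' "$old" | od -c
printf '%s\n' "$new" | od -c
```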

edited Oct 8, 2014 at 18:19

answered Oct 8, 2014 at 18:03

Krzysztof Jabłoński

8 Comments

Roy Over a year ago

Please note my edit to the question (in some cases there is more than one space between the names).

Krzysztof Jabłoński Over a year ago

Yes, you're right. I missed that one. I've adapted my little monster.

glenn jackman Over a year ago

+1 I like this. One risk: squeezing tabs will break when there are empty fields. You might want to squeeze spaces first, then translate spaces to tabs.

Roy Over a year ago

This one is a great solution as well. Is there a way to use it in a pipe, though? I create my file in the previous step, and I would like to pass the result through a pipe and fix its spaces and tabs as you suggest. For example, my pipe for Tom Fenech's solution is this: awk 'NR==FNR{a[$2]=$0;next} ($3) in a{print $0, a[$3]}' 13PatientsInputsGQ30DP8plinkGenoLogistic.assoc.logistic /cygdrive/h/SNPs_GATK/OnlyInputsOf13Patients/SplitChromosomes/2_CarlosSNPIDsOutput/vcfWithId.vcf | awk '{$1=$1}1' OFS='\t' > UnionLogistic_vcf.txt

Krzysztof Jabłoński Over a year ago

Of course. Don't pass <inputFile; instead, put your command followed by | in front of the first tr. Your command should then write to the pipe rather than to a file.
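A minimal sketch of that pipe form. `make_rows` is a hypothetical stand-in for whatever command currently writes the file; its stdout feeds the first tr directly, so no input file is needed.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the command that currently writes the file.
make_rows() {
  printf 'avov2323\trogoc232\tRoy  Don Mike  Ned Lee\n'
  printf 'cdso3432\tfokfd543\tTom Gil    Rose  Dan Sam\n'
}

# The producer's output goes straight into the first tr; no <inputFile.
# Append "> outputFile" to save the result instead of printing it.
make_rows | tr -s ' ' | tr ' ' '\t'
```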

Tom Fenech · Accepted Answer · 2014-10-08 18:09:21Z

1

You could use awk:

$ cat file
avov2323        rogoc232        Roy  Don Mike  Ned Lee
cdso3432        fokfd543        Tom Gil    Rose  Dan Sam
$ awk '{$1=$1}1' OFS='\t' file
avov2323        rogoc232        Roy     Don     Mike    Ned     Lee
cdso3432        fokfd543        Tom     Gil     Rose    Dan     Sam

$1=$1 assigns a field to itself, which forces awk to rebuild the record, joining the fields with the new output field separator (OFS). The pattern 1 evaluates to true, so each line is printed. With the default field separator, awk treats any run of spaces and tabs as a single separator (and ignores leading and trailing whitespace), so the number of spaces between the names is not a problem.
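A quick way to see the rebuild in action, on a hypothetical line with irregular spacing: printing without any field assignment leaves $0 exactly as read, while `{$1=$1}` makes awk rejoin the fields with OFS.

```shell
#!/usr/bin/env bash
line='  avov2323 rogoc232   Roy  Don'   # hypothetical sample with irregular spacing

# Pattern only, no field assignment: awk prints $0 exactly as read.
untouched=$(printf '%s\n' "$line" | awk -v OFS='\t' '1')

# $1=$1 makes awk rebuild $0 from its fields joined by OFS, which also
# drops the leading spaces and collapses the runs of spaces.
rebuilt=$(printf '%s\n' "$line" | awk -v OFS='\t' '{$1=$1} 1')

printf '%s\n' "$untouched" "$rebuilt" | od -c
```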

To overwrite the original file, you can use a temporary file:

awk '{$1=$1}1' OFS='\t' file > tmp && mv tmp file

answered Oct 8, 2014 at 18:09

Tom Fenech

3 Comments

Roy Over a year ago

Thanks a million! This one works very well for me.

Vytenis Bivainis Over a year ago

Even simpler is awk -v OFS='\t' '$1=$1'. You can check the result with echo -e avov2323\\trogoc232\\tRoy Don Mike Ned Lee | awk -v OFS='\t' '$1=$1' | od -c

Tom Fenech Over a year ago

@Vytenis, the only potential downside of using $1=$1 as the pattern is that when the first field evaluates to false (for example, it is "0" or the line is empty), the line is not printed. For example, awk '$1=$1' <<<"0" doesn't print anything. In this case, that doesn't seem to be a problem, though.
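A small sketch of that pitfall, assuming a hypothetical line whose first field is "0": as a pattern, `$1=$1` is false and the line is dropped, while `{$1=$1}1` always prints.

```shell
#!/usr/bin/env bash
# '$1=$1' as a pattern: the line is printed only if the assigned value is true.
as_pattern=$(printf '0 x\n' | awk -v OFS='\t' '$1=$1')

# '{$1=$1}1': the action rebuilds $0 and the pattern 1 is always true.
as_action=$(printf '0 x\n' | awk -v OFS='\t' '{$1=$1}1')

printf 'pattern: [%s]\naction:  [%s]\n' "$as_pattern" "$as_action"
```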

Krzysztof Jabłoński · Accepted Answer · 2014-10-09 07:05:12Z

1

Just for the sake of completeness, I also wrote an awk one-liner that won't touch any spaces in the first two columns. It also preserves empty columns:

awk <inputFile -F '\t' 'BEGIN{OFS="\t"} {gsub(/ +/,OFS,$3); print $1,$2,$3}'
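To check both claims, here is the same one-liner run on a hypothetical row with a space inside column 1 and an empty column 2. Because the field separator is a single tab, the empty column is kept, and only column 3 is split. Note that it prints only the first three columns.

```shell
#!/usr/bin/env bash
# Hypothetical row: a space inside column 1, an empty column 2,
# and irregular spacing in column 3.
row=$'New York\t\tRoy  Don Lee'
out=$(printf '%s\n' "$row" |
  awk -F '\t' 'BEGIN{OFS="\t"} {gsub(/ +/,OFS,$3); print $1,$2,$3}')
printf '%s\n' "$out" | od -c
```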

Edit: Regarding the improvement mentioned in the comments: yes, it is possible to split any column, even one in the middle, though a more versatile script is necessary. It is not a one-liner, though, and it looks quite awkward on a single line. I'm pretty sure it could still be optimized somewhat. Formatted:

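BEGIN {
  FS=OFS="\t";                # tab-separated input and output
  splitAt=3;                  # column to split on spaces (must be >= 2)
}{
  gsub(/ +/,OFS,$splitAt);    # each run of spaces in that column becomes a tab
  line=$1;                    # start with column 1 (with splitAt=1 it would be printed twice)
  for(i=2;i<splitAt;i++)      # columns before the split column
    line=line""OFS""$i;
  line=line""OFS""$splitAt;   # the split column itself
  for(i=splitAt+1;i<=NF;i++)  # columns after the split column
    line=line""OFS""$i;
  print line;
}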

And here it is as a one-liner, splitting column 2:

awk <inputFile 'BEGIN{FS=OFS="\t"; splitAt=2;} {gsub(/ +/,OFS,$splitAt); line=$1; for(i=2;i<splitAt;i++) line=line""OFS""$i; line=line""OFS""$splitAt; for(i=splitAt+1;i<=NF;i++) line=line""OFS""$i; print line ;}'

It could be refactored to take splitAt as a parameter to the script.
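One possible refactoring, sketched under my own assumptions (the wrapper name `split_column` is hypothetical): pass the column with `-v`, and rely on the fact that changing a field makes awk rebuild $0 with OFS. Since FS and OFS are both tabs, a plain `print` then emits every column, which also avoids the splitAt=1 special case.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_column COL [FILE...]
# Splits column COL of tab-separated input on runs of spaces and prints
# every column. Assumes each line has at least COL columns.
split_column() {
  local col=$1
  shift
  awk -F '\t' -v OFS='\t' -v splitAt="$col" '{
    # When gsub changes field splitAt, awk rebuilds $0 with OFS between
    # fields, so a plain print emits all columns before and after it.
    gsub(/ +/, OFS, $splitAt)
    print
  }' "$@"
}
```

For example, `split_column 35 aaa.txt > mine.txt` would split the 35th column and keep everything else as it was.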

edited Oct 9, 2014 at 7:05

answered Oct 8, 2014 at 18:34

Krzysztof Jabłoński

2 Comments

Roy Over a year ago

Hi Krzysztof, your solution works well! But what if I need to split on spaces not my 3rd column but my 35th column, and then print all the tab-delimited columns from 1 to 35, including the new columns that were nested (by spaces) in the 35th column? Is there a way to incorporate this into your command, rather than tediously typing at the end: ;print $1,$2,$3,$4,..etc..,$35}'? Thanks a lot!

Roy Over a year ago

OK, the solution to my question is actually this: awk -F '\t' 'BEGIN{OFS="\t"} {gsub(/ +/,"\t",$35); for(i=1;i<=35;i++) printf "%s",$i "\t";printf "\n"}' aaa.txt > mine.txt
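That command prints a tab after every column, including the 35th, so each output line ends with a trailing tab. A sketch of a variant that keeps the same behavior (split column COL, print only columns 1 to COL) without the trailing tab; the name `split_and_cut` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_and_cut COL [FILE...]
# Splits column COL on runs of spaces and prints only columns 1..COL.
split_and_cut() {
  local col=$1
  shift
  awk -F '\t' -v OFS='\t' -v col="$col" '{
    gsub(/ +/, OFS, $col)       # expand column col into tab-separated values
    out = $1
    for (i = 2; i <= col; i++)  # join columns 2..col, no trailing tab
      out = out OFS $i
    print out
  }' "$@"
}
```

Usage: `split_and_cut 35 aaa.txt > mine.txt`.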
