I'm currently working on a script and have a problem formatting the output. The index and input files look like this:
index
Pseudopropionibacterium propionicum
Kibdelosporangium phytohabitans
Steroidobacter denitrificans
File 1
Pseudopropionibacterium propionicum 1591.0
Kibdelosporangium phytohabitans 907.0
Olsenella sp. oral taxon 807 7323.0 oral bacterium
Steroidobacter denitrificans 6673.0 sludge bacterium
File 2
Pseudopropionibacterium propionicum 123.0
Caulobacteraceae bacterium OTSz_A_272 1019.0
Saccharopolyspora erythraea 939.0 soil bacterium
Rhodopseudomonas palustris 900.0
Nitrospira moscoviensis 856.0 soil/water bacterium
File 3
Pseudopropionibacterium propionicum 1591.0
Kibdelosporangium phytohabitans 907.0
Verrucosispora maris 391.0 deep-sea actinomycete
Tannerella forsythia 389.0 periodontal pathogen
Actinoplanes missouriensis 376.0 soil bacterium
The script uses the index to look for matches in each input file and prints fields one and two of every matching line. This is done for more than one input file (they all have the same layout), and I want the output of each file to go into its own new column.
My code so far:
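#!/bin/bash
for file in ./*_TOP1000
do
    # Header line: the file name without its directory
    basename "$file" >> output
    # Read the index first, then print fields 1 and 2 of every line whose first field is in it
    awk 'BEGIN{FS="\t"}NR==FNR{a[$1]=$0;next}$1 in a{print $1,$2}' index "$file" >> output
done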
And the output looks like:
File 1
Pseudopropionibacterium propionicum 1591.0
Kibdelosporangium phytohabitans 907.0
Steroidobacter denitrificans 6673.0
File 2
Pseudopropionibacterium propionicum 4326.0
File 3
Kibdelosporangium phytohabitans 1591.0
Pseudopropionibacterium propionicum 907.0
But I would like to have it this way:
File 1 File 2 File 3
Pseudopropionibacterium propionicum 1591.0 Pseudopropionibacterium propionicum 4326.0 Pseudopropionibacterium propionicum 907.0
Kibdelosporangium phytohabitans 907.0 Kibdelosporangium phytohabitans 1591.0
Steroidobacter denitrificans 6673.0
with the matching results directly under each file name. Each file can have different matches.
I tried to solve it with the column command by sneaking in a separator, but that did not work. So how can I achieve the desired output?
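One possible way to get the side-by-side layout is to let a single awk call collect the matches of every file and print them row by row at the end. This is only a sketch, assuming the data files are tab-separated (as FS="\t" in the script suggests) and that all matches fit in memory; the function name side_by_side is made up for illustration.

```shell
#!/bin/bash
# Sketch only: side_by_side is a hypothetical helper name.
# Usage: side_by_side index ./*_TOP1000 > output
side_by_side() {
    awk '
    BEGIN { FS = OFS = "\t" }

    # First file (the index): remember every name we are looking for.
    NR == FNR { wanted[$1]; next }

    # First line of each data file: open a new column headed by the
    # file name without its directory (an empty file gets no column).
    FNR == 1 {
        ncol++
        head[ncol] = FILENAME
        sub(/.*\//, "", head[ncol])
    }

    # Matching line: store "name value" in the next free row of this column.
    $1 in wanted {
        cell[ncol, ++rows[ncol]] = $1 " " $2
        if (rows[ncol] > maxrows) maxrows = rows[ncol]
    }

    END {
        # Header row, then one output row per stored row; missing cells stay empty.
        line = head[1]
        for (c = 2; c <= ncol; c++) line = line OFS head[c]
        print line
        for (r = 1; r <= maxrows; r++) {
            line = cell[1, r]
            for (c = 2; c <= ncol; c++) line = line OFS cell[c, r]
            print line
        }
    }' "$@"
}
```

The result is tab-separated with one column per input file. Each cell holds the name and the value separated by a space, like the output of print $1,$2 in the original script, and a file with fewer matches simply leaves its lower cells empty.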
file2? containing same lines?
Pseudopropionibacterium propionicum 1591.0 does not appear in file2
awk '{print $1}' | wc -L to get the longest element of each column, and use this longest size to display your elements with printf %s. Let's say your first column is 20 chars long at most: use printf %-20s.
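The padding idea from the last comment could also be done inside awk, so the longest entry of each column does not have to be measured by hand. A rough sketch, assuming the columns arrive tab-separated and the text is plain ASCII; pad_columns is a hypothetical name, and the two extra spaces between columns are an arbitrary choice.

```shell
#!/bin/bash
# Sketch only: pad_columns is a hypothetical helper name.
# Reads tab-separated text and pads every column to its longest entry.
pad_columns() {
    awk '
    BEGIN { FS = "\t" }

    # Store every cell and track the widest entry per column.
    {
        if (NF > ncol) ncol = NF
        for (c = 1; c <= NF; c++) {
            cell[NR, c] = $c
            if (length($c) > width[c]) width[c] = length($c)
        }
    }

    END {
        for (r = 1; r <= NR; r++) {
            line = ""
            for (c = 1; c <= ncol; c++) {
                # Build a format such as "%-22s": left-aligned, width + 2 spaces.
                fmt = "%-" (width[c] + 2) "s"
                line = line sprintf(fmt, cell[r, c])
            }
            sub(/ +$/, "", line)   # drop padding after the last column
            print line
        }
    }' "$@"
}
```

It can read a file given as an argument or act as the last stage of a pipeline. Unlike a fixed printf %-20s, the width adapts to the data, at the cost of holding the whole input in memory.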