1

How can I change a file which is like this:

A   25
B   26
A   14
D   39
E   42

A   74
B   36
A   81
D   96
E   17

A   23
B   14
A   74
D   87
E   17

into a file where the row labels become column headers, printed only once, with their corresponding values in the columns below? Like this:

 A   B   A   D   E
25  26  14  39  42
74  36  81  96  17
23  14  74  87  17

My columns are repeated every 29 rows and some columns, like A, have the same name.

3
  • Have you made any attempt? Commented Apr 9, 2016 at 18:07
  • The last two days is the only thing that I'm doing :) Commented Apr 9, 2016 at 18:37
  • Don't tell us you made an attempt; show us the attempt. Commented Apr 9, 2016 at 19:25

5 Answers

5

You can use the following awk script to transform the file:

transform.awk:

{
    # On the first record this loop runs twice: once
    # for the headers and once for the first line of data.
    # On all subsequent records it prints only the data,
    # because h==1.
    for(;h<=1;h++){
        for(i=1+h;i<=NF;i+=2){
            printf "%s ", $i
        }
        printf "\n"
    }
    h=1
}

Then execute it like this:

awk -f transform.awk RS='' file

Output:

A B A D E 
25 26 14 39 42 
74 36 81 96 17 
23 14 74 87 17

To get proper aligned columns you can pipe to column -t:

awk -f transform.awk RS='' file | column -t

Output:

A   B   A   D   E
25  26  14  39  42
74  36  81  96  17
23  14  74  87  17

The key here is the variable RS (record separator). Setting RS to an empty string puts awk in paragraph mode, where records are separated by blank lines. It is roughly the same as setting it to \n\n+ (one or more blank lines). The first record, for example, will look like this:

A   25
B   26
A   14 
D   39
E   42

By default, awk splits fields on runs of whitespace ([[:space:]]+), which includes newlines. This gives us the following fields for record one:

A 25 B 26 A 14 D 39 E 42

The algorithm shown above transforms these fields into the desired output.
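
For readers who do not use awk, the same idea can be sketched in Python. This is only an illustration of the record and field splitting described in this answer, not part of the original solution; the function name transpose_blocks and the sample string are made up for the example, and it assumes every block holds name/value pairs separated by whitespace.

```python
import re


def transpose_blocks(text):
    """Return a header row followed by one row of values per block."""
    rows = []
    # Split into records on runs of blank lines, similar to awk's RS=''.
    for record in re.split(r"\n{2,}", text.strip()):
        fields = record.split()        # whitespace split, newlines included
        if not fields:
            continue                   # nothing to do for empty input
        if not rows:
            rows.append(fields[0::2])  # fields 1, 3, 5, ...: the names, once
        rows.append(fields[1::2])      # fields 2, 4, 6, ...: the values
    return rows


sample = """A   25
B   26
A   14
D   39
E   42

A   74
B   36
A   81
D   96
E   17
"""

for row in transpose_blocks(sample):
    print(" ".join(row))
```

The slices fields[0::2] and fields[1::2] play the role of the inner awk loop that steps through the fields two at a time.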

6 Comments

Thank you hek2mgl for your answer but I'm getting: column: Invalid or incomplete multibyte or wide character.
@EdMorton You are right, input like \n\n\n+ works as well with RS=''!
I'm still getting the same error. Not sure if I use unicode locale. I suppose your answer is correct but I cannot verify it now on this laptop. I will try it tomorrow in another machine and I'll accept your answer. Thank you very much for your time and effort @hek2mgl
@jimakos17 You are welcome. What does echo $LANG give you?
en_US.UTF-8 @hek2mgl I connected to another machine and just reproduced it. Thank you very much again!!!
2

An alternative to the awk solution, using other widely used Unix tools:

$ sed '/^$/d' file    | 
  pr -3ts' '          | 
  tr '\t' ' '         | 
  tr -s ' '           | 
  cut -d' ' -f1,2,4,6 | 
  tr ' ' '\n'         | 
  pr -5ts' '          |
  column -t



A   B   A   D   E
25  26  14  39  42
74  36  81  96  17
23  14  74  87  17

The first magic number, 3, is the number of repeated sections (the number of rows, not counting the header). The second magic number, 5, is the number of items in each section (the number of columns).
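
If you would rather not count the two numbers by hand, they can be derived from the file. The Python sketch below is only an illustration: the helper name section_shape is hypothetical, and it assumes that sections are separated by blank lines and that every section has as many lines as the first one.

```python
def section_shape(lines):
    """Return (number of sections, number of lines in the first section)."""
    sizes = []     # line count of each completed section
    current = 0    # line count of the section being read
    for line in lines:
        if line.strip():
            current += 1
        elif current:
            # A blank line closes the current section.
            sizes.append(current)
            current = 0
    if current:
        # The last section is usually not followed by a blank line.
        sizes.append(current)
    return len(sizes), (sizes[0] if sizes else 0)


example = ["A 25", "B 26", "", "A 74", "B 36", "", "A 23", "B 14"]
sections, items = section_shape(example)
print(sections, items)
```

The first value would replace the 3 in the pipeline and the second would replace the 5.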

1

For fun, some opaque, perl-ish ruby:

ruby -00 -lane '
    headers, values = $F.each_with_index.partition {|(v,i)| i.even?}
    puts headers.collect(&:first).join(" ") if $. == 1
    puts values.collect(&:first).join(" ")
' file

1 Comment

Cool! nice solution.
0

And just to round out the mix, you can do it in a fairly flexible manner with a simple bash script (limited to reading 2-column files formatted like the input file you show). It reads the data from the filename given as the first argument (or from stdin by default).

The script simply reads column-1 and column-2 into separate indexed arrays (a1 & a2) until a blank line is encountered, and, if it is the first time through, prints the heading row (and sets the heading flag h to not print again), followed by printing the data in a2.

When the end of the file is reached, it simply prints the final row of data.

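#!/bin/bash

fname="${1:-/dev/stdin}"

declare -i h=0      # heading flag: 0 until the heading row has been printed
declare -a a1       # column-1 entries (names) of the current block
declare -a a2       # column-2 entries (values) of the current block

while read -r line; do
    if [ -n "$line" ]; then
        a1+=( "${line%% *}" )   # text before the first space
        a2+=( "${line##* }" )   # text after the last space
    else
        # blank line: the block is complete, so print it
        [ "$h" -eq 0 ] && { printf " %2s" "${a1[@]}"; echo ""; h=1; }
        printf " %2s" "${a2[@]}"
        echo ""
        a1=(); a2=()
    fi
done < "$fname"

# print the final block, which is not followed by a blank line
printf " %2s" "${a2[@]}"
echo ""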

Use/Output

$ bash r2c.sh dat/r2c.txt
  A  B  A  D  E
 25 26 14 39 42
 74 36 81 96 17
 23 14 74 87 17

0

Or a little bit more regexp-oriented:

perl -0pE  'say s/\s*\d+\h*\n|\n.*/ /sgr;  s/(^|\n)\w\s*/ /g' file
