I have a text file:

ifile.txt
x       y       z       t              value
1       1       5       01hr01Jan2018   3
1       1       5       02hr01Jan2018   3.1
1       1       5       03hr01Jan2018   3.2
1       3.4     3       01hr01Jan2018   4.1
1       3.4     3       02hr01Jan2018   6.1
1       3.4     3       03hr01Jan2018   1.1
1       4.2     6       01hr01Jan2018   6.33
1       4.2     6       02hr01Jan2018   8.33
1       4.2     6       03hr01Jan2018   5.33
3.4     1       2       01hr01Jan2018   3.5
3.4     1       2       02hr01Jan2018   5.65
3.4     1       2       03hr01Jan2018   3.66
3.4     3.4     4       01hr01Jan2018   6.32
3.4     3.4     4       02hr01Jan2018   9.32
3.4     3.4     4       03hr01Jan2018   12.32
3.4     4.2     8.1     01hr01Jan2018   7.43
3.4     4.2     8.1     02hr01Jan2018   7.93
3.4     4.2     8.1     03hr01Jan2018   5.43
4.2     1       3.4     01hr01Jan2018   6.12
4.2     1       3.4     02hr01Jan2018   7.15
4.2     1       3.4     03hr01Jan2018   9.12
4.2     3.4     5.5     01hr01Jan2018   2.2
4.2     3.4     5.5     02hr01Jan2018   3.42
4.2     3.4     5.5     03hr01Jan2018   3.21
4.2     4.2     6.2     01hr01Jan2018   1.3
4.2     4.2     6.2     02hr01Jan2018   3.4
4.2     4.2     6.2     03hr01Jan2018   1

Explanation: each coordinate (x,y) has one z value and three time values. The columns are separated by runs of spaces, not tabs.

I would like to turn the values of the t column into column headers and then convert the result to a CSV file. My expected output is:

ofile.txt
x,y,z,01hr01Jan2018,02hr01Jan2018,03hr01Jan2018
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1

I am trying it in the following way, but I am still not getting the desired output: my script prints some extra commas (,) at the end.

My algorithm and script are:

    #Step1:- Split into two files: one with x,y,z (0001.txt) and
    #        another with t,value (0002.txt).

    awk '{n=3; for (i=1;i<=n;i++) printf "%s ", $i; print "";}' ifile.txt > 0001.txt
    awk '{n=5; for (i=4;i<=n;i++) printf "%s ", $i; print "";}' ifile.txt > 0002.txt

    #Step2:- In 0001.txt: Delete the repeated rows.

    awk '!seen[$1,$2,$3]++' 0001.txt > 00011.txt

    #Step3:- In 0002.txt: Delete the first row. For every 3 rows of the
    #        t-column, write the value-column as one row. Add the t-row
    #        at the top. This is very manual; I am wondering whether
    #        some command can do it.

    grep -E "^[0-9].*" 0002.txt > 0003.txt
    awk -v n=3 '{ row = row $2 " "; if (NR % n == 0) { print row; row = "" } }' 0003.txt > 0004.txt
    (echo "01hr01Jan2018,02hr01Jan2018,03hr01Jan2018";cat 0004.txt) > 00022.txt

    #Step4:- Paste output of two and convert to csv.
    paste 00011.txt 00022.txt > 0005.txt
    cat 0005.txt | tr -s '[:blank:]' ',' > ofile.txt
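
A likely cause of the extra commas, sketched on a single row taken from the input: each `printf "%s "` in the script leaves a trailing space after the last field, and the final `tr -s '[:blank:]' ','` turns that space into a comma. The variable names `sample`, `with_trailing` and `without_trailing` are illustrative only and not part of the original script.

```shell
sample='1       1       5       01hr01Jan2018   3'

# As in Step 1: every field is followed by a space, including the last one,
# so tr later converts that trailing space into a trailing comma.
with_trailing=$(printf '%s\n' "$sample" |
    awk '{n=3; for (i=1;i<=n;i++) printf "%s ", $i; print ""}' |
    tr -s '[:blank:]' ',')

# Variant: print the separator only between fields, then a newline.
without_trailing=$(printf '%s\n' "$sample" |
    awk '{n=3; for (i=1;i<=n;i++) printf "%s%s", $i, (i<n ? " " : "\n")}' |
    tr -s '[:blank:]' ',')

printf '%s\n%s\n' "$with_trailing" "$without_trailing"
```

The same reasoning would apply to `row = row $2 " "` in Step 3, which also appends a space after the last value of each row.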

3 Answers

1

You may use this awk:

awk -v OFS=, '{k=$1 OFS $2 OFS $3}
!($4 in hdr){hn[++h]=$4; hdr[$4]}
k in row{row[k]=row[k] OFS $5; next}
{rn[++n]=k; row[k]=$5}
END {
   printf "%s", rn[1]
   # hn[1] is the literal "t" taken from the header line, so start at 2
   for(i=2; i<=h; i++)
      printf "%s", OFS hn[i]
   print ""
   for (i=2; i<=n; i++)
      print rn[i], row[rn[i]]
}' file

x,y,z,01hr01Jan2018,02hr01Jan2018,03hr01Jan2018
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1

3 Comments

Thank you. I would also like to change the format of t from alphanumeric to numeric only. The desired format is YYYYMMDDHHMin, that is, 01hr01Jan2018,02hr01Jan2018,03hr01Jan2018 should become 201801010100,201801010200,201801010300. Could you kindly suggest how to do it?
Changing the date format won't be straightforward, as 01hr01Jan2018 is not a standard format. If you cannot stay with the present format, let me know; I will have to write a few more lines of code to convert it.
Thank you very much. I do need it. I have asked it as a separate question here: stackoverflow.com/questions/52504138/…
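
For the conversion asked about in the comments above, the labels could be rewritten with a small awk helper. This is only a minimal sketch, assuming every label has exactly the shape HHhrDDMonYYYY with an English three-letter month abbreviation, and that the minutes are always 00 (the sample data carries no minutes). The names `convert_label` and `to_stamp` are hypothetical and do not come from any answer here.

```shell
# convert_label: hypothetical helper; reads labels such as 01hr01Jan2018
# on stdin and prints them as YYYYMMDDHHMin (minutes assumed to be 00).
convert_label() {
    awk '
    function to_stamp(t,    hh, dd, mon, yyyy, m) {
        hh   = substr(t, 1, 2)      # hour, e.g. "01"
        dd   = substr(t, 5, 2)      # day of month, e.g. "01"
        mon  = substr(t, 7, 3)      # month abbreviation, e.g. "Jan"
        yyyy = substr(t, 10, 4)     # year, e.g. "2018"
        # Each abbreviation is 3 characters wide, so its position in the
        # string below maps directly to the month number (Jan=1 ... Dec=12).
        m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", mon) + 2) / 3
        return sprintf("%s%02d%s%s00", yyyy, m, dd, hh)
    }
    { print to_stamp($1) }'
}

printf '%s\n' 01hr01Jan2018 02hr01Jan2018 03hr01Jan2018 | convert_label
```

The same `to_stamp` function could be pasted into the awk program above and applied to `$4` before it is stored as a header label.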
1

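A single GNU awk program can generate your desired output. It relies on gawk extensions (arrays of arrays and PROCINFO["sorted_in"]), so it will not run under other awks: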

gawk '
    BEGIN {SUBSEP = OFS = ","}
    NR==1 {next}
    { groups[$4]; value[$1,$2,$3][$4] = $5 }
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"
        printf "x,y,z"
        for (g in groups) printf ",%s", g
        printf "\n"
        for (a in value) {
            printf "%s", a
            for (g in groups) printf "%s%s", OFS, 0+value[a][g]
            printf "\n"
        }
    }
' ifile.txt

1

Another similar awk approach; note that it does not produce the right header:

$ awk -v OFS=, '{k=$1 OFS $2 OFS $3} 
           p!=k {if(p) print line; p=k; line=k} 
                {line=line OFS $NF} 
           END  {print line}' file 

x,y,z,value
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1
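
The header could be repaired while keeping the same streaming idea. The following is a sketch, assuming (as the one-liner above also does) that all rows of one (x,y,z) group are adjacent in the input, and additionally that the first group contains every t label in the desired order. The function name `pivot_stream` and the variables `hdr` and `done` are made up for this illustration.

```shell
# pivot_stream: hypothetical wrapper; reads the whitespace-separated table
# from the named files (or stdin) and prints the pivoted CSV.
pivot_stream() {
    awk -v OFS=, '
    NR == 1 { hdr = $1 OFS $2 OFS $3; next }    # keep x,y,z; drop t and value
    { k = $1 OFS $2 OFS $3 }
    k != p {                                    # a new (x,y,z) group starts
        if (p != "") {
            if (!done) { print hdr; done = 1 }  # header is complete after group 1
            print line
        }
        p = k; line = k
    }
    !done { hdr = hdr OFS $4 }                  # collect t labels from group 1
    { line = line OFS $NF }
    END {
        if (p != "") {
            if (!done) print hdr                # input had a single group
            print line
        }
    }' "$@"
}

# Small self-contained run on the first two groups of the sample data.
pivot_stream <<'EOF'
x       y       z       t              value
1       1       5       01hr01Jan2018   3
1       1       5       02hr01Jan2018   3.1
1       1       5       03hr01Jan2018   3.2
1       3.4     3       01hr01Jan2018   4.1
1       3.4     3       02hr01Jan2018   6.1
1       3.4     3       03hr01Jan2018   1.1
EOF
```

Because it prints each group as soon as the next one begins, this variant keeps only one output line in memory, at the cost of depending on the input order.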
