Format and then convert txt to csv using shell script and awk

Question

I have a text file:

ifile.txt
x       y       z       t              value
1       1       5       01hr01Jan2018   3
1       1       5       02hr01Jan2018   3.1
1       1       5       03hr01Jan2018   3.2
1       3.4     3       01hr01Jan2018   4.1
1       3.4     3       02hr01Jan2018   6.1
1       3.4     3       03hr01Jan2018   1.1
1       4.2     6       01hr01Jan2018   6.33
1       4.2     6       02hr01Jan2018   8.33
1       4.2     6       03hr01Jan2018   5.33
3.4     1       2       01hr01Jan2018   3.5
3.4     1       2       02hr01Jan2018   5.65
3.4     1       2       03hr01Jan2018   3.66
3.4     3.4     4       01hr01Jan2018   6.32
3.4     3.4     4       02hr01Jan2018   9.32
3.4     3.4     4       03hr01Jan2018   12.32
3.4     4.2     8.1     01hr01Jan2018   7.43
3.4     4.2     8.1     02hr01Jan2018   7.93
3.4     4.2     8.1     03hr01Jan2018   5.43
4.2     1       3.4     01hr01Jan2018   6.12
4.2     1       3.4     02hr01Jan2018   7.15
4.2     1       3.4     03hr01Jan2018   9.12
4.2     3.4     5.5     01hr01Jan2018   2.2
4.2     3.4     5.5     02hr01Jan2018   3.42
4.2     3.4     5.5     03hr01Jan2018   3.21
4.2     4.2     6.2     01hr01Jan2018   1.3
4.2     4.2     6.2     02hr01Jan2018   3.4
4.2     4.2     6.2     03hr01Jan2018   1

Explanation: Each coordinate (x,y) has a z-value and three time values. The columns are separated by runs of spaces, not tabs.

I would like to turn the values of the t column into column headers (one row per x,y,z) and then convert the result to a CSV file. My expected output is:

ofile.txt
x,y,z,01hr01Jan2018,02hr01Jan2018,03hr01Jan2018
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1

I am trying it in the following way, but I am still not getting the desired output: my script prints some extra commas (,) at the end.

My algorithm and script are:

    #Step1:- Split into two files: one with x,y,z (0001.txt) and
    #        another with t,value (0002.txt).

    awk '{n=3; for (i=1;i<=n;i++) printf "%s ", $i; print "";}' ifile.txt > 0001.txt
    awk '{n=5; for (i=4;i<=n;i++) printf "%s ", $i; print "";}' ifile.txt > 0002.txt

    #Step2:- In 0001.txt: Delete the repeated rows.

    awk '!seen[$1,$2,$3]++' 0001.txt > 00011.txt

    #Step3:- In 0002.txt: Delete the first row. For each 3 rows in t-column,
    #        write the value-column as rows. Add the t-row at top
    #        this is very manual. I am wondering for some command

    grep -E "^[0-9].*" 0002.txt > 0003.txt
    awk -v n=3 '{ row = row $2 " "; if (NR % n == 0) { print row; row = "" } }' 0003.txt > 0004.txt
    (echo "01hr01Jan2018,02hr01Jan2018,03hr01Jan2018";cat 0004.txt) > 00022.txt  

    #Step4:- Paste output of two and convert to csv.
    paste 00011.txt 00022.txt > 0005.txt
    cat 0005.txt | tr -s '[:blank:]' ',' > ofile.txt
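The extra commas most likely come from the trailing space that each `printf "%s "` and `row = row $2 " "` leaves at the end of a line: `tr` turns that final blank into a comma as well. The snippet below is a minimal sketch that reproduces the symptom on one pasted row and shows one possible workaround. The helper name `strip_then_csv` is hypothetical and not part of the script above.

```shell
# Reproduce the symptom: a row that ends in a blank gains a trailing comma.
printf '1 1 5 \t3 3.1 3.2 \n' | tr -s '[:blank:]' ','
# prints: 1,1,5,3,3.1,3.2,

# One possible fix (an assumption, not the original script): strip trailing
# blanks first, then translate the remaining runs of blanks to commas.
strip_then_csv() {
    sed 's/[[:blank:]]*$//' | tr -s '[:blank:]' ','
}

printf '1 1 5 \t3 3.1 3.2 \n' | strip_then_csv
# prints: 1,1,5,3,3.1,3.2
```

In the script above this would replace the final `tr` call, for example `strip_then_csv < 0005.txt > ofile.txt`.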

anubhava · Accepted Answer · 2018-09-25 17:12:42Z

1

You may use this awk:

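awk -v OFS=, '{k=$1 OFS $2 OFS $3}          # row key: x,y,z
NR>1 && !($4 in hdr){hn[++h]=$4; hdr[$4]}  # collect distinct t values, skipping the header line
k in row{row[k]=row[k] OFS $5; next}       # key already seen: append the value
{rn[++n]=k; row[k]=$5}                     # new key: remember its order and first value
END {
   printf "%s", rn[1]                      # rn[1] is the original "x,y,z" header
   for(i=1; i<=h; i++)
      printf "%s", OFS hn[i]
   print ""
   for (i=2; i<=n; i++)
      print rn[i], row[rn[i]]
}' ifile.txt

x,y,z,01hr01Jan2018,02hr01Jan2018,03hr01Jan2018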
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1

edited Sep 25, 2018 at 17:12

answered Sep 25, 2018 at 16:57

anubhava


3 Comments

Kay Over a year ago

Thank you. I would also like to change the format of t from alphanumeric to numeric only. The desired format is YYYYMMDDHHMin, that is, 01hr01Jan2018,02hr01Jan2018,03hr01Jan2018 should become 201801010100,201801010200,201801010300. Could you kindly suggest how to do it?

anubhava Over a year ago

Changing the date format won't be straightforward, as 01hr01Jan2018 is not a standard format. If you cannot stay with the present format, let me know; I will have to write a few more lines of code to convert it.

Kay Over a year ago

Thank you very much. Essentially, I need it. I have asked it in a separate question here stackoverflow.com/questions/52504138/…
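A minimal sketch of the conversion discussed in these comments, assuming every timestamp has exactly the shape HHhrDDMonYYYY with an English three-letter month abbreviation, and assuming the minutes are always 00 (the sample data carries no minutes). The function name `to_numeric_ts` is hypothetical.

```shell
# Convert HHhrDDMonYYYY (e.g. 01hr01Jan2018) to YYYYMMDDHHMM, one value per line.
# Assumption: the input has no minutes, so "00" is appended.
to_numeric_ts() {
    awk '{
        hh  = substr($1, 1, 2)      # hour
        dd  = substr($1, 5, 2)      # day of month
        mon = substr($1, 7, 3)      # month abbreviation
        yy  = substr($1, 10, 4)     # year
        # The position of the abbreviation in the lookup string gives the month number.
        mm  = (index("JanFebMarAprMayJunJulAugSepOctNovDec", mon) + 2) / 3
        printf "%s%02d%s%s00\n", yy, mm, dd, hh
    }'
}

printf '%s\n' 01hr01Jan2018 02hr01Jan2018 03hr01Jan2018 | to_numeric_ts
# prints:
# 201801010100
# 201801010200
# 201801010300
```

The same `substr`/`index` arithmetic could be applied to `$4` inside any of the pivoting programs before the value is stored as a column header.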

glenn jackman · Accepted Answer · 2018-09-25 16:57:44Z

1

A single awk program can generate your desired output. This one needs GNU awk, because arrays of arrays and PROCINFO["sorted_in"] are gawk extensions:

gawk '
    BEGIN {SUBSEP = OFS = ","}
    NR==1 {next}
    { groups[$4]; value[$1,$2,$3][$4] = $5 }
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"
        printf "x,y,z"
        for (g in groups) printf ",%s", g
        printf "\n"
        for (a in value) {
            printf "%s", a
            for (g in groups) printf "%s%s", OFS, 0+value[a][g]
            printf "\n"
        }
    }
' ifile.txt
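Where gawk is not available, the same pivot can be sketched with POSIX awk features only, using a composite `(row, time)` key in place of an array of arrays. This is an illustrative variant, not the code above: it keeps rows and columns in first-seen order instead of sorting them, and it leaves a cell empty when a combination is missing. The function name `pivot_to_csv` is hypothetical.

```shell
# Pivot the t column into headers using only POSIX awk features.
# Assumptions: one header line, whitespace-separated fields x y z t value.
# Usage: pivot_to_csv ifile.txt > ofile.txt
pivot_to_csv() {
    awk -v OFS=, '
        NR == 1 { next }                              # skip the original header
        {
            k = $1 OFS $2 OFS $3                      # row key: x,y,z
            if (!(k in seenrow))  { seenrow[k];  rows[++nr] = k }
            if (!($4 in seencol)) { seencol[$4]; cols[++nc] = $4 }
            val[k, $4] = $5                           # cell indexed by (row, time)
        }
        END {
            printf "x,y,z"
            for (j = 1; j <= nc; j++) printf "%s%s", OFS, cols[j]
            print ""
            for (i = 1; i <= nr; i++) {
                printf "%s", rows[i]
                for (j = 1; j <= nc; j++) {
                    cell = ""                         # empty when the pair is missing
                    if ((rows[i], cols[j]) in val) cell = val[rows[i], cols[j]]
                    printf "%s%s", OFS, cell
                }
                print ""
            }
        }' "$@"
}
```

Storing each cell under its own time key means a group with a missing hour keeps its remaining values in the right columns, which simple appending in input order would not guarantee.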

answered Sep 25, 2018 at 16:57

glenn jackman


karakfa · Accepted Answer · 2018-09-25 17:05:33Z

1

Another similar awk, without the right header:

$ awk -v OFS=, '{k=$1 OFS $2 OFS $3} 
           p!=k {if(p) print line; p=k; line=k} 
                {line=line OFS $NF} 
           END  {print line}' file 

x,y,z,value
1,1,5,3,3.1,3.2
1,3.4,3,4.1,6.1,1.1
1,4.2,6,6.33,8.33,5.33
3.4,1,2,3.5,5.65,3.66
3.4,3.4,4,6.32,9.32,12.32
3.4,4.2,8.1,7.43,7.93,5.43
4.2,1,3.4,6.12,7.15,9.12
4.2,3.4,5.5,2.2,3.42,3.21
4.2,4.2,6.2,1.3,3.4,1

answered Sep 25, 2018 at 17:05

karakfa
