using awk to modify multiple columns

Question

I have a CSV file with two columns: a date string in ISO 8601 format and a Unix timestamp in milliseconds. How do I use awk to produce output in the following format: col-1: the original ISO date; col-2: the timestamp from column 2 converted to ISO 8601; col-3: the difference between the two times (say, in ms)?

Example:

Input:

  2018-01-09T16:55:22.545+0000,1515508979185

Output:

  2018-01-09T16:55:22.545+0000,2018-01-09T14:42:59.185+0000,36743360

Not clear; please post clearer requirements with more suitable examples. – RavinderSingh13, Commented Mar 16, 2018 at 17:57
I'm not sure what is unclear about calculating a difference between two dates and normalizing them to the same ISO 8601 format. Could you be more specific? – dMb, Commented Mar 16, 2018 at 18:12

Dima Chubarov · Accepted Answer · 2018-03-17 15:34:22Z

1

Gawk has all the functions needed to convert dates and times between formats. Note that mktime(), strftime() and patsplit() are gawk extensions and are not available in POSIX awk.

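Consider the following command (TZ=UTC is set because mktime() and strftime() otherwise work in the local time zone of the machine):

TZ=UTC gawk -F, '{
      # a[1..8] = year, month, day, hour, minute, second, milliseconds, zone offset
      patsplit($1, a, /[0-9]+/)
      # column 1 as epoch milliseconds
      time1 = mktime(sprintf("%d %d %d %d %d %d",
                   a[1], a[2], a[3], a[4], a[5], a[6])) * 1000 + a[7]
      # column 2 passed through strftime()/mktime(), also in milliseconds
      time2 = mktime(strftime("%Y %m %d %H %M %S", int($2 / 1000), a[8])) * 1000 + $2 % 1000
      isodate2 = strftime("%Y-%m-%dT%H:%M:%S", int($2 / 1000), a[8])
      printf "%s;%s.%03d;%s\n",
             $1,
             isodate2, $2 % 1000,
             time1 - time2 }' csvfile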

It would produce

2018-01-09T16:55:22.545+0000;2018-01-09T14:42:59.185;7943360

Explanation

We use , as the field separator because the input is a CSV file. First we parse the 1st column, which is an ISO 8601 date. We use patsplit() to extract all the numbers from the ISO 8601 string into an array a so that

  a[1] = YYYY, a[2] = mm, a[3] = dd, 
  a[4] = HH, a[5] = MM, a[6] = SS, a[7] = milliseconds, a[8] = zone offset digits
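
For awks without patsplit(), a similar breakdown can be sketched with the POSIX split() function, using runs of non-digits as the separator. The helper name iso_fields is hypothetical, and the sketch assumes the exact YYYY-MM-DDTHH:MM:SS.mmm+HHMM layout from the question.

```shell
# Hypothetical helper: print the numeric fields of an ISO 8601 string, one per line.
iso_fields() {
    printf '%s\n' "$1" | awk '{
        # Splitting on runs of non-digits leaves only the numbers in a[1..n]
        n = split($0, a, "[^0-9]+")
        for (i = 1; i <= n; i++) print "a[" i "] = " a[i]
    }'
}

iso_fields "2018-01-09T16:55:22.545+0000"
```

For the sample input this lists eight elements, from a[1] = 2018 to a[8] = 0000. As with the digits-only patsplit() pattern, the sign of the offset is not captured.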

We use the array a to convert the 1st column date into a timestamp in milliseconds and store the result in the time1 variable.

Handling time zones here requires computing the equivalent of the 2nd time in the time zone of the 1st timestamp.
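
For inputs whose offset is not +0000, the sign has to be read separately, because a digits-only pattern keeps only the offset digits. A minimal sketch in POSIX awk, assuming the offset is always the trailing five characters in the form +HHMM or -HHMM; offset_ms is a hypothetical helper name.

```shell
# Hypothetical helper: print the trailing +HHMM / -HHMM offset in milliseconds.
offset_ms() {
    printf '%s\n' "$1" | awk '{
        tz   = substr($0, length($0) - 4)           # last five characters, e.g. "+0530"
        sign = (substr(tz, 1, 1) == "-") ? -1 : 1
        hh   = substr(tz, 2, 2) + 0
        mm   = substr(tz, 4, 2) + 0
        print sign * (hh * 60 + mm) * 60 * 1000
    }'
}

offset_ms "2018-01-09T16:55:22.545+0000"   # 0
offset_ms "2018-01-09T16:55:22.545-0800"   # -28800000
offset_ms "2018-01-09T16:55:22.545+0530"   # 19800000
```

Subtracting this value from a millisecond timestamp built from the wall-clock fields gives the corresponding UTC value.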

Then we print the output line, starting with the 1st column, using strftime() to convert the timestamp from the 2nd column into an ISO 8601 date and printing the milliseconds separately.

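The difference between time1 and time2 (7943360 ms) is not the same as in the original post (36743360 ms). The two figures differ by exactly 28800000 ms, i.e. 8 hours, which suggests that the expected value in the question included a local time zone offset.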
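
The arithmetic can be checked by hand. Both sample times fall on the same day, so it is enough to compare milliseconds since midnight. The sketch below uses only values from the question; check_diff is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: recompute the difference for the sample row.
check_diff() {
    awk 'BEGIN {
        # Milliseconds since midnight for each time of day (same date, both +0000)
        t1 = ((16 * 60 + 55) * 60 + 22) * 1000 + 545   # 16:55:22.545
        t2 = ((14 * 60 + 42) * 60 + 59) * 1000 + 185   # 14:42:59.185
        diff = t1 - t2
        print "difference (ms):", diff
        # Gap between the figure in the question and the computed one, in hours
        print "gap (hours):", (36743360 - diff) / 3600000
    }'
}

check_diff
# difference (ms): 7943360
# gap (hours): 8
```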

edited Mar 17, 2018 at 15:34

answered Mar 16, 2018 at 18:14

Dima Chubarov

Ed Morton Over a year ago

You should mention that's gawk-only for time functions.
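
Where gawk is not available, the conversion of column 1 can be sketched in POSIX awk with plain integer arithmetic. The days-from-civil calendar formula below is swapped in for mktime() and is not part of the answer above. The names iso_to_ms and days_from_civil are hypothetical, and the input is assumed to be exactly YYYY-MM-DDTHH:MM:SS.mmm followed by +HHMM or -HHMM.

```shell
# Hypothetical helper: ISO 8601 string -> epoch milliseconds (UTC), POSIX awk only.
iso_to_ms() {
    printf '%s\n' "$1" | awk '
    # Days since 1970-01-01 for a Gregorian date (year >= 1 assumed)
    function days_from_civil(y, m, d,    era, yoe, doy) {
        if (m <= 2) y--                              # count the year from March
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        return era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy - 719468
    }
    {
        split($0, a, "[^0-9]+")                      # a[1..8]: Y m d H M S ms offset
        sign = (substr($0, length($0) - 4, 1) == "-") ? -1 : 1
        off  = sign * (substr(a[8], 1, 2) * 60 + substr(a[8], 3, 2)) * 60
        secs = days_from_civil(a[1] + 0, a[2] + 0, a[3] + 0) * 86400
        secs += a[4] * 3600 + a[5] * 60 + a[6] - off
        printf "%.0f\n", secs * 1000 + a[7]          # %.0f avoids exponent notation
    }'
}

iso_to_ms "2018-01-09T16:55:22.545+0000"   # 1515516922545
```

Subtracting the second column from this value gives the same 7943360 ms for the sample row.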

RomanPerekhrest · Accepted Answer · 2018-03-16 18:37:29Z

1

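awk solution (relies on GNU date for the conversions):

awk 'BEGIN{ FS=OFS="," }
     { 
         # column 1 as epoch milliseconds; %s%3N keeps the fractional part
         cmd1 = "date -d \"" $1 "\" +%s%3N"
         # column 2 (epoch ms) as ISO 8601 in UTC, milliseconds preserved
         cmd2 = sprintf("date -u -d @%d.%03d +%%FT%%T.%%3N%%z", int($2/1000), $2 % 1000)
         cmd1 | getline d1; close(cmd1)
         cmd2 | getline d2; close(cmd2)
         print $1, d2, d1 - $2 
     }' file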

answered Mar 16, 2018 at 18:37

RomanPerekhrest
