I have a file in the format below. Can anyone convert it into columns? I tried the awk command below, but it creates more than 4 columns if one customer has multiple hostnames.

awk '/"customer_name":/{if (x)print x;x="";}{x=(!x)?$0:x","$0;}END{print x;}' filename

Input:

customer_name: "abc"
  "HostName": "tm-1"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "tm-2"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "tm-3"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "new-va-threat-01"
  "LastDayRxBytes": 0
  "Status": "offline"
customer_name: "xyz"
  "HostName": "tm-56"
  "LastDayRxBytes": 10708747
  "Status": "ok"
customer_name: "def"
customer_name: "uvw"
  "HostName": "tm-23"
  "LastDayRxBytes": 34921829912
  "Status": "ok"
customer_name: "new cust"
  "HostName": "tm-1-3"
  "LastDayRxBytes": 33993187093
  "Status": "ok"
customer_name: "a12 d32 ffg"
customer_name: "bcd abc"
customer_name: "mno opq"
customer_name: "abc dhg pvt ltd."
  "HostName": "tm-10"
  "LastDayRxBytes": 145774401010
  "Status": "ok"
  "HostName": "tm-ngtm-13"
  "LastDayRxBytes": 150159680874
  "Status": "ok"
  "HostName": "new-ngtm-11"
  "LastDayRxBytes": 207392526747
  "Status": "ok"
  "HostName": "old-ngtm-06"
  "LastDayRxBytes": 17708734533
  "Status": "ok"
  "HostName": "tm-08"
  "LastDayRxBytes": 559289251
  "Status": "ok"
  "HostName": "tm-12"
  "LastDayRxBytes": 534145552271
  "Status": "ok"

I want it printed in rows and columns, like this:

Column 1               Column 2             Column 3             Column 4
CustName               Host                 Last RX              Status
abc                    tm-1                 0                    offline
abc                    tm-2                 0                    offline
abc                    tm-3                 0                    offline
abc                    new-va-threat-01     0                    offline
xyz                    tm-56                10708747             ok
def
uvw                    tm-23                34921829912          ok
new cust               tm-1-3               33993187093          ok
a12 d32 ffg
bcd abc
mno opq
abc dhg pvt ltd.       tm-10                145774401010         ok
abc dhg pvt ltd.       tm-ngtm-13           150159680874         ok
abc dhg pvt ltd.       new-ngtm-11          207392526747         ok
abc dhg pvt ltd.       old-ngtm-06          17708734533          ok
abc dhg pvt ltd.       tm-08                559289251            ok
abc dhg pvt ltd.       tm-12                534145552271         ok

  • Column4           Column3           Column3      Column4
    Customer Name     Host Name         Received     Status
    abc               tm-1              0            offline
    abc               tm-2              0            offline
    abc               tm-3              0            offline
    abc               new-va-threat-01  0            offline
    xyz               tm-56             10708747     ok
    def
    uvw               tm-23             34921829912  ok
    new cust          tm-1-3            33993187093  ok
    a12 d32 ffg
    bcd abc
    mno opq
    abc dhg pvt ltd.  tm-10             1.45774E+11  ok
    abc dhg pvt ltd.  tm-ngtm-13        1.5016E+11   ok
    abc dhg pvt ltd.  new-ngtm-11       2.07393E+11  ok
    abc dhg pvt ltd.  old-ngtm-06       17708734533  ok
    abc dhg pvt ltd.  tm-08             559289251    ok
    abc dhg pvt ltd.  tm-12             5.34146E+11  ok
    Commented Sep 26, 2017 at 12:39
  • Is there any non-obvious meaning in that comment? If yes, please edit your question to convey it. Commented Sep 26, 2017 at 12:40
  • Can any of your strings contain : or :<blank>? How about an escaped " (e.g. \" or "")? Commented Sep 26, 2017 at 14:25

3 Answers

I'd write this:

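awk -F": " -v OFS="\t" '
    BEGIN {print "CustName", "Host", "Last RX", "Status"}
    {
        # strip the double quotes and the leading indentation
        gsub(/"/,"")
        sub(/^[[:blank:]]+/,"")
    }
    $1 == "customer_name" {
        # a new customer starts: print the previous one if it had no hosts
        if ("customer_name" in data && !have_data)
            print data["customer_name"]
        have_data = 0
    }
    {
        data[$1] = $2
    }
    # once all three host fields are collected, emit one row
    ("HostName" in data) && ("LastDayRxBytes" in data) && ("Status" in data) {
        print data["customer_name"], data["HostName"], data["LastDayRxBytes"], data["Status"]
        delete data["HostName"]
        delete data["LastDayRxBytes"]
        delete data["Status"]
        have_data = 1
    }
    END {
        # print a trailing customer that has no hosts
        if ("customer_name" in data && !have_data)
            print data["customer_name"]
    }
' file | column -s $'\t' -t

Output:
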
CustName          Host              Last RX       Status
abc               tm-1              0             offline
abc               tm-2              0             offline
abc               tm-3              0             offline
abc               new-va-threat-01  0             offline
xyz               tm-56             10708747      ok
def
uvw               tm-23             34921829912   ok
new cust          tm-1-3            33993187093   ok
a12 d32 ffg
bcd abc
mno opq
abc dhg pvt ltd.  tm-10             145774401010  ok
abc dhg pvt ltd.  tm-ngtm-13        150159680874  ok
abc dhg pvt ltd.  new-ngtm-11       207392526747  ok
abc dhg pvt ltd.  old-ngtm-06       17708734533   ok
abc dhg pvt ltd.  tm-08             559289251     ok
abc dhg pvt ltd.  tm-12             534145552271  ok

2 Comments

Thank you Glenn, it works! I really appreciate your quick help. I have one question: when I save the output to a CSV file, everything lands in one column, separated by tabs. Is there any way to print CustName in column A, Host in column B, and so on?
If you take out the | column -s $'\t' -t part, you'll be left with tab-separated columns.
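
Building on that comment, a spreadsheet usually splits a .csv file on commas rather than tabs, so the tab-separated rows can be turned into comma-separated ones in one more awk step. The sketch below is only an illustration: tsv_to_csv is a hypothetical helper name, not part of the answer, and it assumes that no field contains a tab or a newline. Every field is wrapped in double quotes so that a customer name containing a comma would still stay in one cell.

```shell
# Hypothetical helper (not from the answer): convert tab-separated rows
# on stdin to CSV. Every field is quoted; embedded quotes are doubled.
tsv_to_csv() {
    awk -F'\t' '
    {
        line = ""
        for (i = 1; i <= NF; i++) {
            field = $i
            gsub(/"/, "\"\"", field)    # double any embedded quote
            line = line (i > 1 ? "," : "") "\"" field "\""
        }
        print line
    }'
}

# Example: two rows in the shape produced by the awk answer, plus a
# customer that has no hosts.
printf 'CustName\tHost\tLast RX\tStatus\nabc\ttm-1\t0\toffline\ndef\n' | tsv_to_csv
```

In practice the input would be the awk command from the answer without the column step, piped into tsv_to_csv and redirected to a file.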

Perl to the rescue:

perl -lne '
    if (/customer_name: "(.*)"/) {
        print $h{name} unless $h{printed} || !%h;
        undef $h{printed} if $1 ne $h{name};
        $h{name} = $1;
    } else {
        /"([^"]+)": "?([^"]+)"?/ and $h{$1} = $2;
        $h{printed} = print join "\t",
            @h{qw{ name HostName LastDayRxBytes Status }}
            if "Status" eq $1;
    }
    END { print $h{name} unless $h{printed} || !%h }
    ' < input_file
  • The %h hash is used to gather information about lines to be printed.
  • When a customer name is read, the previous customer name is printed if it hasn't been printed yet. The same happens at the very end of the input to print a possible last customer with no details.
  • A line is printed when Status is read.
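
For readers more at home in awk, the three steps listed above can be sketched in plain awk as well. This is an illustration of the same idea, not a translation of the Perl code: rows_by_status is a hypothetical name, and the sketch assumes that no value contains a double quote or the sequence colon-space, since it splits each line on ": ".

```shell
# Hypothetical helper: emit one tab-separated row whenever a Status line
# is read, and a bare customer name for customers without any host.
rows_by_status() {
    awk -F': ' -v OFS='\t' '
    { gsub(/"/, ""); sub(/^[ \t]+/, "") }        # strip quotes and indentation
    $1 == "customer_name" {
        if (name != "" && !printed) print name   # previous customer had no hosts
        name = $2; printed = 0; next
    }
    $1 == "HostName"       { host = $2 }
    $1 == "LastDayRxBytes" { rx = $2 }
    $1 == "Status"         { print name, host, rx, $2; printed = 1 }
    END { if (name != "" && !printed) print name }   # last customer, no details
    ' "$@"
}

# Example run on a shortened version of the input from the question.
rows_by_status <<'EOF'
customer_name: "abc"
  "HostName": "tm-1"
  "LastDayRxBytes": 0
  "Status": "offline"
customer_name: "def"
EOF
```

The printed flag plays the same role as $h{printed} in the Perl version: it records whether the current customer has already produced a row.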

GNU awk solution:

$ cat tst.awk
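BEGIN {
   # each record is everything that follows one "customer_name: " marker
   RS="customer_name: "
   pr("Column1", "Column2", "Column3", "Column4")
   pr("Custname", "Host", "Last RX", "Status")
}
# the first quoted string in a record is the customer name
match($0, /"([^"]+)"/, cust) {
   printed=0
   # keep only the text after the customer name
   str=substr($0, RSTART+RLENGTH)
   while (match(str, /"HostName":\s"([^"]+)"\s+"LastDayRxBytes":\s(\S+)\s+"Status":\s"([^"]+)"\s/, col)){
      # drop the host block just matched before searching again
      str=substr(str, RSTART+RLENGTH)
      pr( cust[1], col[1], col[2], col[3] )
      printed=1
   }
   # customers without any host still get a row of their own
   if (!printed) pr(cust[1])
}
function pr(cust,host,rx,status) {
   printf "%-16s\t%-16s\t%-16s\t%-10s\n", cust, host, rx, status
}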

Based on the example input, one can tackle this one using regexes and the match function as well. Testing it:

$ awk -f tst.awk input.txt
Column1             Column2             Column3             Column4
Custname            Host                Last RX             Status
abc                 tm-1                0                   offline
abc                 tm-2                0                   offline
abc                 tm-3                0                   offline
abc                 new-va-threat-01    0                   offline
xyz                 tm-56               10708747            ok
def
uvw                 tm-23               34921829912         ok
new cust            tm-1-3              33993187093         ok
a12 d32 ffg
bcd abc
mno opq
abc dhg pvt ltd.    tm-10               145774401010        ok
abc dhg pvt ltd.    tm-ngtm-13          150159680874        ok
abc dhg pvt ltd.    new-ngtm-11         207392526747        ok
abc dhg pvt ltd.    old-ngtm-06         17708734533         ok
abc dhg pvt ltd.    tm-08               559289251           ok
abc dhg pvt ltd.    tm-12               534145552271        ok

Explanation:

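  • The record separator RS is set to customer_name: followed by a space, so each record ($0) contains all host, rx and status information for one customer. gawk treats a multi-character RS as a regular expression.
  • The 1st match, with regex "([^"]+)", captures the customer name into cust[1]. The three-argument form of match is a gawk extension.
  • The 2nd match, with regex "HostName":\s"([^"]+)"\s+"LastDayRxBytes":\s(\S+)\s+"Status":\s"([^"]+)"\s, captures hostname, rx and status into col[1], col[2] and col[3].
  • Each time the 2nd match succeeds, the string is shortened so that the next match starts after the host block that was just printed.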
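
The "match, then shorten the string" loop from the last step can be tried in isolation. The sketch below uses only the two-argument match, which standard awk provides and which sets RSTART and RLENGTH, so it should also run on awk implementations other than gawk; capture arrays such as cust and col remain gawk-only. quoted_tokens is a hypothetical name chosen for this illustration.

```shell
# Hypothetical helper: print every double-quoted token found on each
# input line, one token per output line.
quoted_tokens() {
    awk '
    {
        str = $0
        while (match(str, /"[^"]+"/)) {
            print substr(str, RSTART + 1, RLENGTH - 2)   # token without its quotes
            str = substr(str, RSTART + RLENGTH)          # drop the consumed part
        }
    }'
}

# Example: one line in the shape used by the input file.
printf '"HostName": "tm-1"\n' | quoted_tokens
```

Cutting at RSTART + RLENGTH removes everything up to the end of the current match, wherever in the string that match began.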

I know, this is not the awk way of doing things, but then again the regular format of the input allows this - quite concise - regex-based solution.
