How to convert my file in rows and column using shell scripting

Question

I have a file in the format below. Can anyone convert it in columns? I have tried the awk command below but it creates more that 4 columns if one customer has multiple hostnames.

awk '/"customer_name":/{if (x)print x;x="";}{x=(!x)?$0:x","$0;}END{print x;}' filename

Input:

customer_name: "abc"
  "HostName": "tm-1"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "tm-2"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "tm-3"
  "LastDayRxBytes": 0
  "Status": "offline"
  "HostName": "new-va-threat-01"
  "LastDayRxBytes": 0
  "Status": "offline"
customer_name: "xyz"
  "HostName": "tm-56"
  "LastDayRxBytes": 10708747
  "Status": "ok"
customer_name: "def"
customer_name: "uvw"
  "HostName": "tm-23"
  "LastDayRxBytes": 34921829912
  "Status": "ok"
customer_name: "new cust"
  "HostName": "tm-1-3"
  "LastDayRxBytes": 33993187093
  "Status": "ok"
customer_name: "a12 d32 ffg"
customer_name: "bcd abc"
customer_name: "mno opq"
customer_name: "abc dhg pvt ltd."
  "HostName": "tm-10"
  "LastDayRxBytes": 145774401010
  "Status": "ok"
  "HostName": "tm-ngtm-13"
  "LastDayRxBytes": 150159680874
  "Status": "ok"
  "HostName": "new-ngtm-11"
  "LastDayRxBytes": 207392526747
  "Status": "ok"
  "HostName": "old-ngtm-06"
  "LastDayRxBytes": 17708734533
  "Status": "ok"
  "HostName": "tm-08"
  "LastDayRxBytes": 559289251
  "Status": "ok"
  "HostName": "tm-12"
  "LastDayRxBytes": 534145552271
  "Status": "ok"

I want it to be printed in column and rows as:

Column 1               Column 2             Column 3             Column 4
CustName               Host                 Last RX              Status
abc                    tm-1                 0                    offline
abc                    tm-2                 0                    offline
abc                    tm-3                 0                    offline
abc                    new-va-threat-01     0                    offline
xyz                    tm-56                10708747             ok
def                    
uvw                    tm-23                34921829912          ok
new_cust               tm-1-3               33993187093          ok
a12 d32 ffg
acd abc
mno opq
abc dhg pvt ltd.       tm-10                145774401010         ok
abc dhg pvt ltd.       tm-ngtm-13           150159680874         ok
abc dhg pvt ltd.       new-ngtm-11          207392526747         ok
abc dhg pvt ltd.       old-ngtm-06          17708734533          ok
abc dhg pvt ltd.       tm-08                559289251            ok
abc dhg pvt ltd.       tm-12                534145552271         ok

Column4 Column3 Column3 Column4 Customer Name Host Name Received Status abc tm-1 0 offline abc tm-2 0 offline abc tm-3 0 offline abc new-va-threat-01 0 offline xyz tm-56 10708747 ok def uvw tm-23 34921829912 ok new cust tm-1-3 33993187093 ok a12 d32 ffg bcd abc mno opq abc dhg pvt ltd. tm-10 1.45774E+11 ok abc dhg pvt ltd. tm-ngtm-13 1.5016E+11 ok abc dhg pvt ltd. new-ngtm-11 2.07393E+11 ok abc dhg pvt ltd. old-ngtm-06 17708734533 ok abc dhg pvt ltd. tm-08 559289251 ok abc dhg pvt ltd. tm-12 5.34146E+11 ok — Majeed
– Majeed, Commented Sep 26, 2017 at 12:39
Is there any non-obvious meaning in that comment? If yes, please edit your question to convey it. — Yunnosch
– Yunnosch, Commented Sep 26, 2017 at 12:40
Can any of your strings contain : or :<blank>? How about an escaped " (e.g. \" or "")? — Ed Morton
– Ed Morton, Commented Sep 26, 2017 at 14:25

glenn jackman · Accepted Answer · 2017-09-26 14:00:04Z

1

I'd write this

awk -F": " -v OFS="\t" '
    BEGIN {print "CustName", "Host", "Last RX", "Status"}
    {
        gsub(/"/,"")
        sub(/^[[:blank:]]+/,"")
    }
    $1 == "customer_name" {
        if ("customer_name" in data && !have_data)
            print data["customer_name"]
        have_data = 0
    }
    {
        data[$1] = $2
    }
    ("HostName" in data) && ("LastDayRxBytes" in data) && ("Status" in data) {
        print data["customer_name"], data["HostName"], data["LastDayRxBytes"], data["Status"]
        delete data["HostName"]
        delete data["LastDayRxBytes"]
        delete data["Status"]
        have_data = 1
    }
' file | column -s $'\t' -t

CustName          Host              Last RX       Status
abc               tm-1              0             offline
abc               tm-2              0             offline
abc               tm-3              0             offline
abc               new-va-threat-01  0             offline
xyz               tm-56             10708747      ok
def
uvw               tm-23             34921829912   ok
new cust          tm-1-3            33993187093   ok
a12 d32 ffg
bcd abc
mno opq
abc dhg pvt ltd.  tm-10             145774401010  ok
abc dhg pvt ltd.  tm-ngtm-13        150159680874  ok
abc dhg pvt ltd.  new-ngtm-11       207392526747  ok
abc dhg pvt ltd.  old-ngtm-06       17708734533   ok
abc dhg pvt ltd.  tm-08             559289251     ok
abc dhg pvt ltd.  tm-12             534145552271  ok

edited Sep 26, 2017 at 14:00

answered Sep 26, 2017 at 13:11

glenn jackman

249k42 gold badges233 silver badges362 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Majeed Over a year ago

Thank you Glenn it works!. I really appreciate your quick help. I have one query, I take the output in csv file it comes in 1 Column with tab spaces. Is there anyway we can print it custname in column A, Host Column B and so on ?

glenn jackman Over a year ago

If you take out the | column -s $'\t' -t part, you'll be left with tab-separated columns.

choroba · Accepted Answer · 2017-09-26 12:49:23Z

Perl to the rescue:

perl -lne '
    if (/customer_name: "(.*)"/) {
        print $h{name} unless $h{printed} || !%h;
        undef $h{printed} if $1 ne $h{name};
        $h{name} = $1;
    } else {
        /"([^"]+)": "?([^"]+)"?/ and $h{$1} = $2;
        $h{printed} = print join "\t",
            @h{qw{ name HostName LastDayRxBytes Status }}
            if "Status" eq $1;
    }
    END { print $h{name} unless $h{printed} || !%h }
    ' < input_file

The %h hash is used to gather information about lines to be printed.
When a customer name is read, the previous customer name is printed if it hasn't been printed yet. The same happens at the very end of the input to print a possible last customer with no details.
A line is printed when Status is read.

Marc Lambrichs · Accepted Answer · 2017-09-26 18:00:45Z

gnu awk solution:

$ cat tst.awk
BEGIN {
   RS="customer_name: "
   pr("Column1", "Column2", "Column3", "Column4")
   pr("Custname", "Host", "Last RX", "Status")
}
match($0, /"([^"]+)"/, cust) {
   printed=0
   str=substr($0, RLENGTH+2)
   while (match( str, /"HostName":\s"([^"]+)"\s+"LastDayRxBytes":\s(\S+)\s+"Status":\s"([^"]+)"\s/, col)){
      str=substr(str, RLENGTH+3)
      pr( cust[1], col[1], col[2], col[3] )
      printed=1
   }
   if (!printed) pr(cust[1])
}
function pr(cust,host,rx,status) {
   printf "%-16s\t%-16s\t%-16s\t%-10s\n", cust, host, rx, status
}

Based on the example input, one can tackle this one using regexes and the match function as well. Testing it:

$ awk -f tst.awk input.txt
Column1             Column2             Column3             Column4
Custname            Host                Last RX             Status
abc                 tm-1                0                   offline
abc                 tm-2                0                   offline
abc                 tm-3                0                   offline
abc                 new-va-threat-01    0                   offline
xyz                 tm-56               10708747            ok
def
uvw                 tm-23               34921829912         ok
new cust            tm-1-3              33993187093         ok
a12 d32 ffg
bcd abc
mno opq
abc dhg pvt ltd.    tm-10               145774401010        ok
abc dhg pvt ltd.    tm-ngtm-13          150159680874        ok
abc dhg pvt ltd.    new-ngtm-11         207392526747        ok
abc dhg pvt ltd.    old-ngtm-06         17708734533         ok
abc dhg pvt ltd.    tm-08               559289251           ok
abc dhg pvt ltd.    tm-12               534145552271        ok

Explanation:

record separator RS is set on customer_name:, so $0 contains all host, rx and status information per customer.
1st match with regex "([^"]+)" will capture the customer
2nd match with regex "HostName":\s"([^"]+)"\s+"LastDayRxBytes":\s(\S+)\s+"Status":\s"([^"]+)"\s will capture hostname, rx and status.
if the 2nd match succeeds, shorten the string you want to use in your next match.

I know, this is not the awk way of doing things, but then again the regular format of the input allows this - quite concise - regex-based solution.

Collectives™ on Stack Overflow

How to convert my file in rows and column using shell scripting

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related