linux / bash parse through json-like data

Question

Here is some data that I have:

animal { 
    dog {
        body {
            parts {
                legs = old
                brain = average
                tail= curly
                }
   
            }
        }
    cat {
        body {
            parts {
                legs = new
                brain = average
                tail {
                    base=hairy
                    tip=nothairy
                }
   
            }
        }
    }
}

Notice the data is not really json as it has the following rules:

A key and its value may be separated by = with or without surrounding spaces (legs = old, tail= curly, base=hairy).
There are no " or , characters anywhere in the data; entries are separated by newlines.

Is it even possible to parse this with awk or sed? I tried jq but it does not work as this isn't really true json data.

My goal is to display only "dog" and "cat". Based on them being the top values under "animal".

$ some-magical-command
dog
cat

Just displaying dog and cat is quite simple; you could even do it in bash. But is that really the limit of your requirements? In case it is, I'll add an answer.
– rici, Commented Apr 9, 2022 at 6:14
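
As a rough illustration of the "you could even do it in bash" remark, here is a minimal sketch that uses only POSIX shell builtins. The function name top_keys is made up for this example, and it assumes every line looks like one of the lines in the sample data (ID {, ID = value, or a lone }). It prints only names that open a block directly under the outermost block, and it does no error checking.

```shell
# top_keys: hypothetical helper, reads the data on standard input and
# prints each name that opens a block one level below the outermost block.
top_keys() {
    depth=0
    while read -r first second rest; do
        if [ "$first" = "}" ]; then
            depth=$((depth - 1))            # closing brace: leave a block
        elif [ "$second" = "{" ]; then
            if [ "$depth" -eq 1 ]; then     # directly inside the outer block
                printf '%s\n' "$first"
            fi
            depth=$((depth + 1))            # opening brace: enter a block
        fi                                  # key = value lines are ignored
    done
}

# Trimmed copy of the sample data; prints dog, then cat.
top_keys <<'EOF'
animal {
    dog {
        legs = old
    }
    cat {
        tail {
            tip=nothairy
        }
    }
}
EOF
```

Because read splits each line on whitespace, the trailing space after "animal {" in the sample does not matter.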

Ed Morton · Accepted Answer · 2022-04-09 18:42:55Z

1

To do what you currently want and for ease of any future manipulation of your data, you could use any POSIX awk (for character classes) to convert your structure to JSON and then use jq on it:

$ cat tst.awk
BEGIN { print "{" }
!NF { next }
{
    sub(/[[:space:]]+$/,"")
    gsub(/[[:alnum:]_]+/,"\"&\"")
    gsub(/ *= */,": ")
    sub(/" *{/,"\": {")
}
(++nr) > 1 {
    sep = ( /"/ && (prev ~ /["}]$/) ? "," : "" )
    printf "%s%s%s", prev, sep, ORS
}
{ prev = $0 }
END { print prev ORS "}" }

$ awk -f tst.awk file
{
"animal": {
    "dog": {
        "body": {
            "parts": {
                "legs": "old",
                "brain": "average",
                "tail": "curly"
                }
            }
        },
    "cat": {
        "body": {
            "parts": {
                "legs": "new",
                "brain": "average",
                "tail": {
                    "base": "hairy",
                    "tip": "nothairy"
                }
            }
        }
    }
}
}

Current and some possible future uses:

$ awk -f tst.awk file | jq -r '.animal | keys[]'
cat
dog

$ awk -f tst.awk file | jq -r '.animal.dog.body.parts | keys[]'
brain
legs
tail

$ awk -f tst.awk file | jq -r '.animal.dog.body.parts'
{
  "legs": "old",
  "brain": "average",
  "tail": "curly"
}

$ awk -f tst.awk file | jq -r '.animal.cat.body.parts'
{
  "legs": "new",
  "brain": "average",
  "tail": {
    "base": "hairy",
    "tip": "nothairy"
  }
}

The above assumes your input always looks as shown in your question.

edited Apr 9, 2022 at 18:42

answered Apr 9, 2022 at 12:59

Ed Morton


2 Comments

Dave Over a year ago

This is really awesome. However, be aware that if there are IP addresses such as "10.10.10.10/24", every number between the dots gets its own quotes. How can I avoid that?

Ed Morton Over a year ago

Change [[:alnum:]_] in gsub(/[[:alnum:]_]+/,"\"&\"") to be whatever regexp you want to match the strings to be quoted, e.g. gsub(/[[:alnum:]_.\/]+/,"\"&\"") would accommodate the .s and / in your IP address and if there's other characters just include them too. That's what I mean at the bottom by "The above assumes your input always looks as shown in your question." - the script isn't parsing your input as tokens of a language, it's just matching the kinds of text that look like the text you provided in your input.
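
To make the difference concrete, here is a small sketch that applies both bracket expressions to a single line. The helper name quote_tokens is hypothetical, and the regexp is passed as a string (so the / needs no backslash) rather than written as a regexp literal as in tst.awk. Like the answer, it assumes an awk that supports POSIX character classes.

```shell
# quote_tokens REGEXP: hypothetical helper that wraps every match of REGEXP
# in double quotes, the same way the gsub() in tst.awk does.
quote_tokens() {
    awk -v re="$1" '{ gsub(re, "\"&\""); print }'
}

# Original class: each run of letters, digits and _ is quoted on its own.
printf 'ip = 10.10.10.10/24\n' | quote_tokens '[[:alnum:]_]+'
# "ip" = "10"."10"."10"."10"/"24"

# Widened class: . and / are now part of the token, so the address stays whole.
printf 'ip = 10.10.10.10/24\n' | quote_tokens '[[:alnum:]_./]+'
# "ip" = "10.10.10.10/24"
```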

glenn jackman · Accepted Answer · 2022-04-08 18:55:19Z

1

It's fairly close to Tcl syntax, if you feel like learning a new language.

set data {
    animal { 
        dog {
            body {
                parts {
                    legs = old
                    brain = large
                    tail= curly
                    }
       
                }
            }
        cat {
            body {
                parts {
                    legs = new
                    brain = tiny
                    tail {
                        base=hairy
                        tip=nothairy
                    }
       
                }
            }
        }
    }
}

set data [regsub -line -all {\s*=\s*(.+)} $data { "\1"}]

dict get $data animal dog body parts brain    ;# => large

I know some people who would argue about your classification of dog brains vs cat brains...

answered Apr 8, 2022 at 18:55

glenn jackman


7 Comments

Dave Over a year ago

hmm, I get "Command 'dict' not found, but can be installed" do I require another application to run these commands? Again trying to do everything via bash. Re: brain size, I agree, I updated my post.

glenn jackman Over a year ago

This is Tcl code so needs to be run in a Tcl interpreter: tclsh

glenn jackman Over a year ago

bash does not have arbitrarily deeply nested data structures. Your data indicates bash is an insufficient tool to meet your needs.

Dave Over a year ago

That is what I thought, just wanted to make sure. Thanks for the recommendations. Unfortunately, I won't be able to use anything outside of bash.

rici Over a year ago

@dave: your question seems to imply that awk is also a possibility. Is that true? And if so, is it OK to rely on GNU awk?


rici · Accepted Answer · 2022-04-09 06:38:31Z

If you only need the second-level keys, and you're not too concerned about producing good error messages for erroneous inputs, then it's pretty straightforward. The basic idea is this:

There are three formats for an input line:
- ID {
- ID = value # where the = might not be space-separated
- }
As the lines are read, we keep track of nesting depth by incrementing a counter with the first line type and decrementing it with the third line type.
When the nesting counter is 1, if the line has an ID field, we print it.

That can be done quite simply with an awk script. Save the script in a file with a name like level2_keys.awk, then run awk -f level2_keys.awk /path/to/input/file. Note that every rule except the final error check ends with next, so that once a line matches one rule, the rules after it are not evaluated for that line.

$1 == "}"    { # Decrement nesting on close
               --nesting;
               next;
             }
/=/          { # Remove the if block if you don't want to print these keys.
               if (nesting == 1) {
                 gsub("=", " = ");    # Force = to be a field
                 print($1);
               }
               next;
             }
$2 == "{"    { # Increment nesting (and maybe print) on open
               if (nesting == 1) print($1);
               ++nesting;
               next;
             }
# NF is non-zero if the line is not blank.
NF           { print "Bad input at " NR ": '"$0"'" > "/dev/stderr"; }
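
For a quick check without creating any files, the same rules can be wrapped in a shell function and fed a trimmed copy of the sample data. This is only a sketch: the wrapper name level2_keys is hypothetical, the rules are condensed from the script above, and the error-reporting rule is left out.

```shell
# level2_keys: hypothetical wrapper around a condensed copy of the rules above.
# Reads the named files, or standard input when no file is given.
level2_keys() {
    awk '
        $1 == "}" { --nesting; next }                  # close: one level up
        /=/ {
            if (nesting == 1) { gsub("=", " = "); print $1 }
            next
        }
        $2 == "{" {                                    # open: one level down
            if (nesting == 1) print $1
            ++nesting
            next
        }
    ' "$@"
}

# Prints dog, then cat.
level2_keys <<'EOF'
animal {
    dog {
        body {
            legs = old
        }
    }
    cat {
        tail {
            base=hairy
        }
    }
}
EOF
```

The nesting counter starts out uninitialized, which awk treats as 0, so the outermost key (animal) is never printed and only names seen while the counter is 1 are.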
