Here is some data that I have:

animal { 
    dog {
        body {
            parts {
                legs = old
                brain = average
                tail= curly
                }
   
            }
        }
    cat {
        body {
            parts {
                legs = new
                brain = average
                tail {
                    base=hairy
                    tip=nothairy
                }
   
            }
        }
    }
}

Notice that the data is not really JSON; it follows these rules:

  • Keys and values are separated by =, with or without surrounding spaces (legs = old, tail= curly, base=hairy).
  • There are no " or , characters anywhere in the data. Entries are separated by newlines.

Is it even possible to parse this with awk or sed? I tried jq, but it does not work because this isn't true JSON.

My goal is to display only "dog" and "cat", because they are the top-level keys under "animal".

$ some-magical-command
dog
cat
  • Just displaying dog and cat is quite simple; you could even do it in bash. But is that really the limit of your requirements? In case it is, I'll add an answer. Commented Apr 9, 2022 at 6:14

3 Answers

To do what you currently want and for ease of any future manipulation of your data, you could use any POSIX awk (for character classes) to convert your structure to JSON and then use jq on it:

$ cat tst.awk
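BEGIN { print "{" }                     # open the enclosing JSON object
!NF { next }                            # skip blank lines
{
    sub(/[[:space:]]+$/,"")             # strip trailing white space
    gsub(/[[:alnum:]_]+/,"\"&\"")       # quote every key and value
    gsub(/ *= */,": ")                  # key = value  ->  "key": "value"
    sub(/" *{/,"\": {")                 # "key" {      ->  "key": {
}
(++nr) > 1 {
    # Print the previous line, adding a comma when it ends an entry
    # (ends in " or }) and the current line starts another one (contains ").
    sep = ( /"/ && (prev ~ /["}]$/) ? "," : "" )
    printf "%s%s%s", prev, sep, ORS
}
{ prev = $0 }                           # remember this line for the next pass
END { print prev ORS "}" }              # flush the last line, close the object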

$ awk -f tst.awk file
{
"animal": {
    "dog": {
        "body": {
            "parts": {
                "legs": "old",
                "brain": "average",
                "tail": "curly"
                }
            }
        },
    "cat": {
        "body": {
            "parts": {
                "legs": "new",
                "brain": "average",
                "tail": {
                    "base": "hairy",
                    "tip": "nothairy"
                }
            }
        }
    }
}
}

Current and some possible future uses:

$ awk -f tst.awk file | jq -r '.animal | keys[]'
cat
dog

$ awk -f tst.awk file | jq -r '.animal.dog.body.parts | keys[]'
brain
legs
tail

$ awk -f tst.awk file | jq -r '.animal.dog.body.parts'
{
  "legs": "old",
  "brain": "average",
  "tail": "curly"
}

$ awk -f tst.awk file | jq -r '.animal.cat.body.parts'
{
  "legs": "new",
  "brain": "average",
  "tail": {
    "base": "hairy",
    "tip": "nothairy"
  }
}

The above assumes your input always looks as shown in your question.
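
If values can contain other characters, for example the . and / in an address like 10.10.10.10/24, the character class used for quoting has to be widened. Here is a minimal sketch of that change applied to a single line. The function name quote_line is hypothetical, only the line-level substitutions from tst.awk are reproduced (not the comma handling), and it assumes values contain no spaces:

```shell
# quote_line (hypothetical helper): apply the quoting steps from tst.awk to
# each input line, with . and / added to the set of characters to be quoted.
quote_line() {
    awk '{
        sub(/[[:space:]]+$/, "")            # strip trailing white space
        gsub(/[[:alnum:]_.\/]+/, "\"&\"")   # quote keys and values, incl. . and /
        gsub(/ *= */, ": ")                 # key = value  ->  "key": "value"
        print
    }'
}

printf '%s\n' '    ip = 10.10.10.10/24' | quote_line
# prints:     "ip": "10.10.10.10/24"
```

With the original [[:alnum:]_] class, each octet would be quoted separately, which is not valid JSON.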

2 Comments

This is really awesome. However, if the data contains IP addresses such as "10.10.10.10/24", each number between the dots gets its own quotes. How can I avoid that?
Change [[:alnum:]_] in gsub(/[[:alnum:]_]+/,"\"&\"") to whatever regexp matches the strings you want quoted, e.g. gsub(/[[:alnum:]_.\/]+/,"\"&\"") would accommodate the .s and / in your IP address, and if there are other characters just include them too. That's what I mean at the bottom by "The above assumes your input always looks as shown in your question." The script isn't parsing your input as tokens of a language, it's just matching the kinds of text that look like the text you provided in your input.

It's fairly close to Tcl syntax, if you feel like learning a new language.

set data {
    animal { 
        dog {
            body {
                parts {
                    legs = old
                    brain = large
                    tail= curly
                    }
       
                }
            }
        cat {
            body {
                parts {
                    legs = new
                    brain = tiny
                    tail {
                        base=hairy
                        tip=nothairy
                    }
       
                }
            }
        }
    }
}

set data [regsub -line -all {\s*=\s*(.+)} $data { "\1"}]

dict get $data animal dog body parts brain    ;# => large

I know some people who would argue about your classification of dog brains vs cat brains...

7 Comments

hmm, I get "Command 'dict' not found, but can be installed" do I require another application to run these commands? Again trying to do everything via bash. Re: brain size, I agree, I updated my post.
This is Tcl code so needs to be run in a Tcl interpreter: tclsh
bash does not have arbitrarily deeply nested data structures. Your data indicates bash is an insufficient tool to meet your needs.
That is what I thought, just wanted to make sure. Thanks for the recommendations. Unfortunately, I won't be able to use anything outside of bash.
@dave: your question seems to imply that awk is also a possibility. Is that true? And if so, is it OK to rely on GNU awk?

If you only need the second-level keys, and you're not too concerned about producing good error messages for erroneous inputs, then it's pretty straightforward. The basic idea is this:

  1. There are three formats for an input line:

    • ID {
    • ID = value # where the = might not be space-separated
    • }
  2. As the lines are read, we keep track of nesting depth by incrementing a counter with the first line type and decrementing it with the third line type.

  3. When the nesting counter is 1, if the line has an ID field, we print it.

That can be done quite simply with an awk script. Save the script in a file with a name like level2_keys.awk, then run awk -f level2_keys.awk /path/to/input/file. Note that each of the first three rules ends with next; so that once a rule matches a line, the rules after it are not evaluated for that line.

$1 == "}"    { # Decrement nesting on close
               --nesting;
               next;
             }
/=/          { # Remove the if block if you don't want to print these keys.
               if (nesting == 1) {
                 gsub("=", " = ");    # Force = to be a field
                 print($1);
               }
               next;
             }
$2 == "{"    { # Increment nesting (and maybe print) on open
               if (nesting == 1) print($1);
               ++nesting;
               next;
             }
# NF is non-zero if the line is not blank.
NF           { print "Bad input at " NR ": '"$0"'" > "/dev/stderr"; }
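
For readers restricted to the shell alone, the same depth-counting idea can be sketched without awk. This is only a sketch, not part of the awk script above. The function name level2_keys is hypothetical, and it assumes the three line formats listed earlier, with every { at the end of its line and every } on a line of its own:

```shell
# level2_keys (hypothetical name): read the data on standard input and print
# every key found at nesting depth 1, i.e. directly inside the outermost block.
# The body runs in a subshell, so its variables do not leak into the caller.
level2_keys() (
    depth=0
    while IFS= read -r line || [ -n "$line" ]; do
        # Trim leading, then trailing, white space.
        line=${line#"${line%%[![:space:]]*}"}
        line=${line%"${line##*[![:space:]]}"}
        case $line in
            '')   continue ;;                        # blank line
            '}')  depth=$((depth - 1)); continue ;;  # close: one level up
            *'{') key=${line%'{'}; opens=1 ;;        # "ID {"
            *=*)  key=${line%%=*}; opens=0 ;;        # "ID = value" or "ID=value"
            *)    printf 'Bad input: %s\n' "$line" >&2; continue ;;
        esac
        key=${key%"${key##*[![:space:]]}"}           # drop white space after the ID
        if [ "$depth" -eq 1 ]; then
            printf '%s\n' "$key"
        fi
        depth=$((depth + opens))
    done
)
```

Run it as level2_keys < /path/to/input/file. It uses only shell built-ins and parameter expansion, so it will be slower than awk on large inputs.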
