2

In a Bash shell script, I want to extract an object from a JSON file. For example, given the following JSON files, I would like to extract the dependencies object so that it returns "dmg": ">= 0.0.0", "build-essential": ">= 0.0.0", "windows": ">= 0.0.0" in whatever format. How do you do that?

// My data 1.json:

{
    "platforms": {
        "amazon": ">= 0.0.0",
        "arch": ">= 0.0.0",
        "centos": ">= 0.0.0",
        "debian": ">= 0.0.0"
    },
    "dependencies": {
        "dmg": ">= 0.0.0",
        "build-essential": ">= 0.0.0",
        "windows": ">= 0.0.0"
    },
    "recommendations": {}
}

// My data 2.json:

{
    "platforms": {
        "amazon": ">= 0.0.0",
        "arch": ">= 0.0.0",
        "centos": ">= 0.0.0",
        "debian": ">= 0.0.0"
    },
    "recommendations": {},
    "dependencies": {
        "dmg": ">= 0.0.0",
        "build-essential": ">= 0.0.0",
        "windows": ">= 0.0.0"
    }
}

// My data 3.json:

{
    "dependencies": {
        "dmg": ">= 0.0.0",
        "build-essential": ">= 0.0.0",
        "windows": ">= 0.0.0"
    },
    "platforms": {
        "amazon": ">= 0.0.0",
        "arch": ">= 0.0.0",
        "centos": ">= 0.0.0",
        "debian": ">= 0.0.0"
    },
    "recommendations": {}
}

// My data 4.json:

{
    "dependencies": {
        "dmg": ">= 0.0.0",
        "build-essential": ">= 0.0.0",
        "windows": ">= 0.0.0"
    }
}

// My data 5.json (compressed):

{"dependencies":{"dmg":">= 0.0.0","build-essential":">= 0.0.0","windows":">= 0.0.0"},"platforms":{"amazon":">= 0.0.0","arch":">= 0.0.0","centos":">= 0.0.0","debian":">= 0.0.0"},"recommendations":{}}

4 Answers

1

Have you looked at jsawk? I would generally use python for parsing JSON data on UNIX systems, since it usually comes bundled with the OS.

Anyways, you can try this:

awk "/dependencies/,/}/ { print }" test.json | grep ":" | grep -v dependencies

in general, to get text between two patterns/strings:

awk "/Pattern1/,/Pattern2/ { print }" inputFile

Then use grep ":" to keep only the lines containing ':' (the key/value pairs), and grep -v dependencies to drop the line that holds the object name itself.
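As a rough sketch of the range-pattern idea above, the pipeline can be folded into one awk call that takes the object name as a parameter. The function name extract_object is made up for illustration. It assumes pretty-printed input with one key per line and no nested objects, and the name is used as a regex, so it would also skip any entry whose text contains the object name.

```shell
# Hypothetical helper: print the key/value lines of one object.
# Reads pretty-printed JSON on stdin; $1 is the object name (used as a regex).
extract_object() {
    awk -v name="$1" '
        # Range pattern: active from the line matching name
        # through the next line that contains a closing brace.
        $0 ~ name, /[}]/ {
            # Keep key/value lines; skip the header line and the closing brace.
            if ($0 ~ /:/ && $0 !~ name) {
                sub(/^[ \t]+/, "")   # strip leading indentation
                print
            }
        }
    '
}
```

Usage would look like extract_object dependencies < test.json. Because the range starts wherever the name first appears, the position of the block in the file should not matter, but single-line JSON is not handled by this version.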

UPDATE: for json not in pretty format

sed "s/[,{}]/&\n/g" prettified.json | awk "/dependencies/,/}/ { print }" | grep ":" | grep -v dependencies | awk '{$1=$1}1'

3 Comments

This might be a problem if the JSON file is not in pretty format, for example if it is compressed into one single line. Any thoughts?
Updated the regex, now it should work for both pretty and ugly formats lol
Doesn't work if it's a string coming in from a curl response.
0

Here is one way with awk:

awk -v RS= -F'},|{' '{print $5}' file | awk 'NF'

$ awk -v RS= -F'},|{' '{print $5}' f | awk 'NF'
    "dmg": ">= 0.0.0",
    "build-essential": ">= 0.0.0",
    "windows": ">= 0.0.0"

4 Comments

I'm curious, how does it know the block named "dependencies" to extract? I don't see the keyword "dependencies" in your command.
@NamNguyen I set the input to paragraph mode with RS= and the field separator to }, or {. Once the file is split I just pick the field you seek by stating $5. That is the field holding the value you need. The last pipe is to remove blank lines.
It's very short and I'm having a hard time understanding your solution. I'm still reading your comment and learning from a pro :) . Kinda like your solution :)
@NamNguyen Thanks :). It's pretty simple. Just look at your input and count the fields by splitting them at every }, or {. You'll see your block is the 5th field.
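The field-counting approach above depends on the block being the 5th field, which only holds for one particular layout. A possible variation, sketched here under assumptions, looks the field up by name instead: join the file into one line, split on braces, find the field that contains the quoted name, and print the field after it. The function name object_body is hypothetical, and the sketch assumes no nested objects and no braces inside string values.

```shell
# Hypothetical helper: print the body of one object, wherever it sits in the input.
# Reads JSON on stdin (pretty or single-line); $1 is the object name.
object_body() {
    tr -d '\n' | awk -v key="\"$1\"" -F'[{}]' '{
        for (i = 1; i < NF; i++) {
            if (index($i, key)) {                   # field that holds the quoted name
                body = $(i + 1)                     # next field: text between its braces
                gsub(/^[ \t]+|[ \t]+$/, "", body)   # trim surrounding whitespace
                gsub(/,[ \t]+/, ", ", body)         # collapse indentation between entries
                print body
                exit
            }
        }
    }'
}
```

Usage would look like object_body dependencies < file. Matching on the quoted name rather than a field number is what makes the order of the blocks irrelevant here.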
0
$ tr -d '\n' < myjson.json | sed -e's/[}{]//g' | sed -e's/.*dependencies\":\(.*\)\s*,.*/\1/g' | sed -e's/^ *//g' | sed -e's/, */, /g'
"dmg": ">= 0.0.0", "build-essential": ">= 0.0.0", "windows": ">= 0.0.0"

7 Comments

Looks like this solution does not work well for some of my cases.
Sorry I didn't know about your cases :)
For example, in my case the JSON file is compressed, i.e. all on one line. But I still appreciate your input.
That's what the tr -d '\n' < myjson.json does.
For example, this would fail for this test case: { "b": { }, "dependencies": { "dmg": ">= 0.0.0", "build-essential": ">= 0.0.0", "windows": ">= 0.0.0" }, "a": { } }
-1
sed -n '/dependencies/, /}/ p' t|grep '>='


How this works:

First get the text from the dependencies line up to the next closing brace, then keep only the lines that contain a version constraint (>=).

Note that this method is independent of where in the text the dependency block is located. As long as it is present, you'll get the answer.


aman@apollo:~$ sed -n '/dependencies/, /\}/ p' t|grep '>='
        "dmg": ">= 0.0.0",
        "build-essential": ">= 0.0.0",
        "windows": ">= 0.0.0"

If the dependency block can contain operators like ~= or = (and not just >=), use:

sed -n '/dependencies/, /}/ p' t|grep '.*='


Compressed version

For the compressed version, you can first "decompress" the file (insert newlines) and then apply the same transformation.

sed -e 's/:{/:{\n/g'  -e  's/},/\n},\n/g' d5|sed -n '/dependencies/, /}/ p'|grep '>='

The original solution will work for all the 4 other files.
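The "decompress, then take the range" idea can also be sketched with awk's gsub, which avoids relying on \n in a sed replacement (not every sed supports that). The function name extract_compact is hypothetical. The sketch assumes no nested objects and no commas or braces inside string values, since every comma and brace becomes a line break.

```shell
# Hypothetical helper: works on single-line or pretty-printed JSON from stdin.
# Step 1 puts each entry and each brace on its own line ("decompress").
# Step 2 takes the range from the object header to the first closing brace.
extract_compact() {
    awk '{ gsub(/[{,]/, "&\n"); gsub(/[}]/, "\n&\n"); print }' |
    awk -v name="\"$1\"" '
        index($0, name), /[}]/ {
            # Keep key/value lines; skip the header line and the closing brace.
            if (/:/ && !index($0, name)) {
                sub(/^[ \t]+/, "")   # strip leading indentation
                print
            }
        }
    '
}
```

Usage would look like extract_compact dependencies < d5. Each entry comes out on its own line, with the separating commas left in place.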

4 Comments

would be safer this way? sed -n '/dependencies/, /}/ p' metadata.json | grep -v "[{}]"
do you know the answer for this question? stackoverflow.com/questions/22316714/…
How is sed reading the file here? makes no sense to me.
Bad filenames perhaps (t, d5).
