
I have a JSON object with a single key, 'data', whose value is a list of arrays. I need to return all arrays that contain the value x, but the arrays themselves do not have keys. I'm trying to write a script that takes a source file (inFile) and writes the result to an export file (outFile). Here is my data structure:

{ "data": [
       ["x", 1, 4, 6, 2, 7],
       ["y", 3, 2, 5, 8, 4],
       ["z", 5, 2, 5, 9, 9],
       ["x", 3, 7, 2, 6, 8]
     ]
}

And here is my current script:

import json

def jsonFilter( inFile, outFile ):
    out = None;

    with open( inFile, 'r') as jsonFile:
       d = json.loads(json_data)
       a = d['data']
       b = [b for b in a if b != 'x' ]
       del b
       out = a


    if out:
        with open( outFile, 'w' ) as jsonFile:
            jsonFile.write( json.dumps( out ) );

    else:
       print "Error creating new jsonFile!"

SOLUTION

Thanks to Rob and everyone for your help! Here's the final working command-line tool. It takes two arguments, inFile and outFile:

~$ python jsonFilter.py inFile.json outFile.json

import json

def jsonFilter( inFile, outFile ):
    # make a dictionary.
    out = {}

    with open( inFile, 'r') as jsonFile:
        json_data = jsonFile.read()
        d = json.loads(json_data)
        # build the output so it has the same structure as the original:
        # take the d['data'] element and keep only the
        # elements where b[0] is 'x'
        out['data'] = [b for b in d['data'] if b[0] == 'x' ]

    if out:
        with open( outFile, 'w' ) as jsonFile:
            jsonFile.write( json.dumps( out ) )
    else:
        print("Error creating new JSON file!")

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('inFile', nargs=1, help="Choose the in file to use")
    parser.add_argument('outFile', nargs=1, help="Choose the out file to use")
    args = parser.parse_args()
    jsonFilter( args.inFile[0] , args.outFile[0] )
  • That is not a valid JSON object; JSON doesn't have sets, and sets can't contain lists anyway. Please show the actual contents of the JSON data. Commented Aug 26, 2016 at 17:41
  • Isn't out always None? Also, are you having any errors? what do you expect to happen? Commented Aug 26, 2016 at 17:45
  • My mistake, while simplifying code I supplied the wrong variable to jsonFile.write() and structured the json incorrectly. Sorry, the JSON file is 400MB (with each array containing 40+ values), so I had to slim it down a bit. 'x' is always in index(0) in each array if that helps. Commented Aug 26, 2016 at 17:48
  • @DGaffneyDC ok, but what's the output you're getting and how it differs from the expected one? Commented Aug 26, 2016 at 17:57
  • I'm looking to return a JSON file that contains all arrays with the value "x" in the index(0) position. So essentially filtering the first object to contain only {"data": [["x", 1, 4, 6, 2, 7], ["x", 3, 7, 2, 6, 8]]} Commented Aug 26, 2016 at 18:03

2 Answers

2

The first problem is that the filter condition is true for every element (so the whole data set comes back), because you are comparing b (a list) to 'x' (a string):

  b = [b for b in a if b != 'x' ]

What you wanted to do was:

  b = [b for b in a if b[0] != 'x' ]
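
A quick way to see the difference, as a minimal sketch using the sample data from the question (the names `rows`, `unfiltered`, and `kept` are only for illustration and do not appear in the original script):

```python
rows = [
    ["x", 1, 4, 6, 2, 7],
    ["y", 3, 2, 5, 8, 4],
    ["z", 5, 2, 5, 9, 9],
    ["x", 3, 7, 2, 6, 8],
]

# A list is never equal to a string, so this condition is True for every row
unfiltered = [row for row in rows if row != 'x']

# Comparing the first element of each row is what actually filters
kept = [row for row in rows if row[0] == 'x']

print(len(unfiltered))  # same length as rows: nothing was filtered
print(kept)             # only the rows that start with 'x'
```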

The second problem is that you are trying to delete the data by querying it and then deleting the results. The results are a new list (a copy), so deleting it removes nothing from the original container.
Instead, build the new data with only the elements you want, and save those. Also, you were not recreating the 'data' element in your output, so the output JSON would not have the same structure as the input data.
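
To illustrate why `del b` leaves the original list untouched, here is a small sketch (the names `a`, `b`, and `wanted` and the shortened rows are illustrative only). The comprehension builds a new list, and `del` only removes the name bound to that new list:

```python
a = [["x", 1], ["y", 2], ["x", 3]]

# The comprehension creates a new list object; a itself is not modified
b = [item for item in a if item[0] != 'x']
del b  # removes only the name b; a still holds all three rows

# Build the list you want to keep instead of trying to delete from a
wanted = [item for item in a if item[0] == 'x']

print(len(a))   # a is unchanged
print(wanted)
```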

import json

def jsonFilter( inFile, outFile ):
    # make a dictionary instead.
    out = {}

    with open( inFile, 'r') as jsonFile:
        json_data = jsonFile.read()
        d = json.loads(json_data)
        # build the output so it has the same structure as the original:
        # take the d['data'] element and keep only the
        # elements where b[0] is 'x'
        out['data'] = [b for b in d['data'] if b[0] == 'x' ]

    if out:
        with open( outFile, 'w' ) as jsonFile:
            jsonFile.write( json.dumps( out ) )
    else:
        print("Error creating new jsonFile!")

The output JSON data looks like:

 '{"data": [["x", 1, 4, 6, 2, 7], ["x", 3, 7, 2, 6, 8]]}'

If you did not want the output to have the 'data' root element but just the array of data that matched your filter then change the line:

 out['data'] = [b for b in d['data'] if b[0] == 'x' ]

to

 out = [b for b in d['data'] if b[0] == 'x' ]

With this change, the output JSON data looks like:

 '[["x", 1, 4, 6, 2, 7], ["x", 3, 7, 2, 6, 8]]'
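
As a possible variation, the read and write steps can go through `json.load` and `json.dump`, which work directly on file objects. This is only a sketch: `filter_rows` and its `key` parameter are hypothetical names, not part of the original script, and it still loads the whole file into memory just like the version above.

```python
import json


def filter_rows(in_path, out_path, key='x'):
    """Hypothetical variant of jsonFilter: keep rows whose first element equals key."""
    with open(in_path, 'r') as in_file:
        d = json.load(in_file)  # parse directly from the file object

    # "row and ..." skips empty rows so row[0] cannot raise IndexError
    out = {'data': [row for row in d['data'] if row and row[0] == key]}

    with open(out_path, 'w') as out_file:
        json.dump(out, out_file)  # serialize directly to the file object
    return out
```

Calling `filter_rows('inFile.json', 'outFile.json')` would write the same `{"data": [...]}` structure shown above, and passing a different `key` selects rows that start with another value.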

2 Comments

This worked! One minor tweak: adding json_data = jsonFile.read() before d = json.loads(json_data). Other than that, right on the money for what I was looking for. Thanks @Rob!
@DGaffneyDC Glad I could help. Oops on the read() line, I must have accidentally deleted it. It's fixed. :)
1

So, basically you want to keep only the arrays in your input data whose first element is 'x'; maybe something like this will do:

import json


def jsonFilter(inFile, outFile):
    with open(inFile, 'r') as jsonFile:
        # parse the JSON straight from the open file
        d = json.load(jsonFile)

    out = {
        # list() so the result is a real list on Python 3 as well
        'data': list(filter(lambda x: x[0] == 'x', d['data']))
    }

    if out['data']:
        with open(outFile, 'w') as jsonFile:
            jsonFile.write(json.dumps(out))
    else:
        print("Error creating new jsonFile!")
