2

I'm trying to filter a JSON array using Python based on multiple conditions. My JSON is similar to this (no root name):

     {           
       "id": "123455",           
       "outter": {
          "inner": [
            {
              "nox": "abc:6666",
              "code": "1329",        
            }
           ],    
        },
        "topic": {
         "reference": "excel"
        }, 
        "date1": "1990-07-28T03:52:44-04:00",
        "finalDate": "1990-07-28T03:52:44-04:00"
      }
      {           
       "id": "123435",           
       "outter": {
          "inner": [
            {
              "nox": "abc:6666",
              "code": "9351",        
            }
           ],    
        },
        "topic": {
         "reference": "excel"
        }, 
        "date1": "1990-07-28T03:52:44-04:00",
        "finalDate": "1995-07-28T03:52:44-04:00"
      }

My goal is to filter on two conditions and return all records that match both.

1: outter --> inner --> code = 9351 AND

2: finalDate >= 1995

So far I can do each check separately with no problem, using the following code:

   import pandas as pd

   # Read newline-delimited JSON: one record per line
   data = pd.read_json('myFile.ndjson', lines=True)

   for item in data['outter']:
      for x in item['inner']:
         if x['code'] == '9351':
            print('found', x)

But I'm not sure how to do both at the same time: I have to start the loop with either data['outter'] or data['finalDate'], and inside the loop I can only see that one element, not the complete record.

Any help is appreciated, thanks!

You have a minor typo. Since it looks like you have a JSON array, you can wrap the input JSON in square brackets [] and separate the elements with a comma. Coincidentally, the result can also be used as a Python list object as is. Commented Sep 21, 2021 at 2:53
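
As a rough sketch of that suggestion: assuming the file really is newline-delimited JSON (one complete object per line), each line can be parsed with `json.loads` to build a plain Python list of dicts. The sample text and the name `ndjson_text` are made up for illustration and are not the asker's real data.

```python
import json

# Hypothetical NDJSON content: one complete JSON object per line.
ndjson_text = '''
{"id": "123455", "outter": {"inner": [{"nox": "abc:6666", "code": "1329"}]}, "finalDate": "1990-07-28T03:52:44-04:00"}
{"id": "123435", "outter": {"inner": [{"nox": "abc:6666", "code": "9351"}]}, "finalDate": "1995-07-28T03:52:44-04:00"}
'''

# Parse every non-empty line into a dict; the result is an ordinary list.
data_list = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]

print(type(data_list).__name__, len(data_list))
print(data_list[1]["outter"]["inner"][0]["code"])
```

With the records in a list of dicts, every field of a record is visible inside a single loop, so both conditions can be checked together.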

2 Answers 2

3

Here's one solution that can filter the list as mentioned. I'm using list comprehensions instead of loops and there's probably some stuff in there that could be improved, but the result seems to at least be as expected.

Note: This uses the walrus operator (:=), which was introduced in Python 3.8. If you're running an earlier Python version, you can probably rewrite the code without it, but I haven't bothered in this case.

from pprint import pprint


data_list = [
    {
        "id": "123455",
        "outter": {
            "inner": [
                {
                    "nox": "abc:6666",
                    "code": "1329",
                }
            ],
        },
        "topic": {
            "reference": "excel"
        },
        "date1": "1990-07-28T03:52:44-04:00",
        "finalDate": "1990-07-28T03:52:44-04:00"
    },
    {
        "id": "123435",
        "outter": {
            "inner": [
                {
                    "nox": "abc:6666",
                    "code": "9351",
                }
            ],
        },
        "topic": {
            "reference": "excel"
        },
        "date1": "1990-07-28T03:52:44-04:00",
        "finalDate": "1995-07-28T03:52:44-04:00"
    }
]

result = [d for d in data_list
          if (year := d['finalDate'][:4]).isnumeric() and year >= '1995'
          and any(str(inner['code']) == '9351'
                  for inner in d['outter']['inner'] or [])]

pprint(result)

@ted made a good point that readability counts, so I took some time to go back and write it in a typical loop format (essentially the same logic as above). I also added a lot of comments to clarify what's going on in the code; I hope you find it helpful :-)

from pprint import pprint
from typing import Dict, List

result = []

# Looping over each dictionary in list
for d in data_list:
    # Grab the year part, first four characters of `finalDate`
    year = d['finalDate'][:4]
    # isnumeric() confirms the year part consists only of digits.
    # Two equal-length digit strings compare lexicographically the same
    # way as the numbers they represent, so `>= '1995'` is safe here.
    valid_year = year.isnumeric() and year >= '1995'
    # Simple, if it doesn't match our desired year then continue
    # with next loop iteration
    if not valid_year:
        continue
    # Get inner list, then loop over it
    inner_list: List[Dict] = d['outter']['inner']
    for inner in inner_list:
        if inner['code'] == '9351':
            # We found an inner with our desired code!
            break
    # The for-else syntax is pretty self-explanatory. This `else` statement is
    # only run when we don't `break` out of the loop above.
    else:
        # No break statement was run, so the desired code did not match
        # any elements. Again, continue with the next iteration.
        continue
    # At this point, we know it's valid since it matched both our
    # conditions, so we add it to the result list.
    result.append(d)

# Print the result, but it should be the same
pprint(result)

9 Comments

While this may solve the issue, I believe a fancy one-liner comprehension list with multiple inline conditions and walrus operators might not be a very clear solution for the person asking the question to understand and improve. "Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts."
Yep, point taken. While I did try to split it across multiple lines so no line is too long, in hindsight using list comprehensions and walrus operators to solve the problem as quickly as possible might not have been the clearest solution overall.
@ted I did update my answer to provide an alternate solution that works for <3.8 with hopefully some improved comments about what's going on
@rv.kvetch, when executing your code I'm getting the following: 'TypeError: string indices must be integers' for the line --> year = d['recordedDate'][:4] ... In this case, d is returning just the node names id, outter, inner, topic, date1, finalDate ... What could be missing?
@rv.kvetch, yes, it was a JSON string. I used json.load as you recommended and it works as expected. Thanks!!!
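
For anyone hitting the same `TypeError`, here is a sketch of one likely cause, assuming the loop ran over a single dict (or unparsed data) rather than a list of dicts. Iterating over a dict yields its keys, which are strings, so indexing them with `'finalDate'` fails. The variable `raw` is a made-up stand-in for the real input.

```python
import json

# Hypothetical raw input: a JSON string holding a list with one record.
raw = '[{"id": "123435", "finalDate": "1995-07-28T03:52:44-04:00"}]'

records = json.loads(raw)  # now a list of dicts
single = records[0]

# Iterating over a dict yields its keys (strings), not nested records.
keys = [d for d in single]

error_message = None
try:
    keys[0]["finalDate"]  # indexing a str with a str key
except TypeError as exc:
    error_message = str(exc)

# Iterating over the parsed list yields dicts, so key lookup works.
years = [d["finalDate"][:4] for d in records]
print(keys, years, error_message)
```

`json.loads` parses a string, while `json.load` reads from an open file object; either one gives Python objects that the filtering code can work with.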
0

Try something like this. Change the filters to suit your needs :)

for item in data:
    if item['finalDate'].startswith("1995"):
        for inner in item['outter']['inner']:
            if inner['code'] == '9351':
                print(item)

1 Comment

This will only work for finalDate with the year 1995 :-)
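
One way to lift that restriction, sketched here under the assumption that `finalDate` is always an ISO 8601 timestamp: parse it with `datetime.fromisoformat` (Python 3.7+) and compare the year numerically. The function name `matches` and the sample `data` list are illustrative and not part of the original answer.

```python
from datetime import datetime

# Illustrative sample records, trimmed to the fields the filter needs.
data = [
    {"id": "123455", "outter": {"inner": [{"code": "1329"}]},
     "finalDate": "1990-07-28T03:52:44-04:00"},
    {"id": "123435", "outter": {"inner": [{"code": "9351"}]},
     "finalDate": "1995-07-28T03:52:44-04:00"},
    {"id": "123499", "outter": {"inner": [{"code": "9351"}]},
     "finalDate": "2001-01-15T10:00:00-04:00"},
]


def matches(item, code="9351", min_year=1995):
    """Return True if any inner entry has `code` and finalDate is in min_year or later."""
    year = datetime.fromisoformat(item["finalDate"]).year
    has_code = any(inner["code"] == code for inner in item["outter"]["inner"])
    return year >= min_year and has_code


matched_ids = [item["id"] for item in data if matches(item)]
print(matched_ids)
```

Using `any` also means an item is reported once even if several inner entries carry the same code, whereas a nested loop with `print` would report it once per matching entry.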
