Iterate through json file to get specific attribute values using python

Question

I have a JSON file like the one below:

[
{
    "contributors": null,
    "coordinates": null,
    "created_at": "Fri Aug 04 21:12:59 +0000 2017",
    "entities": {
        "hashtags": [
            {
                "indices": [
                    32,
                    39
                ],
                "text": "\ubd80\uc0b0\ucd9c\uc7a5\uc548\ub9c8"
            },
            {
                "indices": [
                    40,
                    48
                ],
                "text": "\ubd80\uc0b0\ucd9c\uc7a5\ub9c8\uc0ac\uc9c0"
            }
        ]
    },
    "text": "\uaedb",
    "retweeted_status": {
        "contributors": null,
        "coordinates": null,
        "created_at": "Fri Aug 04 20:30:06 +0000 2017",
        "display_text_range": [
            0,
            0
        ],
        "text": "hjhfbsdjsdbjsd"
    },
    "extended_tweet": {
            "display_text_range": [
                0,
                137
            ],
            "entities": {
                "hashtags": [
                    {
                        "indices": [
                            62,
                            75
                        ],
                        "text": "2ndAmendment"
                    },
                    {
                        "indices": [
                            91,
                            104
                        ],
                        "text": "1stAmendment"
                    }
                ]
            }
    }
}
]

I wrote the Python code below to count the number of text attributes throughout the JSON file.

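import json

cnt = 0
with open("1.json") as data_file:
    data = json.load(data_file)  # the top level is a list of tweet dicts

for tweet in data:
    for key, value in tweet.items():
        if key == "text":
            cnt += 1
        elif key == "retweeted_status":
            for k, v in value.items():
                if k == "text":
                    cnt += 1
        elif key == "entities":
            for hashtag in value.get("hashtags", []):
                if "text" in hashtag:
                    cnt += 1
        # Difficult to loop further: extended_tweet nests entities one level
        # deeper, so it would need yet another hand-written branch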

Since the data structure isn't consistent from one level to the next, it is difficult to iterate over. I also want to access the value of each text attribute and display it. Is there a simpler way to do this without multiple nested loops?

Why are you looping over the items if you're looking for specific keys? Do something like: if 'text' in data: cnt += 1; if 'text' in data.get('retweeted_status', {}): cnt += 1; etc. No loop necessary.
– Artyer, Commented Aug 7, 2017 at 22:09
@Artyer Wouldn't that be difficult when accessing text inside the extended_tweet values?
– PS_92, Commented Aug 7, 2017 at 22:17
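A minimal sketch of the direct-lookup approach from the comments, assuming every tweet follows the structure shown in the question (the function name count_texts is hypothetical). The extended_tweet case raised in the reply needs one more hard-coded path rather than another loop:

```python
def count_texts(tweet):
    """Count "text" keys at the fixed locations used in the question's tweets."""
    cnt = 0
    if "text" in tweet:
        cnt += 1
    if "text" in tweet.get("retweeted_status", {}):
        cnt += 1
    # Hashtags appear both under entities and under extended_tweet -> entities
    for entities in (tweet.get("entities", {}),
                     tweet.get("extended_tweet", {}).get("entities", {})):
        for hashtag in entities.get("hashtags", []):
            if "text" in hashtag:
                cnt += 1
    return cnt


# The question's tweet, reduced to the keys that matter here
tweet = {
    "text": "\uaedb",
    "entities": {"hashtags": [{"text": "\ubd80\uc0b0\ucd9c\uc7a5\uc548\ub9c8"},
                              {"text": "\ubd80\uc0b0\ucd9c\uc7a5\ub9c8\uc0ac\uc9c0"}]},
    "retweeted_status": {"text": "hjhfbsdjsdbjsd"},
    "extended_tweet": {"entities": {"hashtags": [{"text": "2ndAmendment"},
                                                 {"text": "1stAmendment"}]}},
}
print(count_texts(tweet))  # 6
```

For the whole file (a list of tweets), sum(count_texts(t) for t in data) gives the total. The drawback is that every location has to be known in advance, which is exactly the objection in the reply above.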

alvarez · Accepted Answer · 2017-08-08 10:37:14Z


What about using regular expressions?

import re

# Captures the key name and everything after "text": " up to the last quote on the line
regex_chain = re.compile(r'(text)\": \"(.*)\"')

text_occurrences = []
with open('1.json') as json_file:
    for line in json_file:
        match = regex_chain.search(line)
        if match:
            text_occurrences.append({match.group(1): match.group(2)})
print(text_occurrences)

You get a list of dicts, each containing the key and value of one text occurrence:

[{'text': '\\ubd80\\uc0b0\\ucd9c\\uc7a5\\uc548\\ub9c8'}, {'text': '\\ubd80\\uc0b0\\ucd9c\\uc7a5\\ub9c8\\uc0ac\\uc9c0'}, {'text': '\\uaedb'}, {'text': 'hjhfbsdjsdbjsd'}, {'text': '2ndAmendment'}, {'text': '1stAmendment'}]
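Note that the captured values are the raw \uXXXX escape sequences from the file, not the decoded characters. A minimal sketch of decoding them with json.loads, assuming each captured value is a complete, valid JSON string body (the helper name decoded_text_values is hypothetical):

```python
import json
import re

# Same pattern as in the answer above
regex_chain = re.compile(r'(text)\": \"(.*)\"')


def decoded_text_values(lines):
    """Return the decoded value of each text match, at most one per line."""
    values = []
    for line in lines:
        match = regex_chain.search(line)
        if match:
            # Re-wrap the captured body in quotes so json.loads turns
            # escape sequences such as \uaedb into the actual characters
            values.append(json.loads('"' + match.group(2) + '"'))
    return values


# Two lines as they appear in the question's file (the backslashes are literal)
sample_lines = [
    '    "text": "\\uaedb",\n',
    '        "text": "hjhfbsdjsdbjsd"\n',
]
values = decoded_text_values(sample_lines)
print(len(values[0]))  # 1: one decoded character, not six escape characters
print(values[1])       # hjhfbsdjsdbjsd
```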

answered Aug 8, 2017 at 10:37

darksky · Accepted Answer · 2017-08-08 22:41:13Z

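I'm not sure how safe it is to naively parse JSON with regular expressions, especially with (text)\": \"(.*)\". Because .* is greedy, that pattern could technically match text": "abc", "text": "another" when both keys are on the same line, with group 1 being text and group 2 being everything up to the last quote, i.e. abc", "text": "another (instead of just abc).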
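A small demonstration of that failure mode, assuming a hypothetical minified (single-line) JSON document with two separate text values:

```python
import json
import re

regex_chain = re.compile(r'(text)\": \"(.*)\"')

# Hypothetical one-line JSON holding two separate "text" values
line = '[{"text": "abc"}, {"text": "another"}]'

match = regex_chain.search(line)
# .* is greedy, so the single match runs from the first value to the last quote
print(match.group(2))  # abc"}, {"text": "another

# Parsing the JSON first gives both values separately
print([item["text"] for item in json.loads(line)])  # ['abc', 'another']
```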

It's much safer to parse JSON with Python's standard json library and then traverse the resulting data recursively.

import json


def count_key(selected_key, obj):
    """Count how many times selected_key appears as a dict key anywhere in obj."""
    count = 0

    if isinstance(obj, list):
        # Recurse into every element of a JSON array
        for item in obj:
            count += count_key(selected_key, item)

    elif isinstance(obj, dict):
        for key in obj:
            if key == selected_key:
                count += 1
            # Recurse into the value, which may itself contain selected_key
            count += count_key(selected_key, obj[key])

    return count


with open("my-json-file", "r") as json_file:
    print(count_key("text", json.load(json_file)))
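Since the question also asks to display each text value, the same recursive traversal can yield the values instead of counting them. A minimal sketch; iter_values is a hypothetical name, and the sample is an ASCII-only subset of the question's data:

```python
def iter_values(selected_key, obj):
    """Yield every value stored under selected_key, at any nesting depth."""
    if isinstance(obj, list):
        for item in obj:
            yield from iter_values(selected_key, item)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            if key == selected_key:
                yield value
            # Keep descending: the value may contain further matches
            yield from iter_values(selected_key, value)


# What json.load would return for a trimmed-down version of the question's file
sample = [{
    "retweeted_status": {"text": "hjhfbsdjsdbjsd"},
    "extended_tweet": {"entities": {"hashtags": [{"text": "2ndAmendment"},
                                                 {"text": "1stAmendment"}]}},
}]

texts = list(iter_values("text", sample))
print(len(texts))  # 3
for text in texts:
    print(text)
```

To read from the file instead, pass json.load(json_file) as obj, as in the count_key example; len(texts) then gives the same count as count_key.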
