
I have a JSON file like the one below:

[
{
    "contributors": null,
    "coordinates": null,
    "created_at": "Fri Aug 04 21:12:59 +0000 2017",
    "entities": {
        "hashtags": [
            {
                "indices": [
                    32,
                    39
                ],
                "text": "\ubd80\uc0b0\ucd9c\uc7a5\uc548\ub9c8"
            },
            {
                "indices": [
                    40,
                    48
                ],
                "text": "\ubd80\uc0b0\ucd9c\uc7a5\ub9c8\uc0ac\uc9c0"
            }
        ]
    },
    "text": "\uaedb",
    "retweeted_status": {
        "contributors": null,
        "coordinates": null,
        "created_at": "Fri Aug 04 20:30:06 +0000 2017",
        "display_text_range": [
            0,
            0
        ],
        "text": "hjhfbsdjsdbjsd"
    },
    "extended_tweet": {
        "display_text_range": [
            0,
            137
        ],
        "entities": {
            "hashtags": [
                {
                    "indices": [
                        62,
                        75
                    ],
                    "text": "2ndAmendment"
                },
                {
                    "indices": [
                        91,
                        104
                    ],
                    "text": "1stAmendment"
                }
            ]
        }
    }
}
]

I wrote the Python code below to count the number of `text` keys throughout the JSON file:

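import json

cnt = 0
with open("data.json") as data_file:  # hypothetical file name
    data = json.load(data_file)  # the top level is a list of tweet dicts

for tweet in data:
    for key, value in tweet.items():
        if key == "text":
            cnt += 1
        elif key == "retweeted_status":
            for k, v in value.items():
                if k == "text":
                    cnt += 1
        elif key == "entities":
            # "hashtags" is a list of dicts, each with its own "text"
            for hashtag in value.get("hashtags", []):
                if "text" in hashtag:
                    cnt += 1
        # Difficult to loop further: extended_tweet nests its own
        # "entities" one level deeper and needs yet another branch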

Since the structure isn't consistent (the `text` key appears at different nesting levels), it is difficult to iterate over. I also want to access the value of each `text` key and display it. Is there a simpler way to do this without multiple nested loops?

  • if key.keys()=="hashtags" will never be True, btw Commented Aug 7, 2017 at 22:06
  • Why are you looping over the items if you have specific keys? Do something like `if 'text' in data: cnt += 1` and `if 'text' in data.get('retweeted_status', {}): cnt += 1`, etc. No loop necessary. Commented Aug 7, 2017 at 22:09
  • @Artyer wouldn't that be difficult when accessing `text` inside the `extended_tweet` values? Commented Aug 7, 2017 at 22:17
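Following the suggestion in the comments, the known paths can be looked up directly with `dict.get`, without a generic loop. A minimal sketch, assuming every tweet has the shape of the sample above; `count_texts` is a hypothetical helper name. Note that each extra nesting (such as `extended_tweet`) still needs its own hand-written lookup, which is exactly the concern raised in the last comment:

```python
import json

def count_texts(tweet):
    """Count `text` keys at the known locations in one tweet dict (hypothetical helper)."""
    cnt = 0
    if "text" in tweet:
        cnt += 1
    if "text" in tweet.get("retweeted_status", {}):
        cnt += 1
    # Hashtag objects live under entities and under extended_tweet.entities.
    for entities in (tweet.get("entities", {}),
                     tweet.get("extended_tweet", {}).get("entities", {})):
        cnt += sum(1 for tag in entities.get("hashtags", []) if "text" in tag)
    return cnt

# Trimmed-down version of the sample file, keeping only the "text" locations.
sample = json.loads('''[{"text": "a",
  "entities": {"hashtags": [{"text": "h1"}, {"text": "h2"}]},
  "retweeted_status": {"text": "b"},
  "extended_tweet": {"entities": {"hashtags": [{"text": "x"}, {"text": "y"}]}}}]''')
print(sum(count_texts(t) for t in sample))  # 6
```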

2 Answers


What about using regular expressions?

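import re

# Group 1 captures the key name; group 2 captures the rest of the line
# up to its last double quote.
regex_chain = re.compile(r'(text)\": \"(.*)\"')

text_occurrences = []
with open('1.json') as file:
    # Line by line works here because the file is pretty-printed,
    # with at most one "text" key per line.
    for line in file:
        match = regex_chain.search(line)
        if match:
            text_occurrences.append({match.group(1): match.group(2)})
print(text_occurrences)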

You get a list of dicts, each holding the key and value of one `text` occurrence. Because the file is read as raw text, the `\u` escapes are not decoded:

[{'text': '\\ubd80\\uc0b0\\ucd9c\\uc7a5\\uc548\\ub9c8'}, {'text': '\\ubd80\\uc0b0\\ucd9c\\uc7a5\\ub9c8\\uc0ac\\uc9c0'}, {'text': '\\uaedb'}, {'text': 'hjhfbsdjsdbjsd'}, {'text': '2ndAmendment'}, {'text': '1stAmendment'}]

I'm not sure how safe it is to naively parse JSON with a regular expression, especially with `(text)\": \"(.*)\"`, which could match `text": "abc", "text": "another"` with group 1 being `text` and group 2 being `abc", "text": "another`, because the greedy `.*` runs to the last quote on the line.
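A quick check of that failure mode with Python's `re` module (the one-line inputs are made up for illustration). A non-greedy `(.*?)` fixes this particular case, but it still breaks on an escaped quote inside a value, which is why a real JSON parser is the safer route:

```python
import re

greedy = re.compile(r'(text)\": \"(.*)\"')   # pattern from the answer above
lazy = re.compile(r'(text)\": \"(.*?)\"')    # non-greedy variant

line = '"text": "abc", "text": "another"'
print(greedy.search(line).group(2))  # abc", "text": "another
print(lazy.findall(line))            # [('text', 'abc'), ('text', 'another')]

# An escaped quote inside the value still cuts the lazy match short.
escaped = r'"text": "say \"hi\""'
print(repr(lazy.search(escaped).group(2)))  # 'say \\'
```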

It's much safer to parse JSON with Python's standard `json` library and then traverse the parsed data recursively.

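import json

def count_key(selected_key, obj):
    """Count how many times selected_key appears as a dict key, at any depth."""
    count = 0

    if isinstance(obj, list):
        # Recurse into every element of a list.
        for item in obj:
            count += count_key(selected_key, item)

    elif isinstance(obj, dict):
        for key in obj:
            if key == selected_key:
                count += 1
            # Recurse into the value as well; it may contain the key again.
            count += count_key(selected_key, obj[key])

    # Scalars (str, int, None, ...) contribute nothing.
    return count


with open("my-json-file", "r") as json_file:
    print(count_key("text", json.load(json_file)))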
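The same recursion can also collect the values, which covers the second part of the question (displaying each `text`). A minimal sketch; `iter_key_values` is a hypothetical name, and unlike the regex approach, `json` decodes the `\u` escapes into real characters:

```python
import json

def iter_key_values(selected_key, obj):
    """Yield every value stored under selected_key, at any depth (hypothetical helper)."""
    if isinstance(obj, list):
        for item in obj:
            yield from iter_key_values(selected_key, item)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            if key == selected_key:
                yield value
            # Recurse into the value too; it may itself contain the key.
            yield from iter_key_values(selected_key, value)

# A trimmed-down tweet; with a file, json.load(json_file) gives the same kind of data.
data = json.loads('[{"text": "\\uaedb", "retweeted_status": {"text": "hjhfbsdjsdbjsd"}}]')
for text in iter_key_values("text", data):
    print(text)
```

Counting then becomes `sum(1 for _ in iter_key_values("text", data))`, so one traversal serves both needs.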
