I have a set of JSON files that contain some information. The data below is the value for the key 'BrowserInfo'. I want to extract the following fields from it: Title, Links, Browser, Platform, and CPUs. Each field should be added as a new key in the JSON file, with the extracted value assigned to it.

Title: Worlds best websit | mywebsite.com
Links: 225
Browser: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Ubuntu Chromium/41.0.2272.76 Chrome/41.0.2272.76 Safari/537.36
Platform: Linux x86_64
CPUs: 8

I have written a Python program to descend into the directory and extract the 'BrowserInfo' value from the JSON files.

import json
import os

# Set the directory you want to start from
rootDir = '/home/space'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        # Build the full path from the directory currently being walked
        fname = os.path.join(dirName, fname)
        with open(fname, 'r+') as f:
            json_data = json.load(f)
            BrowserInfo = json_data['BrowserInfo']
            print(BrowserInfo)

How do I extract the values and add new key-value pairs to the JSON files using Python?

  • What's wrong with the code you have? Commented Mar 28, 2015 at 1:01
  • @KSFT - I want to extract the individual pieces of information from BrowserInfo and add them as new key-value pairs to the JSON files. Commented Mar 28, 2015 at 1:02
  • What's wrong with something like json_data[BrowserInfo.key]=BrowserInfo.value? Commented Mar 28, 2015 at 1:08
  • @KSFT - I suspect you misunderstood the question. The entire blob of data given above is the value for the key BrowserInfo. It is not in JSON format, so I think it needs to be parsed. Commented Mar 28, 2015 at 1:18

2 Answers

Assuming (and this seems like a big assumption) that BrowserInfo contains newline-separated key/value pairs, with each key and value separated by ': ', you could extract the keys and values with:

for line in BrowserInfo.splitlines():
    k,v = line.split(': ', 1)

Then just insert them wherever you want in the dictionary, e.g.:

json_data['BrowserInfo'] = {}
for line in BrowserInfo.splitlines():
    k,v = line.split(': ', 1)
    json_data['BrowserInfo'][k] = v
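
The snippet above nests the parsed fields under 'BrowserInfo'. If the goal is instead to add them as new top-level keys next to 'BrowserInfo', as the question describes, something like the following might work. This is only a sketch under the same ': ' assumption; parse_browser_info is a made-up helper name and the sample string is a shortened copy of the data in the question.

```python
def parse_browser_info(text):
    """Turn newline-separated 'Key: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so the value may contain ': ' itself.
        key, value = line.split(': ', 1)
        fields[key] = value
    return fields


# Shortened sample in the shape shown in the question (assumed format).
sample = (
    "Title: Worlds best websit | mywebsite.com\n"
    "Links: 225\n"
    "Platform: Linux x86_64\n"
    "CPUs: 8"
)
json_data = {"BrowserInfo": sample}

# Add the parsed fields as new top-level keys; 'BrowserInfo' is left as-is.
json_data.update(parse_browser_info(json_data["BrowserInfo"]))
```

All values stay strings here; converting Links or CPUs with int() would be a separate step.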

11 Comments

I read the file using json.load(). Will assigning to json_data[] write it back to the file?
@liv2hak No, unfortunately you'd have to re-write the object to file with something like json.dump.
When I use splitlines(), how do I discard a single line, for example the first line?
@liv2hak you could add a test like if len(line) < 3: continue as the first line inside your for loop -- 3 seems reasonable.
You can try, it'd depend on what was actually being used -- to do that you'd try for line in BrowserInfo.split('<br>'):
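
Pulling the comments above together, a rough sketch of the full round trip for one file could look like this. It assumes the ': ' format from the answer, and add_browser_fields and skip_first are hypothetical names. Instead of a length test it skips any line without ': ', and it can optionally drop the first line.

```python
import json


def add_browser_fields(path, skip_first=False):
    """Parse 'BrowserInfo' in one JSON file and write the new keys back."""
    with open(path) as f:
        json_data = json.load(f)

    lines = json_data['BrowserInfo'].splitlines()
    if skip_first:
        lines = lines[1:]          # drop the first line unconditionally
    for line in lines:
        if ': ' not in line:       # skip blank or malformed lines
            continue
        k, v = line.split(': ', 1)
        json_data[k] = v

    # Reopen in write mode so the old contents are replaced entirely.
    with open(path, 'w') as f:
        json.dump(json_data, f, indent=2)
    return json_data
```

Reopening the file with 'w' avoids leftover bytes that can remain when a file opened with 'r+' is overwritten with shorter content and not truncated.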

A quick demo of the parsing:

>>> import re, itertools

>>> BrowserInfo
'Title: Worlds best websit | mywebsite.com\nLinks: 225\nBrowser: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Ubuntu Chromium/41.0.2272.76 Chrome/41.0.2272.76 Safari/537.36\nPlatform: Linux x86_64\nCPUs: 8'

>>> re.split(':|\n', BrowserInfo)
['Title', ' Worlds best websit | mywebsite.com', 'Links', ' 225', 'Browser', ' Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Ubuntu Chromium/41.0.2272.76 Chrome/41.0.2272.76 Safari/537.36', 'Platform', ' Linux x86_64', 'CPUs', ' 8']

>>> s = re.split(':|\n', BrowserInfo)
>>> {pair[0].strip():pair[1].strip() for pair in zip(s[::2], s[1::2])}
{'Title': 'Worlds best websit | mywebsite.com', 'Links': '225', 'Browser': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Ubuntu Chromium/41.0.2272.76 Chrome/41.0.2272.76 Safari/537.36', 'Platform': 'Linux x86_64', 'CPUs': '8'}

Thus

json_data['BrowserInfo'] = {pair[0].strip():pair[1].strip() for pair in zip(s[::2], s[1::2])}

would be the new value for that key in your JSON data.
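
One caveat: splitting on every ':' works for the sample data, but a value that itself contains a colon would produce an odd number of pieces and mis-pair the keys and values. A possible variation, sketched here with a made-up Title value and hypothetical names PAIR and parse_pairs, is to match one key/value pair per line and split only at the first colon:

```python
import re

# One match per line: the key is everything before the first colon,
# the value is the rest of that line (further colons are kept).
PAIR = re.compile(r'^([^:\n]+):[ \t]*(.*)$', re.MULTILINE)


def parse_pairs(text):
    """Return a dict built from 'Key: value' lines in text."""
    return {k.strip(): v.strip() for k, v in PAIR.findall(text)}


# Hypothetical input whose Title value contains a colon.
info = 'Title: Home: mywebsite.com\nLinks: 225\nCPUs: 8'
fields = parse_pairs(info)
```

Lines without any colon are simply not matched, so they are ignored rather than raising an error.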
