The data under consideration comes from an API, which means it is highly inconsistent: sometimes it returns unexpected content, sometimes it returns nothing, and so on.

What I'm interested in is the ISO 3166-2 data associated with each record.

The data (when it doesn't encounter an error) generally looks something like this:

{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "RO", "adminCode1": "10", "countryName": "Romania", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "10"}, {"type": "ISO3166-2", "code": "B"}], "adminName1": "Bucure\u015fti"}
{"countryCode": "DE", "adminCode1": "07", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "07"}, {"type": "ISO3166-2", "code": "NW"}], "adminName1": "North Rhine-Westphalia"}
{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}
{"countryCode": "DE", "adminCode1": "02", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "02"}, {"type": "ISO3166-2", "code": "BY"}], "adminName1": "Bavaria"}

Let's take one record for example:

{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}

From this I want to extract the ISO 3166-2 representation, i.e. DE-BW.

I've tried different ways of extracting this information with Python. One attempt looked like this:

coord = response.get('codes', {}).get('type', {}).get('ISO3166-2', None)

Another attempt looked like this:

print(json.dumps(response["codes"]["ISO3166-2"]))

However, neither of those approaches worked.

How can I take a record such as:

{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}

and extract only DE-BW using Python, while also handling records that don't look exactly like that, for instance also extracting GB-ENG from:

{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}

and, of course, not crash if it gets something that doesn't look like either of those, i.e. with exception handling.


FULL FILE

import json
import requests
from collections import defaultdict
from pprint import pprint

# open up the output of 'data-processing.py'
with open('job-numbers-by-location.txt') as data_file:

    for line in data_file:
        identifier, name, coords, number_of_jobs = line.split("|")
        coords = coords[1:-1]
        lat, lng = coords.split(",")
        # print("lat: " + lat, "lng: " + lng)
        response = requests.get("http://api.geonames.org/countrySubdivisionJSON?lat="+lat+"&lng="+lng+"&username=s.matthew.english").json()


        codes = response.get('codes', [])
        for code in codes:
            if code.get('type') == 'ISO3166-2':
                # countryCode and code are joined to form the ISO 3166-2 string, e.g. DE-BW
                print('{}-{}'.format(response.get('countryCode', 'UNKNOWN'), code.get('code', 'UNKNOWN')))

1 Answer

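'ISO3166-2' is a dictionary value, not a key. 'codes' is a list of dictionaries, and each one stores the standard's name under 'type' and the subdivision under 'code', so iterate over the list and pick the entry whose 'type' matches: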

codes = response.get('codes', [])
for code in codes:
    if code.get('type') == 'ISO3166-2':
        print('{}-{}'.format(response.get('countryCode', 'UNKNOWN'), code.get('code', 'UNKNOWN')))
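
To cover the "don't crash on odd responses" part of the question, the same loop could be wrapped in a small helper that returns None whenever the record does not have the expected shape. This is only a sketch: the function name extract_iso_3166_2 and the error payload in the sample list are assumptions made for illustration, not something taken from the API documentation.

```python
def extract_iso_3166_2(record):
    """Return 'CC-SUB' (e.g. 'DE-BW'), or None if the record lacks the needed fields."""
    # The API may hand back something that is not a dict at all.
    if not isinstance(record, dict):
        return None
    country = record.get('countryCode')
    codes = record.get('codes')
    # Both pieces are needed: a country code string and a list of code entries.
    if not isinstance(country, str) or not isinstance(codes, list):
        return None
    for entry in codes:
        # Only the ISO3166-2 entry is used; FIPS10-4 and anything else is skipped.
        if isinstance(entry, dict) and entry.get('type') == 'ISO3166-2':
            code = entry.get('code')
            if isinstance(code, str) and code:
                return '{}-{}'.format(country, code)
    return None


records = [
    {"countryCode": "DE", "codes": [{"type": "FIPS10-4", "code": "01"},
                                    {"type": "ISO3166-2", "code": "BW"}]},
    {"countryCode": "GB", "codes": [{"type": "ISO3166-2", "code": "ENG"}]},
    {"status": {"message": "no country code found"}},  # hypothetical error payload
    None,                                              # nothing came back at all
]
results = [extract_iso_3166_2(r) for r in records]
print(results)  # ['DE-BW', 'GB-ENG', None, None]
```

Returning None instead of printing 'UNKNOWN' lets the caller decide whether to skip, log, or retry a bad record.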

4 Comments

Some of the records look like this: {"countryCode": "BE", "adminCode1": "VLG", "countryName": "Belgium", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "13"}, {"type": "ISO3166-2", "code": "VLG"}], "adminName1": "Flanders"}, and it seems to also be grabbing that "FIPS10-4" data. Is there some modification that can be made to ignore those codes?
@s.matthew.english, I just tried with this record and it gives me BE-VLG.
Yeah, true, that was my bad: I was running it as part of a longer file. But do you have any idea about that "FIPS10-4"?
For that particular record you provided, there is no such problem when I run the code.
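
One way to check the FIPS10-4 concern from the comments is to split the entries of the Belgian record by type and look at what the 'ISO3166-2' filter keeps and what it leaves out. This is a minimal sketch; the variable names are made up for illustration.

```python
import json

raw = ('{"countryCode": "BE", "adminCode1": "VLG", "countryName": "Belgium", '
       '"distance": 0, "codes": [{"type": "FIPS10-4", "code": "13"}, '
       '{"type": "ISO3166-2", "code": "VLG"}], "adminName1": "Flanders"}')
record = json.loads(raw)

# Entries kept by the type check used in the answer.
iso_codes = [c.get('code') for c in record.get('codes', [])
             if c.get('type') == 'ISO3166-2']
# Entries the same check ignores.
skipped_types = [c.get('type') for c in record.get('codes', [])
                 if c.get('type') != 'ISO3166-2']

print(iso_codes)      # ['VLG']
print(skipped_types)  # ['FIPS10-4']
```

The FIPS10-4 entry never reaches the print statement, so no extra modification is needed to ignore it.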
