Python : nested key value data parsing

Question

I am trying to write a Python script that can parse the following type of log entry, which consists of keys and values. The value of any key may itself be another nested set of keys and values. An example is below. The depth of the nesting varies depending on the log I get, so the parsing has to be dynamic. Each level of nesting is, however, enclosed in braces.

The string of keys and values looks something like this:

   Countries =     {
    "USA" = 0;
    "Spain" = 0;
    Connections = 1;
    Flights =         {
        "KLM" = 11;
        "Air America" = 15;
        "Emirates" = 2;
        "Delta" = 3;
    };
    "Belgium" = 1;
    "Czech Republic" = 0;
    "Netherlands" = 1;
    "Hungary" = 0;
    "Luxembourg" = 0;
    "Italy" = 0;

};

The data above can have multiple levels of nesting as well. I would like to write a function that parses it and puts it in an array (or similar) so that I can get the value of a specific key, like:

    print countries.belgium
          value should be printed as 1

likewise,

    print countries.flights.delta
          value should be printed as 3.

Note that the keys in the input don't all need to be quoted (like Connections or Flights).

Any pointers on where to start? Are there any Python libraries that can already do parsing like this?

pokeymond · Accepted Answer · 2016-03-01 05:05:59Z

1

I have created a sample Python script that will do the job; just tweak it as you like. It converts your format into a nested dict, and it is as dynamic as you like.

Take a look at it here: Pastebin. Code:

import re
import ast

data = """ { Countries = { USA = 1; "Connections" = { "1 Flights" = 0; "10 Flights" = 0; "11 Flights" = 0; "12 Flights" = 0; "13 Flights" = 0; "14 Flights" = 0; "15 Flights" = 0; "16 Flights" = 0; "17 Flights" = 0; "18 Flights" = 0; "More than 25 Flights" = 0; }; "Single Connections" = 0; "No Connections" = 0; "Delayed" = 0; "Technical Fault" = 0; "Others" = 0; }; }"""


def arrify(string):
    string = string.replace("=", " : ")
    string = string.replace(";", " , ")
    string = string.replace("\"", "")
    stringDict = string.split()
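    newArr = []
    quoteClosed = True
    for i, splitStr in enumerate(stringDict):
        # The first token is expected to be the opening "{", so quoting starts at index 1.
        if i > 0:
            if not isDelim(splitStr):
                # Open a quote on the first word after a delimiter.
                if isDelim(newArr[i-1]) and quoteClosed:
                    splitStr = "\"" + splitStr
                    quoteClosed = False

                # Close the quote on the last word before the next delimiter.
                if isDelim(stringDict[i+1]) and not quoteClosed:
                    splitStr += "\""
                    quoteClosed = True

        newArr.append(splitStr)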

    newString = " ".join(newArr)
    newDict = ast.literal_eval(newString)
    return normalizeDict(newDict)

def isDelim(string):
    return str(string) in ("{", ":", ",", "}")


def normalizeDict(dic):
    for key, value in dic.items():
        if type(value) is dict:
            dic[key] = normalizeDict(value)
            continue
        dic[key] = normalize(value)
    return dic

def normalize(string):
    try:
        return int(string)
    except (TypeError, ValueError):
        return string

print(arrify(data))

The result from your sample data:

{'Countries': {'USA': 1, 'Technical Fault': 0, 'No Connections': 0, 'Delayed': 0, 'Connections': {'17 Flights': 0, '10 Flights': 0, '11 Flights': 0, 'More than 25 Flights': 0, '14 Flights': 0, '15 Flights': 0, '12 Flights': 0, '18 Flights': 0, '16 Flights': 0, '1 Flights': 0, '13 Flights': 0}, 'Single Connections': 0, 'Others': 0}}

And you can get values as you would from a normal dict :) Hope it helps.
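
For the dotted access the question asks about (`countries.flights.delta`), a small helper over the nested dict may be enough. This is only a sketch: `get_path` is a hypothetical name, not part of the script above, and it assumes that no key contains a dot.

```python
def get_path(nested, dotted_path, default=None):
    """Walk a nested dict along a dot-separated key path (hypothetical helper)."""
    current = nested
    for key in dotted_path.split("."):
        # Stop as soon as the path leaves the dict structure or a key is missing.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hand-written sample in the same shape as the nested dict printed above.
parsed = {"Countries": {"USA": 1, "Connections": {"1 Flights": 0, "10 Flights": 0}}}

print(get_path(parsed, "Countries.USA"))                    # 1
print(get_path(parsed, "Countries.Connections.1 Flights"))  # 0
print(get_path(parsed, "Countries.Missing"))                # None
```

Keys are matched exactly, including case and spaces, so `"Countries.usa"` would return the default.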

edited Mar 1, 2016 at 5:05

answered Feb 29, 2016 at 10:14

pokeymond


10 Comments

Blckknght Over a year ago

You really need to include the code in your answer. Just linking to it is not good enough.

sidman Over a year ago

@richmondwang, exactly what I was looking for. However, my dynamic string this time is as below, and this gave me a syntax error:

pokeymond Over a year ago

What data did you pass? @user2605278

pokeymond Over a year ago

Ahh, it's because of the leading numeric value in the keys. I'll modify it.

pokeymond Over a year ago

Just enclose your data in { data_string } so you don't get a parsing error :)


Himanshu · Accepted Answer · 2016-02-29 09:25:46Z

1

Iterate over the data and check whether each element is itself another set of key-value pairs. If it is, call the function recursively. Something like this:

def parseNestedData(data):
    # Recurse into nested dicts; print every non-dict (leaf) value.
    if isinstance(data, dict):
        for value in data.values():
            parseNestedData(value)
    else:
        print(data)

Output:

>>> Countries =     {
"USA" : 0,
"Spain" : 0,
"Connections" : 1,
"Flights" :         {
    "KLM" : 11,
    "Air America" : 15,
    "Emirates" : 2,
    "Delta" : 3,
},
"Belgium" : 1,
"Czech Republic" : 0,
"Netherlands" : 1,
"Hungary" : 0,
"Luxembourg" : 0,
"Italy" :0
};

>>> Countries
{'Connections': 1,
'Flights': {'KLM': 11, 'Air America': 15, 'Emirates': 2, 'Delta': 3},
 'Netherlands': 1,
'Italy': 0,
'Czech Republic': 0,
'USA': 0,
'Belgium': 1,
'Hungary': 0,
'Luxembourg': 0, 'Spain': 0}
>>> parseNestedData(Countries)
1
11
15
2
3
1
0
0
0
1
0
0
0
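
To return the value for one key instead of printing every leaf, the same recursion can search by name. A minimal sketch, assuming the data is already a Python dict as in the session above. `find_key` is a hypothetical helper name, and it returns the first match it finds, so the same key name at different depths is not distinguished.

```python
_MISSING = object()  # sentinel, so stored values such as 0 or None still count as found


def find_key(data, wanted, default=None):
    """Depth-first search of nested dicts for `wanted` (hypothetical helper)."""
    if isinstance(data, dict):
        if wanted in data:
            return data[wanted]
        for value in data.values():
            found = find_key(value, wanted, _MISSING)
            if found is not _MISSING:
                return found
    return default


countries = {"USA": 0, "Flights": {"KLM": 11, "Delta": 3}, "Czech Republic": 0}

print(find_key(countries, "Czech Republic"))  # 0
print(find_key(countries, "Delta"))           # 3
print(find_key(countries, "Norway"))          # None
```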

answered Feb 29, 2016 at 9:25

Himanshu


4 Comments

sidman Over a year ago

Thanks Himanshu. How can I get just the value of, say, Czech Republic (it should return just 0)?

sidman Over a year ago

Also, does this need some pre-processing? Not all keys are enclosed in double quotes, for example Connections.

Himanshu Over a year ago

If you know that Czech Republic key is present at the first level, then just do data.get('Czech Republic')

Himanshu Over a year ago

Any key in the data should be immutable (hashable), i.e. a string, an integer or a tuple. A bare Connections is invalid, which is why I have edited the question.

Colin Dickie · Accepted Answer · 2016-02-29 15:30:56Z

Defining a class to process and store the information could give you something like this:

import re

class datastruct():
    def __init__(self,data_in):
        # Capture the body of the "Flights = { ... };" block.
        flights = re.findall(r'(?:Flights\s=\s*\{)([\s"A-Z=0-9;a-z]*)};',data_in)
        flight_dict = {}
        # The last split item is only trailing whitespace, so skip it.
        for flight in flights[0].split(';')[0:-1]:
            key,val = self.split_data(flight)
            flight_dict[key] = val

        # Match every quoted key that has a one- or two-digit value.
        countries = re.findall(r'("[A-Za-z]+\s?[A-Za-z]*"\s=\s[0-9]{1,2})',data_in)
        countries_dict = {}
        for country in countries:
            key,val = self.split_data(country)
            # Quoted keys inside the Flights block also match, so filter them out.
            if key not in flight_dict:
                countries_dict[key]=val

        connections = re.findall(r'(?:Connections\s=\s)([0-9]*);',data_in)
        self.country= countries_dict
        self.flight = flight_dict
        self.connections = int(connections[0])

    def split_data(self,data2):
        item = data2.split('=')
        key = item[0].strip().strip('"')
        val = int(item[1].strip())
        return key,val

Please note the regex may need tweaking if the data is not exactly as I've assumed below. The data could be set up and referenced as follows:

raw_data = 'Countries =     {    "USA" = 0;    "Spain" = 0;    Connections = 1;    Flights =         {        "KLM" = 11;        "Air America" = 15;        "Emirates" = 2;        "Delta" = 3;    };    "Belgium" = 1;    "Czech Republic" = 0;    "Netherlands" = 1;    "Hungary" = 0;    "Luxembourg" = 0;    "Italy" = 0;};'

flight_data = datastruct(raw_data)
print("No. Connections:",flight_data.connections)
print("Country 'USA':",flight_data.country['USA'],'\n')
print("Flight 'KLM':",flight_data.flight['KLM'],'\n')

for country in flight_data.country.keys():
    print("Country: {0} -> {1}".format(country,flight_data.country[country]))
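
Because the regexes above are tied to the key names Flights and Connections, arbitrary nesting depth may be easier to handle with a small tokenizer plus a recursive parser. The following is a sketch under stated assumptions, not a tested library: the names `tokenize`, `parse_block`, `parse_value` and `parse` are hypothetical, quoted strings are assumed to contain no escaped quotes, and only bare integer values are converted to `int`.

```python
import re

# A token is a quoted string, one punctuation mark, or a bare word.
TOKEN_RE = re.compile(r'"([^"]*)"|([{}=;])|([^\s{}=;"]+)')


def tokenize(text):
    """Yield (kind, value) pairs; kind is 'str', 'punct' or 'word'."""
    for match in TOKEN_RE.finditer(text):
        quoted, punct, word = match.groups()
        if quoted is not None:
            yield ("str", quoted)
        elif punct is not None:
            yield ("punct", punct)
        else:
            yield ("word", word)


def parse_block(tokens, pos):
    """Parse `key = value;` entries until a closing brace or the end of input."""
    result = {}
    while pos < len(tokens) and tokens[pos] != ("punct", "}"):
        kind, key = tokens[pos]
        has_equals = pos + 1 < len(tokens) and tokens[pos + 1] == ("punct", "=")
        if kind == "punct" or not has_equals:
            raise ValueError("expected `key =` at token %d" % pos)
        value, pos = parse_value(tokens, pos + 2)
        result[key] = value
        # The ';' after each entry is treated as optional.
        if pos < len(tokens) and tokens[pos] == ("punct", ";"):
            pos += 1
    return result, pos


def parse_value(tokens, pos):
    """Parse a nested `{ ... }` block or a scalar; return (value, next position)."""
    if pos >= len(tokens):
        raise ValueError("missing value at end of input")
    kind, text = tokens[pos]
    if (kind, text) == ("punct", "{"):
        block, pos = parse_block(tokens, pos + 1)
        if pos >= len(tokens):
            raise ValueError("missing closing brace")
        return block, pos + 1  # step over the closing '}'
    if kind == "punct":
        raise ValueError("unexpected %r at token %d" % (text, pos))
    if kind == "word" and re.fullmatch(r"-?\d+", text):
        return int(text), pos + 1
    return text, pos + 1


def parse(text):
    """Parse the whole input, with or without one pair of outer braces."""
    tokens = list(tokenize(text))
    if tokens and tokens[0] == ("punct", "{"):
        value, pos = parse_value(tokens, 0)
    else:
        value, pos = parse_block(tokens, 0)
    if pos < len(tokens) and tokens[pos] == ("punct", ";"):
        pos += 1
    if pos != len(tokens):
        raise ValueError("unexpected token %r" % (tokens[pos],))
    return value


sample = '''
Countries = {
    "USA" = 0;
    Connections = 1;
    Flights = {
        "KLM" = 11;
        "Air America" = 15;
        "Delta" = 3;
    };
    "Czech Republic" = 0;
};
'''

data = parse(sample)
print(data["Countries"]["Flights"]["Delta"])  # 3
print(data["Countries"]["Czech Republic"])    # 0
```

Quoted and unquoted keys are handled the same way, and each `{` simply triggers another call to `parse_block`, so the depth of nesting does not need to be known in advance.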
