
I have a dataframe which looks like this:

                      key         text
0                    title  Lorem ipsum
1                   header  Lorem ipsum
2              description  Lorem ipsum
.
.
.
.
10            pyramid.male  Lorem ipsum
11    pyramid.male_surplus  Lorem ipsum
12          pyramid.female  Lorem ipsum
13  pyramid.female_surplus  Lorem ipsum
.
.
.
.
29            jitterplot.title1  Lorem ipsum
30    jitterplot.metric_1.label  Lorem ipsum
31  jitterplot.metric_1.tooltip  Lorem ipsum
32    jitterplot.metric_2.label  Lorem ipsum
33  jitterplot.metric_2.tooltip  Lorem ipsum

The keys represent keys in a JSON file. The JSON structure should look like the following:

{
  "title": "Lorem ipsum",
  "header": "Lorem ipsum",
  "description": "Lorem ipsum",

  "pyramid": {
    "male": "Lorem ipsum",
    "male_surplus": "Lorem ipsum",
    "female": "Lorem ipsum",
    "female_surplus": "Lorem ipsum"
  },

  "jitterplot": {
    "title1": "Lorem ipsum",
    "metric_1": {
      "label": "Lorem ipsum",
      "tooltip": "Lorem ipsum"
    },
    "metric_2": {
      "label": "Lorem ipsum",
      "tooltip": "Lorem ipsum"
    }
  }
}

In other words, each . in the key column marks one level of nesting.

Is there a 'Pythonic' way to achieve this? Currently, I'm just hacking it by manually writing each row to a text file with a custom parser I wrote. But obviously this is not very scalable.

I've prepared a sample CSV which you can read, and added some additional columns if they help. Use the following code:

import pandas as pd

url = 'https://raw.githubusercontent.com/Thevesh/Display/master/i18n_sample.csv'
df = pd.read_csv(url)

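# Number of nesting levels per key (raw string, so the regex escape for a literal '.' is valid)
df['n_levels'] = df['key'].str.count(r'\.')
max_levels = df['n_levels'].max()
# Split each key on '.' into one column per level
df = df.join(df['key'].str.split('.', expand=True))
# Rename the new split columns to key_0 ... key_<max_levels>
df.columns = list(df.columns)[:-max_levels-1] + ['key_' + str(x) for x in range(max_levels+1)]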

3 Answers


Similarly but a bit simpler than the other answers:

def set_nested_value(d, keys, value):
    for key in keys[:-1]:
        d = d.setdefault(key, {})
    d[keys[-1]] = value
    
result = {}
for _, row in df.iterrows():
    set_nested_value(result, row["key"].split("."), row["text"])
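
To check the result against the target structure, here is a minimal self-contained sketch that runs the same function on a handful of rows and serializes the result with json.dumps. The rows list is a hypothetical stand-in for the dataframe; with the real data you would iterate over zip(df["key"], df["text"]) instead. It assumes no key is both a leaf and a prefix of another key (e.g. "a" and "a.b"), which this sketch does not handle.

```python
import json

def set_nested_value(d, keys, value):
    # Walk down to the parent dict, creating intermediate levels as needed.
    for key in keys[:-1]:
        d = d.setdefault(key, {})
    d[keys[-1]] = value

# Hypothetical stand-in for the (key, text) rows of the dataframe.
rows = [
    ("title", "Lorem ipsum"),
    ("pyramid.male", "Lorem ipsum"),
    ("jitterplot.metric_1.label", "Lorem ipsum"),
    ("jitterplot.metric_1.tooltip", "Lorem ipsum"),
]

result = {}
for key, text in rows:
    set_nested_value(result, key.split("."), text)

# ensure_ascii=False keeps non-ASCII translations readable in the output.
json_text = json.dumps(result, indent=2, ensure_ascii=False)
print(json_text)
```

Writing json_text to a file with open(..., "w", encoding="utf-8") would then give the i18n file.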

1 Comment

Accepted this as the answer for its parsimony - thanks for the lesson!

This seems like a good fit for a recursive function:

# Dataframe with columns key and text:
df = ...
json_data = {}

def set_value(nested_dict, keys, value):
    if len(keys) == 1:
        nested_dict[keys[0]] = value
        return
    if keys[0] not in nested_dict:
        nested_dict[keys[0]] = {}
    set_value(nested_dict[keys[0]], keys[1:], value)

for full_key, value in zip(df.key, df.text):
    keys = full_key.split('.')
    set_value(json_data, keys, value)

print(json_data)

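import collections
import functools

def autonesting_dict():
    # A defaultdict whose missing keys are themselves autonesting dicts
    return collections.defaultdict(autonesting_dict)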

json_dict = autonesting_dict()

key, value = 'jitterplot.metric_2.tooltip', "Lorem ipsum"
subkeys = key.split('.')

nested_dict = functools.reduce(lambda d, key: d[key], subkeys[:-1], json_dict)
nested_dict[subkeys[-1]] = value

The above will make it so that:

json_dict['jitterplot']['metric_2']['tooltip']  # 'Lorem ipsum'

Just repeat for all rows.
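
For completeness, "repeat for all rows" could look roughly like the sketch below. The rows list is a hypothetical stand-in for the dataframe (in practice, zip(df["key"], df["text"])). json.dumps accepts the defaultdict directly, since it is a dict subclass.

```python
import collections
import functools
import json

def autonesting_dict():
    # Missing keys are created on lookup as further autonesting dicts.
    return collections.defaultdict(autonesting_dict)

# Hypothetical stand-in for the (key, text) rows of the dataframe.
rows = [
    ("title", "Lorem ipsum"),
    ("jitterplot.title1", "Lorem ipsum"),
    ("jitterplot.metric_2.label", "Lorem ipsum"),
    ("jitterplot.metric_2.tooltip", "Lorem ipsum"),
]

json_dict = autonesting_dict()
for key, value in rows:
    # Everything before the last dot names the parent levels.
    *parents, leaf = key.split(".")
    parent = functools.reduce(lambda d, k: d[k], parents, json_dict)
    parent[leaf] = value

json_text = json.dumps(json_dict, indent=2)
print(json_text)
```

One thing to keep in mind with this approach: merely reading a missing key from json_dict creates it, so it may be safer to convert to plain dicts (for example via json.loads(json_text)) before passing the structure around.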


Sidenote regarding:

"I've prepared a sample CSV which you can read, and added some additional columns if they help. Use the following code:"

Maybe it's just me, but that sounds like something that might be given on an assignment or quiz, not like someone asking for assistance.

1 Comment

I graduated university a long time ago - my specific use case is translating i18n JSON files into something clients can work with in Excel (tabular), and then working that Excel back into JSON.
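
For the other direction mentioned here (JSON to a tabular form), a recursive flatten is one possible sketch. The function name flatten and the sample nested dict are hypothetical, and it assumes the JSON keys themselves never contain a dot.

```python
def flatten(nested, prefix=""):
    """Yield (dotted_key, text) pairs from a nested dict."""
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend one level, carrying the dotted path so far.
            yield from flatten(value, full_key)
        else:
            yield full_key, value

# Hypothetical sample mirroring the structure in the question.
nested = {
    "title": "Lorem ipsum",
    "pyramid": {"male": "Lorem ipsum"},
    "jitterplot": {
        "metric_1": {"label": "Lorem ipsum", "tooltip": "Lorem ipsum"},
    },
}

rows = list(flatten(nested))
for key, text in rows:
    print(key, text)
```

The resulting rows could be passed to pd.DataFrame(rows, columns=["key", "text"]) to get back the shape of the original dataframe.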
