5

I have exhaustively reviewed/attempted implementations all the other questions on SO corresponding to this challenge and have yet to reach a solution.

Question: how do I convert employee and supervisor pairs into a hierarchical JSON structure to be used for a D3 visualization? There are an unknown number of levels, so it has to be dynamic.

I have a dataframe with five columns (yes, I realize this isn't the actual hierarchy of The Office):

  Employee_FN Employee_LN Supervisor_FN Supervisor_LN  Level
0     Michael       Scott          None          None      0
1         Jim     Halpert       Michael         Scott      1
2      Dwight     Schrute       Michael         Scott      1
3     Stanley      Hudson           Jim       Halpert      2
4         Pam     Beasley           Jim       Halpert      2
5        Ryan      Howard           Pam       Beasley      3
6       Kelly      Kapoor          Ryan        Howard      4
7    Meredith      Palmer          Ryan        Howard      4 

Desired Output Snapshot:

{
  "Employee_FN": "Michael",
  "Employee_LN": "Scott",
  "Level": "0",
  "Reports": [{
        "Employee_FN": "Jim",
        "Employee_LN": "Halpert",
        "Level": "1",
        "Reports": [{
              "Employee_FN": "Stanley",
              "Employee_LN": "Hudson",
              "Level": "2",
            }, {
              "Employee_FN": "Pam",
              "Employee_LN": "Beasley",
              "Level": "2",
            }]
        }]
}

Current State:

j = (df.groupby(['Level','Employee_FN','Employee_LN'], as_index=False)
             .apply(lambda x: x[['Level','Employee_FN','Employee_LN']].to_dict('r'))
             .reset_index()
             .rename(columns={0:'Reports'})
             .to_json(orient='records'))

print(json.dumps(json.loads(j), indent=2, sort_keys=True))

Current Output:

[
  {
    "Employee_FN": "Michael",
    "Employee_LN": "Scott",
    "Level": 0,
    "Reports": [
      {
        "Employee_FN": "Michael",
        "Employee_LN": "Scott",
        "Level": 0
      }
    ]
  },
  {
    "Employee_FN": "Dwight",
    "Employee_LN": "Schrute",
    "Level": 1,
    "Reports": [
      {
        "Employee_FN": "Dwight",
        "Employee_LN": "Schrute",
        "Level": 1
      }
    ]
  },
  {
    "Employee_FN": "Jim",
    "Employee_LN": "Halpert",
    "Level": 1,
    "Reports": [
      {
        "Employee_FN": "Jim",
        "Employee_LN": "Halpert",
        "Level": 1
      }
    ]
  },
  {
    "Employee_FN": "Pam",
    "Employee_LN": "Beasley",
    "Level": 2,
    "Reports": [
      {
        "Employee_FN": "Pam",
        "Employee_LN": "Beasley",
        "Level": 2
      }
    ]
  },
  {
    "Employee_FN": "Stanley",
    "Employee_LN": "Hudson",
    "Level": 2,
    "Reports": [
      {
        "Employee_FN": "Stanley",
        "Employee_LN": "Hudson",
        "Level": 2
      }
    ]
  },
  {
    "Employee_FN": "Ryan",
    "Employee_LN": "Howard",
    "Level": 3,
    "Reports": [
      {
        "Employee_FN": "Ryan",
        "Employee_LN": "Howard",
        "Level": 3
      }
    ]
  },
  {
    "Employee_FN": "Kelly",
    "Employee_LN": "Kapoor",
    "Level": 4,
    "Reports": [
      {
        "Employee_FN": "Kelly",
        "Employee_LN": "Kapoor",
        "Level": 4
      }
    ]
  },
  {
    "Employee_FN": "Meredith",
    "Employee_LN": "Palmer",
    "Level": 4,
    "Reports": [
      {
        "Employee_FN": "Meredith",
        "Employee_LN": "Palmer",
        "Level": 4
      }
    ]
  }
]

Problems:

  1. Each person only has themselves as children
  2. The whole JSON structure appears to be in a dict - I believe it has to be enclosed by {} to be readable

I have tried switched around the groupby and lambda elements in various configurations to reach the desired output as well. Any and all insight would be greatly appreciated! Thank you!

Update:

I changed my code block to this:

j = (df.groupby(['Level','Supervisor_FN','Supervisor_LN'], as_index=False)
             .apply(lambda x: x[['Level','Employee_FN','Employee_LN']].to_dict('r'))
             .reset_index()
             .rename(columns={0:'Reports'})
             .rename(columns={'Supervisor_FN':'Employee_FN'})
             .rename(columns={'Supervisor_LN':'Employee_LN'})
             .to_json(orient='records'))

print(json.dumps(json.loads(j), indent=2, sort_keys=True))

The new output is this:

[
  {
    "Employee_FN": "Michael",
    "Employee_LN": "Scott",
    "Level": 1,
    "Reports": [
      {
        "Employee_FN": "Jim",
        "Employee_LN": "Halpert",
        "Level": 1
      },
      {
        "Employee_FN": "Dwight",
        "Employee_LN": "Schrute",
        "Level": 1
      }
    ]
  },
  {
    "Employee_FN": "Jim",
    "Employee_LN": "Halpert",
    "Level": 2,
    "Reports": [
      {
        "Employee_FN": "Stanley",
        "Employee_LN": "Hudson",
        "Level": 2
      },
      {
        "Employee_FN": "Pam",
        "Employee_LN": "Beasley",
        "Level": 2
      }
    ]
  },
  {
    "Employee_FN": "Pam",
    "Employee_LN": "Beasley",
    "Level": 3,
    "Reports": [
      {
        "Employee_FN": "Ryan",
        "Employee_LN": "Howard",
        "Level": 3
      }
    ]
  },
  {
    "Employee_FN": "Ryan",
    "Employee_LN": "Howard",
    "Level": 4,
    "Reports": [
      {
        "Employee_FN": "Kelly",
        "Employee_LN": "Kapoor",
        "Level": 4
      },
      {
        "Employee_FN": "Meredith",
        "Employee_LN": "Palmer",
        "Level": 4
      }
    ]
  }
]

Problems:

  1. The Level matches the underlying employee for both the underlying employee and the supervisor
  2. The nesting only goes one level deep
2
  • For 1, just adding a 'Sup_level' column, with df['Sup_level'] = df['Level']-1, and adding appropriately to the 'rename' bit with .rename(columns={0:'Reports', 'Sup_level':'Level', 'Supervisor_FN':'Employee_FN','Supervisor_LN':'Employee_LN'}) should work. Commented May 18, 2017 at 17:20
  • Thank you very much - that allows the levels to match appropriately. The issue of the resulting JSON only going one level deep still remains. Commented May 18, 2017 at 17:24

1 Answer 1

4

This type of problem isn't particularly well-suited for Pandas; the data structure you're going after is recursive, not tabular.

Here is one possible solution.

from operator import itemgetter

employee_key = itemgetter('Employee_FN', 'Employee_LN')
supervisor_key = itemgetter('Supervisor_FN', 'Supervisor_LN')

def subset(dict_, keys):
    return {k: dict_[k] for k in keys}

# store employee references
cache = {}

# iterate over employees sorted by level, so supervisors are cached before reports
for row in df.sort_values('Level').to_dict('records'):

    # look up employee/supervisor references
    employee = cache.setdefault(employee_key(row), subset(row, keys=('Employee_FN', 'Employee_LN', 'Level')))
    supervisor = cache.get(supervisor_key(row), {})

    # link reports to employee
    supervisor.setdefault('Reports', []).append(employee)

# grab only top-level employees
[rec for key, rec in cache.iteritems() if rec['Level'] == 0]
[{'Employee_FN': 'Michael',
  'Employee_LN': 'Scott',
  'Level': 0,
  'Reports': [{'Employee_FN': 'Jim',
    'Employee_LN': 'Halpert',
    'Level': 1,
    'Reports': [{'Employee_FN': 'Stanley',
      'Employee_LN': 'Hudson',
      'Level': 2},
     {'Employee_FN': 'Pam',
      'Employee_LN': 'Beasley',
      'Level': 2,
      'Reports': [{'Employee_FN': 'Ryan',
        'Employee_LN': 'Howard',
        'Level': 3,
        'Reports': [{'Employee_FN': 'Kelly',
          'Employee_LN': 'Kapoor',
          'Level': 4},
         {'Employee_FN': 'Meredith',
          'Employee_LN': 'Palmer',
          'Level': 4}]}]}]},
   {'Employee_FN': 'Dwight', 'Employee_LN': 'Schrute', 'Level': 1}]}]
Sign up to request clarification or add additional context in comments.

3 Comments

Thank you. When I run your edited code, it returns AttributeError: 'dict' object has no attribute 'iteritems' for the last line. Also, how do I incorporate that into actually writing to the JSON format? Forgive me - I am quite new to all of this.
@Cameron you must be using Python 3.x? Change iteritems() to items(), and it should work for you. The last line produces a serializable list. You can pass it directly to json.dumps(...) to produce a JSON string.
It works perfectly, of course - thank you very much! If you happen to know of any good Python + D3 resources, I would be all ears! Thanks, again.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.