
What is the idiomatic Pandas way to expand a column containing a JSON encoded array of observations into additional rows?

In the example below, Out[3] is a DataFrame containing loan data. There is one row per loan. Columns Loan ID, Start Date, End Date, and Amount do not vary over the life of the loan. Zero or more date-stamped payments are encoded into the Payments column as a JSON (string) array.

Jupyter screenshot
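Because the screenshot is not reproducible as text, here is a hypothetical reconstruction of the Out[3] input, inferred from the answer's printed output below. It assumes loan 100 has three payments, loans 101 and 102 have none (stored as NaN), and it uses the column name LoanId as printed in the answer rather than Loan ID.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical reconstruction of the input DataFrame (Out[3]),
# inferred from the answer's printed result.
payments_100 = json.dumps([
    {"Payment Amount": 1000, "Payment Date": "2018-03-11"},
    {"Payment Amount": 2000, "Payment Date": "2018-03-13"},
    {"Payment Amount": 3000, "Payment Date": "2018-03-15"},
])

df = pd.DataFrame({
    "LoanId": [100, 101, 102],
    "Start Date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "End Date": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "Amount": [10000, 20000, 30000],
    # One JSON string per loan; loans without payments hold NaN.
    "Payments": [payments_100, np.nan, np.nan],
})
print(df)
```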

The target output in Out[5] shows the goal: one or more rows per original row, where each payment in Payments becomes a new row and a loan with no payments keeps a single row.

I've done this two ways: with iterrows, which is sane-looking and easy to read, and with a convoluted, somewhat handwavy approach where I pull the fixed attributes into the index to preserve them, then melt and re-index.
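For comparison, an iterrows version of this reshaping might look roughly like the following. This is a hypothetical sketch, not the asker's actual code; the small input and the column names are assumptions borrowed from the answer's output.

```python
import json

import pandas as pd

# Small hypothetical input; "Payments" holds a JSON string or None.
df = pd.DataFrame({
    "LoanId": [100, 101],
    "Amount": [10000, 20000],
    "Payments": [
        '[{"Payment Amount": 1000, "Payment Date": "2018-03-11"},'
        ' {"Payment Amount": 2000, "Payment Date": "2018-03-13"}]',
        None,
    ],
})

rows = []
for _, row in df.iterrows():
    # Fixed per-loan attributes are copied onto every output row.
    base = row.drop("Payments").to_dict()
    raw = row["Payments"]
    payments = json.loads(raw) if isinstance(raw, str) else []
    if not payments:
        # A loan without payments is kept as one row; payment columns become NaN.
        rows.append(base)
    for payment in payments:
        rows.append({**base, **payment})

out = pd.DataFrame(rows)
print(out)
```

It is easy to read, but it builds Python dicts row by row, which is why it tends to be slow on large frames.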

There must be a better way! Please share the secrets of the pandas masters :)

Answer


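First, remove the NaNs in the Payments column with dropna, then parse each JSON string into a list of dicts with ast.literal_eval. This works for the sample data, but json.loads is the stricter choice for real JSON, because literal_eval rejects JSON literals such as true, false and null: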

import ast

s = df['Payments'].dropna().apply(ast.literal_eval)
print (s)
0    [{'Payment Amount': 1000, 'Payment Date': '201...
Name: Payments, dtype: object
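To illustrate why json.loads can be safer than ast.literal_eval for genuine JSON, here is a small self-contained check. The payment string with false and null fields is a hypothetical example, not data from the question.

```python
import ast
import json

# Hypothetical JSON array using JSON-only literals (false, null).
raw = '[{"Payment Amount": 1000, "Payment Date": "2018-03-11", "Late": false, "Note": null}]'

# json.loads maps false -> False and null -> None.
parsed = json.loads(raw)
print(parsed)

literal_eval_ok = True
try:
    ast.literal_eval(raw)
except ValueError as exc:
    # literal_eval only understands Python literals, so "false"/"null" are rejected.
    literal_eval_ok = False
    print("literal_eval failed:", exc)
```

In the answer's pipeline the swap would be s = df['Payments'].dropna().apply(json.loads).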

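Then convert each list to a DataFrame in a list comprehension and concat them together. The keys parameter is important: it adds an outer index level holding each list's original row label, which is what lets the payments align with the original rows: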

df1 = pd.concat([pd.DataFrame(x) for x in s], keys=s.index)
print (df1)
     Payment Amount Payment Date
0 0            1000   2018-03-11
  1            2000   2018-03-13
  2            3000   2018-03-15
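A minimal standalone sketch of what keys=s.index does, using a hypothetical Series whose labels (0 and 2) have a gap, as they would after dropna:

```python
import pandas as pd

# Hypothetical parsed payments, indexed by the original row labels 0 and 2.
s = pd.Series(
    [
        [{"Payment Amount": 1000}, {"Payment Amount": 2000}],
        [{"Payment Amount": 500}],
    ],
    index=[0, 2],
)

# keys=s.index makes the outer index level the original row label; the inner
# level is the position of the payment within that row's list.
df1 = pd.concat([pd.DataFrame(x) for x in s], keys=s.index)
print(df1.index.tolist())  # [(0, 0), (0, 1), (2, 0)]
```

Dropping the inner level with reset_index(level=1, drop=True) then leaves labels 0, 0, 2, which join can match against the original DataFrame's index.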

Finally, drop the Payments column, remove the inner index level from df1 and join it back onto the original DataFrame by index, then call reset_index to get a unique index:

df = df.drop(columns='Payments').join(df1.reset_index(level=1, drop=True)).reset_index(drop=True)
df['Payment Date'] = pd.to_datetime(df['Payment Date'])
print (df)
   LoanId  Start Date    End Date  Amount  Payment Amount Payment Date
0     100  2018-01-01  2021-01-01   10000          1000.0   2018-03-11
1     100  2018-01-01  2021-01-01   10000          2000.0   2018-03-13
2     100  2018-01-01  2021-01-01   10000          3000.0   2018-03-15
3     101  2018-01-02  2021-01-02   20000             NaN          NaT
4     102  2018-01-03  2021-01-03   30000             NaN          NaT
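On pandas 0.25 or later, Series.explode offers another route. This is a different technique from the answer's concat approach, shown as a sketch on a smaller hypothetical input; it assumes json.loads for parsing and treats a missing or empty Payments value as a loan with no payments.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "LoanId": [100, 101, 102],
    "Amount": [10000, 20000, 30000],
    "Payments": [
        '[{"Payment Amount": 1000, "Payment Date": "2018-03-11"},'
        ' {"Payment Amount": 2000, "Payment Date": "2018-03-13"}]',
        np.nan,
        "[]",
    ],
})

# Parse the JSON strings; missing values become empty lists.
parsed = df["Payments"].apply(lambda v: json.loads(v) if isinstance(v, str) else [])

# explode() emits one row per list element and repeats the original index
# label; an empty list becomes a single NaN row.
exploded = parsed.explode()

# Turn each payment dict into columns; NaN rows become all-NaN records.
payments = pd.DataFrame(
    [p if isinstance(p, dict) else {} for p in exploded],
    index=exploded.index,
)

out = df.drop(columns="Payments").join(payments).reset_index(drop=True)
out["Payment Date"] = pd.to_datetime(out["Payment Date"])
print(out)
```

The join step is the same idea as in the answer: the repeated index labels fan each loan out into one row per payment.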

Comment

Much appreciated. The concat step is the magic I was looking for, in particular the keys=s.index part.
