
I'm having a hard time splitting a data frame and am hoping to get some help. I'm trying to split the original data into one data frame per city, with the cities as the top row of the column header and the date in the first column. My actual data has 189 unique cities.

Original data:

[Image: original data]

This is my goal:

[Image: desired output]

I've tried a number of different approaches, but my index values still end up in the first two columns.


2 Answers


This can be done using df.pivot(), df.reorder_levels() and df.sort_index().

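  • df.pivot(): reshape the table from long to wide format, producing hierarchical (MultiIndex) columns
  • df.reorder_levels(): swap the column levels so City is on top and the value names (Val1, Val2, Val3) are underneath
  • df.sort_index(): sort the rows and columns using default or customized ordering (e.g. sort dates as datetimes rather than as strings).
  • In reorder_levels() and sort_index(), axis=1 refers to columns while axis=0 refers to rows.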
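To see what each call contributes, here is a minimal sketch on a small made-up frame in the same long format as the question (the data and the names wide, swapped, grouped are illustrative). It prints the column labels after each step:

```python
import pandas as pd

# Illustrative long-format data: one row per (City, Date) pair
df = pd.DataFrame({
    "City": ["NYC", "NYC", "LA", "LA"],
    "Date": ["6/1/1998", "7/1/1998", "6/1/1998", "7/1/1998"],
    "Val1": [0, 10, 50, 60],
    "Val2": [0.0, 0.1, 0.5, 0.6],
})

# Step 1: pivot puts the value names on the outer column level, cities inside
wide = df.pivot(index="Date", columns="City", values=["Val1", "Val2"])
print(wide.columns.tolist())
# [('Val1', 'LA'), ('Val1', 'NYC'), ('Val2', 'LA'), ('Val2', 'NYC')]

# Step 2: reorder_levels moves City to the outer level (order is unchanged)
swapped = wide.reorder_levels([1, 0], axis=1)
print(swapped.columns.tolist())
# [('LA', 'Val1'), ('NYC', 'Val1'), ('LA', 'Val2'), ('NYC', 'Val2')]

# Step 3: sort_index(axis=1) groups each city's value columns together
grouped = swapped.sort_index(axis=1)
print(grouped.columns.tolist())
# [('LA', 'Val1'), ('LA', 'Val2'), ('NYC', 'Val1'), ('NYC', 'Val2')]
```

Without step 3, the columns of one city stay interleaved with the other cities' columns.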

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    data={  # please provide sample data next time
        "City": ["NYC"]*5 + ["LA"]*5 + ["OKC"]*5,
        "Date": ["6/1/1998", "7/1/1998", "8/1/1998", "9/1/1998", "10/1/1998"]*3,
        "Val1": np.array(range(15))*10,
        "Val2": np.array(range(15))/10,
        "Val3": np.array(range(15)),
    }
)

df_out = (
    df.pivot(index="Date", columns=["City"], values=["Val1", "Val2", "Val3"])
    .reorder_levels([1, 0], axis=1)
    .sort_index(axis=1)
    .sort_index(axis=0, key=lambda s: pd.to_datetime(s))
)

Output:

In[27]: df_out
Out[27]: 

City         LA             NYC              OKC           
           Val1 Val2 Val3  Val1 Val2 Val3   Val1 Val2  Val3
Date                                                       
6/1/1998   50.0  0.5  5.0   0.0  0.0  0.0  100.0  1.0  10.0
7/1/1998   60.0  0.6  6.0  10.0  0.1  1.0  110.0  1.1  11.0
8/1/1998   70.0  0.7  7.0  20.0  0.2  2.0  120.0  1.2  12.0
9/1/1998   80.0  0.8  8.0  30.0  0.3  3.0  130.0  1.3  13.0
10/1/1998  90.0  0.9  9.0  40.0  0.4  4.0  140.0  1.4  14.0

N.B. To remove the "City" label at the top left, set df_out.columns.names directly:

df_out.columns.names = [None, None]
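Since the question asks for a data frame per city, each city's block can then be pulled out of df_out by its top-level column label. A minimal sketch, rebuilding df_out from the same sample data so it runs on its own; the names la and frames are illustrative, and with 189 cities a dict keyed by city is probably handier than separate variables:

```python
import numpy as np
import pandas as pd

# Same sample data and pipeline as above
df = pd.DataFrame({
    "City": ["NYC"] * 5 + ["LA"] * 5 + ["OKC"] * 5,
    "Date": ["6/1/1998", "7/1/1998", "8/1/1998", "9/1/1998", "10/1/1998"] * 3,
    "Val1": np.arange(15) * 10,
    "Val2": np.arange(15) / 10,
    "Val3": np.arange(15),
})
df_out = (
    df.pivot(index="Date", columns="City", values=["Val1", "Val2", "Val3"])
    .reorder_levels([1, 0], axis=1)
    .sort_index(axis=1)
    .sort_index(axis=0, key=lambda s: pd.to_datetime(s, format="%m/%d/%Y"))
)

# Selecting a top-level label drops that level and returns one city's frame
la = df_out["LA"]
print(la)

# One DataFrame per city, keyed by city name
cities = df_out.columns.get_level_values("City").unique()
frames = {city: df_out[city] for city in cities}
print(sorted(frames))  # ['LA', 'NYC', 'OKC']
```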

3 Comments

Nice answer. Personally I would do df.groupby(['Date','City']).sum().unstack(1).reorder_levels([1, 0], axis=1).sort_index(axis=1), which saves having to list all the value columns manually.
@Manakin: If there were many or variable value columns, you can list them using [el for el in df.columns if el not in ("City", "Date")] to avoid lengthy or data-dependent enumeration. I think this would save the cost of groupby-sum, which is logically unrelated to the true intention (pivoting the table).
Agreed, that approach is much better.
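The two suggestions from this thread can be checked against each other on the answer's sample data. A few caveats, stated as far as I understand pandas' behavior: DataFrame.pivot with values omitted already uses every remaining column as a value (the other answer relies on this), so listing them is optional; groupby(...).sum() silently adds up rows that share a (Date, City) pair, whereas pivot raises a ValueError on such duplicates; and both approaches sort the string dates lexically ("10/1/1998" first), so the answer's key=pd.to_datetime row sort is still needed. The dtype check is relaxed because pivot with a list of mixed int/float value columns may return all-float columns. A sketch; the names via_groupby, via_pivot and value_cols are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "City": ["NYC"] * 5 + ["LA"] * 5 + ["OKC"] * 5,
    "Date": ["6/1/1998", "7/1/1998", "8/1/1998", "9/1/1998", "10/1/1998"] * 3,
    "Val1": np.arange(15) * 10,
    "Val2": np.arange(15) / 10,
    "Val3": np.arange(15),
})

# First comment: groupby + unstack, no need to name the value columns
via_groupby = (
    df.groupby(["Date", "City"]).sum()
    .unstack(1)
    .reorder_levels([1, 0], axis=1)
    .sort_index(axis=1)
)

# Second comment: pivot, with the value columns computed rather than typed
value_cols = [el for el in df.columns if el not in ("City", "Date")]
via_pivot = (
    df.pivot(index="Date", columns="City", values=value_cols)
    .reorder_levels([1, 0], axis=1)
    .sort_index(axis=1)
)

# With exactly one row per (Date, City) pair, both hold the same cells
pd.testing.assert_frame_equal(via_groupby, via_pivot, check_dtype=False)
```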
    import pandas as pd

    # create an example dataframe
    df = pd.DataFrame(
        {'date': [1990, 2000, 2010, 2020, 1990, 2000, 2010, 2020],
         'val1': [0, 1, 2, 3, 10, 11, 12, 13],
         'val2': [5, 6, 7, 8, 50, 60, 70, 80],
         'city': ['NYC', 'NYC', 'NYC', 'NYC', 'LA', 'LA', 'LA', 'LA']})
    # pivot to wide format; with values omitted, every remaining column
    # (val1, val2) becomes a value column under (value, city) MultiIndex columns
    df2 = df.pivot(index='date', columns='city')
    # swap the column levels so city is on top, as in your desired output
    df2.columns = df2.columns.swaplevel(0, 1)
    # sort the columns so each city's value columns sit together
    df2.sort_index(axis=1, level=0, inplace=True)
    # print the dataframe
    print(df2)

Output:

[Image: printed output of df2]
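If the end goal really is a separate DataFrame per city rather than one wide table, groupby can produce those directly without pivoting. This is a different technique from the pivot used in both answers; a sketch on the example data from this answer, where the name frames is illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {'date': [1990, 2000, 2010, 2020, 1990, 2000, 2010, 2020],
     'val1': [0, 1, 2, 3, 10, 11, 12, 13],
     'val2': [5, 6, 7, 8, 50, 60, 70, 80],
     'city': ['NYC', 'NYC', 'NYC', 'NYC', 'LA', 'LA', 'LA', 'LA']})

# One DataFrame per city, indexed by date, without the now-redundant city column
frames = {
    city: group.drop(columns='city').set_index('date').sort_index()
    for city, group in df.groupby('city')
}

print(frames['LA'])
```

Each value in frames has plain (single-level) columns, which can be simpler to work with than selecting from a MultiIndex.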

