Creating Dataframe from csv with pandas

Question

I have a CSV file with the following columns: year, race, sex, age and pop (population). Each year contains several different groups.

I created the following DataFrame from the CSV:

import pandas as pd

CSV_df = pd.read_csv('Data/Demographics/Demo/akwbo19ages.csv')

df = CSV_df[CSV_df["age"] >= 4].groupby(["year","race","sex","age"])['pop'].sum()

which results in

year  race  sex  age
1969  1     1    1      10574
                 2      20245
                 ...
                 n      11715
            2    1       8924
                 2       9919
                 ...
                 n       9960
                        ...  
2012  3     1    1       7861
                 2       8242
                 ...
                 n       7268
            2    1       7245
                 2       7821
                 ...
                 n       6912

However, I would like each row to represent a single year, with one column per group (i.e. columns holding the population figure for each possible combination of race, sex and age group):

year  group1  group2 ... groupN
1969  10574   20245      9960
...
2012  7861    8242       6912

jezrael · Accepted Answer · 2016-03-31 11:12:24Z

IIUC you need unstack with reset_index, then rename the columns with a list comprehension:

print(s)
year  race  sex  age
1969  1     1    1      10574
                 2      20245
            2    1       8924
                 2       9919
2012  3     1    1       7861
                 2       8242
            2    1       7245
                 2       7821
Name: a, dtype: int64
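
To try the snippets in this answer without the original CSV, the sample Series can be rebuilt by hand. This is only a sketch: the values and the name "a" are copied from the printout above, while the way the MultiIndex is constructed is an assumption, not how the data was originally loaded.

```python
import pandas as pd

# Index tuples are (year, race, sex, age), copied from the printout above.
index = pd.MultiIndex.from_tuples(
    [
        (1969, 1, 1, 1), (1969, 1, 1, 2),
        (1969, 1, 2, 1), (1969, 1, 2, 2),
        (2012, 3, 1, 1), (2012, 3, 1, 2),
        (2012, 3, 2, 1), (2012, 3, 2, 2),
    ],
    names=["year", "race", "sex", "age"],
)

# Population figures in the same order as the index tuples.
s = pd.Series(
    [10574, 20245, 8924, 9919, 7861, 8242, 7245, 7821],
    index=index,
    name="a",
)
print(s)
```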


df = s.unstack().reset_index(drop=True, level=[1,2]).rename_axis(None)
df.columns = ['group' + str(col) for col in df.columns]
print(df)
      group1  group2
1969   10574   20245
1969    8924    9919
2012    7861    8242
2012    7245    7821

Or, if you want to keep the index name, drop rename_axis:

df = s.unstack().reset_index(drop=True, level=[1,2])
df.columns = ['group' + str(col) for col in df.columns]
print(df)
      group1  group2
year                
1969   10574   20245
1969    8924    9919
2012    7861    8242
2012    7245    7821
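
Note that the output above still has one row per (year, race, sex) combination. If exactly one row per year is wanted, as in the question, one possible variation is to unstack all three of race, sex and age at once and then flatten the resulting column labels. The following is a minimal sketch on a hand-built copy of the sample Series; the column naming scheme (race..._sex..._age...) is my own choice for illustration, not something from the question.

```python
import pandas as pd

# Sample Series with the same values as the printout in this answer.
index = pd.MultiIndex.from_tuples(
    [
        (1969, 1, 1, 1), (1969, 1, 1, 2),
        (1969, 1, 2, 1), (1969, 1, 2, 2),
        (2012, 3, 1, 1), (2012, 3, 1, 2),
        (2012, 3, 2, 1), (2012, 3, 2, 2),
    ],
    names=["year", "race", "sex", "age"],
)
s = pd.Series(
    [10574, 20245, 8924, 9919, 7861, 8242, 7245, 7821],
    index=index,
    name="a",
)

# Move race, sex and age into the columns, leaving year as the only row label.
wide = s.unstack(["race", "sex", "age"])

# Flatten the three-level column labels into plain strings
# (hypothetical naming scheme, adjust as needed).
wide.columns = [
    "race{}_sex{}_age{}".format(race, sex, age)
    for race, sex, age in wide.columns
]
print(wide)
```

Combinations that do not occur in a given year come out as NaN (in this small sample, race 3 has no 1969 rows), so the columns become float; unstack also accepts a fill_value argument if a default such as 0 is preferred.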

edited Mar 31, 2016 at 11:12

answered Mar 31, 2016 at 11:06
