DataFrame conversion using Python

Question

We have the following DataFrame:

import pandas as pd
import numpy as np
a1 =["school.bgs.id","school.bgs.title","school.bgs.city","school.bgs.bgs1.id","school.sggs.id","school.sggs.title","school.sggs.city","school.sggs.srt.title","school.sggs.state"]
a2=[55,"BGS","pune",34,np.nan,np.nan,np.nan,np.nan,np.nan]
a3=[np.nan,np.nan,np.nan,np.nan,230,"SGGS","Nanded","SRT","maharashtra"]
df =pd.DataFrame(list(zip(a1,a2,a3)),columns=['data',0,1])

and expected output:

Kindly suggest a better solution for this.

constantstranger · Accepted Answer · 2022-09-20 14:22:14Z

This will do what you've asked:

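# Row label: everything before the last dot; column label: the last token
df['new_row'] = df.data.str.split('.').str[:-1].str.join('.')
df['new_col'] = df.data.str.split('.').str[-1]
# Value: column 0 where non-null, otherwise column 1
df['new_val'] = df[0].where(df[0].notna(), df[1])
# pivot() takes keyword-only arguments in pandas 2.0 and later
df = (df.pivot(index='new_row', columns='new_col', values='new_val')
        [['id','title','city','state']]
        .rename_axis(None, axis='columns')
        .rename_axis(None, axis='index'))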

Output:

                  id title    city        state
school.bgs        55   BGS    pune          NaN
school.bgs.bgs1   34   NaN     NaN          NaN
school.sggs      230  SGGS  Nanded  maharashtra
school.sggs.srt  NaN   SRT     NaN          NaN

Explanation:

Take the final dot-separated token of the data column as new_col, and the prefix before it as new_row.
Take the value of column 0 if it is non-null, otherwise the value of column 1, as new_val.
Use pivot() to build the output, reordering the columns with [['id','title','city','state']] and removing the axis names with rename_axis().
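
The three steps above can be traced on a smaller frame. This is only a sketch: the three-row `sample` frame and the names `parts` and `wide` are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample with the same layout as the question's frame.
sample = pd.DataFrame({
    'data': ['school.bgs.id', 'school.bgs.title', 'school.sggs.id'],
    0: [55, 'BGS', np.nan],
    1: [np.nan, np.nan, 230],
})

# Step 1: the prefix before the last dot becomes the row label,
# the last token becomes the column label.
parts = sample['data'].str.split('.')
sample['new_row'] = parts.str[:-1].str.join('.')
sample['new_col'] = parts.str[-1]

# Step 2: take column 0 where it is non-null, otherwise column 1.
sample['new_val'] = sample[0].where(sample[0].notna(), sample[1])

# Step 3: reshape from long to wide, passing pivot() its arguments by keyword.
wide = sample.pivot(index='new_row', columns='new_col', values='new_val')
print(wide)
```

Combinations that never occur in the input, such as a title for school.sggs in this sample, come out as NaN in the pivoted frame.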

SergFSM · Accepted Answer · 2022-09-20 18:33:32Z


pivot looks well suited here, so this solution is very similar to the one above:

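# Split once from the right; n and the pivot() arguments must be passed by keyword in pandas 2.0 and later
df[['data','col']] = df['data'].str.rsplit('.', n=1, expand=True)
df = df.assign(val=df[0].combine_first(df[1])).pivot(index='data', columns='col', values='val')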

and the result is:

col                city   id        state title
data                                           
school.bgs         pune   55          NaN   BGS
school.bgs.bgs1     NaN   34          NaN   NaN
school.sggs      Nanded  230  maharashtra  SGGS
school.sggs.srt     NaN  NaN          NaN   SRT
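
For readers unfamiliar with the two helpers used here, the following sketch shows them in isolation. The series `s0`, `s1`, and `labels` are invented for illustration only.

```python
import numpy as np
import pandas as pd

# combine_first keeps the caller's value where it is non-null
# and falls back to the argument's value otherwise.
s0 = pd.Series([55, np.nan, np.nan])
s1 = pd.Series([np.nan, 230, np.nan])
merged = s0.combine_first(s1)

# rsplit with n=1 splits only at the last dot; expand=True
# returns the two pieces as separate columns.
labels = pd.Series(['school.bgs.id', 'school.sggs.srt.title'])
split = labels.str.rsplit('.', n=1, expand=True)

print(merged)
print(split)
```

Where both inputs are null, as in the last position of this sample, the merged value stays NaN.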


Anoushiravan R · Accepted Answer · 2022-09-20 20:59:36Z


Another, rather similar, approach is the following:

import pandas as pd

df[['data', 'cols']] = df['data'].str.extract(r'(?P<data>[\w.]+)\.(?P<cols>\w+$)')

df = (pd.pivot(df, index = 'data', columns='cols', values=[0, 1])
      .stack(level=0)
      .droplevel(1))

cols               city   id        state title
data                                           
school.bgs         pune   55          NaN   BGS
school.bgs.bgs1     NaN   34          NaN   NaN
school.sggs      Nanded  230  maharashtra  SGGS
school.sggs.srt     NaN  NaN          NaN   SRT
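
As a small illustration of how the regular expression splits each label, here is the same pattern applied to two sample strings. The `labels` series and the `parts` name are assumptions made for this sketch.

```python
import pandas as pd

labels = pd.Series(['school.bgs.id', 'school.sggs.srt.title'])

# The greedy [\w.]+ group takes everything up to the last dot;
# \w+$ then captures the final token. Named groups become column names.
parts = labels.str.extract(r'(?P<data>[\w.]+)\.(?P<cols>\w+$)')
print(parts)
```

Because the first group is greedy, the split always happens at the last dot, which matches what rsplit with n=1 produces.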
