1

we have the following dataframe .

enter image description here

import pandas as pd
import numpy as np
a1 =["school.bgs.id","school.bgs.title","school.bgs.city","school.bgs.bgs1.id","school.sggs.id","school.sggs.title","school.sggs.city","school.sggs.srt.title","school.sggs.state"]
a2=[55,"BGS","pune",34,np.nan,np.nan,np.nan,np.nan,np.nan]
a3=[np.nan,np.nan,np.nan,np.nan,230,"SGGS","Nanded","SRT","maharashtra"]
df =pd.DataFrame(list(zip(a1,a2,a3)),columns=['data',0,1])

and expected output:

enter image description here

Kindly suggest better solution for the same

3 Answers 3

4

This will do what you've asked:

df['new_row'] = df.data.str.split('.').str[:-1].str.join('.')
df['new_col'] = df.data.str.split('.').str[-1]
df['new_val'] = df[0].where(df[0].notna(), df[1])
df = df.pivot('new_row','new_col','new_val')[['id','title','city','state']].rename_axis(None, axis='columns').rename_axis(None, axis='index')

Output:

                  id title    city        state
school.bgs        55   BGS    pune          NaN
school.bgs.bgs1   34   NaN     NaN          NaN
school.sggs      230  SGGS  Nanded  maharashtra
school.sggs.srt  NaN   SRT     NaN          NaN

Explanation:

  • take the final dot-separated token within the data column as new_col and the corresponding prefix from data as new_row
  • take the value of column 0 if non-null, else that of column 1, as new_val
  • use pivot() to create the desired output, with columns reordered using [['id','title','city','state']] and axis names removed using rename_axis().
Sign up to request clarification or add additional context in comments.

Comments

2

using pivot looks quite suitable, so this solution is very similar with this:

df[['data','col']] = df['data'].str.rsplit('.',1,expand=True)
df = df.assign(val=df[0].combine_first(df[1])).pivot('data','col','val')

and the result is:

col                city   id        state title
data                                           
school.bgs         pune   55          NaN   BGS
school.bgs.bgs1     NaN   34          NaN   NaN
school.sggs      Nanded  230  maharashtra  SGGS
school.sggs.srt     NaN  NaN          NaN   SRT

Comments

1

Another rather similar way is as the following:

import pandas as pd

df[['data', 'cols']] = df['data'].str.extract(r'(?P<data>[\w.]+)\.(?P<cols>\w+$)')

df = (pd.pivot(df, index = 'data', columns='cols', values=[0, 1])
      .stack(level=0)
      .droplevel(1))

cols               city   id        state title
data                                           
school.bgs         pune   55          NaN   BGS
school.bgs.bgs1     NaN   34          NaN   NaN
school.sggs      Nanded  230  maharashtra  SGGS
school.sggs.srt     NaN  NaN          NaN   SRT

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.