
I have a dataframe like this:

id|c1|c2|c3|c4...
0|s:1,g:B,r:2|s:2,g:A,r:3|s:1,g:C,r:4|s:3,g:D,r:2.....
1|NaN|s:2;g:E,r:4|s:3;g:C,r:3|s:3;g:F,r:3.....

I want to rearrange the dataframe like this:

id|c|s|g|r
0|c1|1|B|2
0|c2|2|A|3
0|c3|1|C|4
0|c4|3|D|2
1|c1|NaN|NaN|NaN
1|c2|2|E|4
1|c3|3|C|3
1|c4|3|F|3

I have tried the following:

df.melt()
  • If possible, share a code snippet. Commented Jan 17, 2020 at 12:11

3 Answers


The idea is to reshape with DataFrame.set_index and DataFrame.stack after replacing missing values with the bare column names s,g,r. Then use Series.str.split on ; or , and reshape again with stack, split on :, and finally reshape with Series.unstack:

df1 = (df.set_index('id')
         .fillna('s,g,r')
         .stack()
         .str.split(',|;', expand=True)
         .stack()
         .str.split(':', expand=True)
         .reset_index(level=2, drop=True)
         .set_index(0, append=True)[1]
         .unstack()
         .rename_axis(('id','c'))
         .rename_axis(None, axis=1)
         .reset_index()
         )
print (df1)
   id   c     g     r     s
0   0  c1     B     2     1
1   0  c2     A     3     2
2   0  c3     C     4     1
3   0  c4     D     2     3
4   1  c1  None  None  None
5   1  c2     E     4     2
6   1  c3     C     3     3
7   1  c4     F     3     3

EDIT: The first step is to reshape with stack, using id as the index:

print (df.set_index('id')
         .fillna('s,g,r')
         .stack())
id    
0   c1    s:1,g:B,r:2
    c2    s:2,g:A,r:3
    c3    s:1,g:C,r:4
    c4    s:3,g:D,r:2
1   c1          s,g,r
    c2    s:2;g:E,r:4
    c3    s:3;g:C,r:3
    c4    s:3;g:F,r:3
dtype: object

The next step is to split by the separators and reshape again with stack:

print (df.set_index('id')
         .fillna('s,g,r')
         .stack()
         .str.split(',|;', expand=True)
         .stack())
id       
0   c1  0    s:1
        1    g:B
        2    r:2
    c2  0    s:2
        1    g:A
        2    r:3
    c3  0    s:1
        1    g:C
        2    r:4
    c4  0    s:3
        1    g:D
        2    r:2
1   c1  0      s
        1      g
        2      r
    c2  0    s:2
        1    g:E
        2    r:4
    c3  0    s:3
        1    g:C
        2    r:3
    c4  0    s:3
        1    g:F
        2    r:3
dtype: object

Then split by : into 2 columns and move the first column into the last level of the MultiIndex:

print (df.set_index('id')
         .fillna('s,g,r')
         .stack()
         .str.split(',|;', expand=True)
         .stack()
         .str.split(':', expand=True)
         .reset_index(level=2, drop=True)
         .set_index(0, append=True)[1])
id      0
0   c1  s       1
        g       B
        r       2
    c2  s       2
        g       A
        r       3
    c3  s       1
        g       C
        r       4
    c4  s       3
        g       D
        r       2
1   c1  s    None
        g    None
        r    None
    c2  s       2
        g       E
        r       4
    c3  s       3
        g       C
        r       3
    c4  s       3
        g       F
        r       3

The last step is to reshape with unstack:

print (df.set_index('id')
         .fillna('s,g,r')
         .stack()
         .str.split(',|;', expand=True)
         .stack()
         .str.split(':', expand=True)
         .reset_index(level=2, drop=True)
         .set_index(0, append=True)[1]
         .unstack())
0         g     r     s
id                     
0  c1     B     2     1
   c2     A     3     2
   c3     C     4     1
   c4     D     2     3
1  c1  None  None  None
   c2     E     4     2
   c3     C     3     3
   c4     F     3     3
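
For anyone who wants to run the chain end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question (only c1 to c4, since the remaining columns are elided there), and the final column reordering to s, g, r is my addition to match the requested layout. Everything else follows the steps above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: only the four columns shown).
df = pd.DataFrame({
    'id': [0, 1],
    'c1': ['s:1,g:B,r:2', np.nan],
    'c2': ['s:2,g:A,r:3', 's:2;g:E,r:4'],
    'c3': ['s:1,g:C,r:4', 's:3;g:C,r:3'],
    'c4': ['s:3,g:D,r:2', 's:3;g:F,r:3'],
})

df1 = (df.set_index('id')
         .fillna('s,g,r')                      # placeholder keys for missing cells
         .stack()                              # one row per (id, column)
         .str.split(',|;', expand=True)        # split 'k:v' items on , or ;
         .stack()                              # one row per 'k:v' item
         .str.split(':', expand=True)          # column 0 = key, column 1 = value
         .reset_index(level=2, drop=True)      # drop the item position level
         .set_index(0, append=True)[1]         # key becomes the last index level
         .unstack()                            # keys become columns
         .rename_axis(['id', 'c'])
         .rename_axis(None, axis=1)
         .reset_index())

# Reorder to the layout asked for in the question (my addition).
df1 = df1[['id', 'c', 's', 'g', 'r']]
print(df1)
```

Note that all parsed values come back as strings, so s and r would still need a numeric conversion if numbers are required.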

3 Comments

It worked, but I am getting both nan and None. What do they mean? For example: array(['2', '1', None, '3', '4', 'Other', nan], dtype=object). I want all of them to be nan instead.
@SriTest - I think None and NaN are not a problem; pandas handles them the same way. But if you want to replace None with NaN, use df = df.mask(df.isna(), np.nan) after my solution.
That's a new one for me. Great! It works fine. But could you explain the approach you have taken, step by step? That would be really helpful.
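
To illustrate the None versus NaN point from the comments, here is a small sketch. The frame `mixed` is made up for the example; the mask call is the one suggested in the comment above.

```python
import numpy as np
import pandas as pd

# Made-up object column holding both kinds of missing value.
mixed = pd.DataFrame({'g': ['B', None, np.nan]}, dtype=object)

# isna() treats None and NaN alike, so both positions are flagged...
print(mixed['g'].isna().tolist())

# ...and mask() overwrites every flagged position with np.nan.
cleaned = mixed.mask(mixed.isna(), np.nan)
print(cleaned['g'].tolist())
```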

Using explode and stack with Series.str.split:

df = df.set_index('id')
(df.stack(dropna=False).str.split(',|;').explode().str.split(':',expand=True)
.set_index(0,append=True)[1].unstack().dropna(how='all',axis=1)
.rename_axis(['id','C']).reset_index())

0  id   C    g    r    s
0   0  c1    B    2    1
1   0  c2    A    3    2
2   0  c3    C    4    1
3   0  c4    D    2    3
4   1  c1  NaN  NaN  NaN
5   1  c2    E    4    2
6   1  c3    C    3    3
7   1  c4    F    3    3

2 Comments

I get this exception: AttributeError: 'Series' object has no attribute 'explode'
@SriTest explode is available in pandas version 0.25 and above. You have to upgrade pandas for explode to be available.
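
If you would rather not rely on stack(dropna=False) (the newer stack implementation, future_stack=True in pandas 2.1+, does not accept dropna), a variant of the same explode idea can start from melt, which keeps NaN cells by default. This is only a sketch: the sample frame is rebuilt from the question, and dropping the all-NaN cells and re-adding them with reindex is my substitution for the dropna(how='all', axis=1) step above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: only the four columns shown).
df = pd.DataFrame({
    'id': [0, 1],
    'c1': ['s:1,g:B,r:2', np.nan],
    'c2': ['s:2,g:A,r:3', 's:2;g:E,r:4'],
    'c3': ['s:1,g:C,r:4', 's:3;g:C,r:3'],
    'c4': ['s:3,g:D,r:2', 's:3;g:F,r:3'],
})

# One cell per row, indexed by (id, c); NaN cells are kept by melt.
cells = df.melt('id', var_name='c').set_index(['id', 'c'])['value']

# Split each cell into 'k:v' items, one item per row, then into key/value columns.
pairs = cells.str.split(',|;').explode().str.split(':', expand=True)

# Pivot keys into columns, ignoring rows that came from NaN cells.
wide = (pairs.dropna(subset=[0])
             .set_index(0, append=True)[1]
             .unstack())

out = (wide.reindex(cells.index)        # bring back the cells that were NaN
           .sort_index()
           .rename_axis(None, axis=1)
           .reset_index()[['id', 'c', 's', 'g', 'r']])
print(out)
```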

I would suggest:

s=df.melt('id')
s.loc[s.value.notna(),'value']=[dict(item.split(":") for item in x.replace(';',',').split(",")) for x in s.value.dropna()]
s=s.join(pd.DataFrame(s.value.dropna().tolist(),index=s.dropna().index))

1 Comment

This throws an exception: AttributeError: 'int' object has no attribute 'replace'
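
The error in the comment suggests that some cells in the real data are not strings. A possible workaround, sketched below, is to parse each cell with a small helper that ignores anything that is not a string. parse_cell is a hypothetical name, and the sample frame is rebuilt from the question; this is a variant of the melt idea above, not the exact code of the answer.

```python
import numpy as np
import pandas as pd

def parse_cell(cell):
    """Turn 's:1,g:B,r:2' into {'s': '1', 'g': 'B', 'r': '2'}.

    Hypothetical helper: non-string cells (NaN, ints, ...) yield an empty dict.
    """
    if not isinstance(cell, str):
        return {}
    items = cell.replace(';', ',').split(',')
    return dict(item.split(':', 1) for item in items if ':' in item)

# Sample data rebuilt from the question (assumption: only the four columns shown).
df = pd.DataFrame({
    'id': [0, 1],
    'c1': ['s:1,g:B,r:2', np.nan],
    'c2': ['s:2,g:A,r:3', 's:2;g:E,r:4'],
    'c3': ['s:1,g:C,r:4', 's:3;g:C,r:3'],
    'c4': ['s:3,g:D,r:2', 's:3;g:F,r:3'],
})

long = df.melt('id', var_name='c')
parsed = pd.DataFrame([parse_cell(v) for v in long['value']], index=long.index)

out = (long.drop(columns='value')
           .join(parsed.reindex(columns=['s', 'g', 'r']))
           .sort_values('id', kind='mergesort')   # stable: keeps c1, c2, ... order per id
           .reset_index(drop=True))
print(out)
```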
