Creating a dataframe structure

Question

I have a dataframe like this:

id|c1|c2|c3|c4...
0|s:1,g:B,r:2|s:2,g:A,r:3|s:1,g:C,r:4|s:3,g:D,r:2.....
1|NaN|s:2;g:E,r:4|s:3;g:C,r:3|s:3;g:F,r:3.....
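
To reproduce the sample, the frame can be built roughly as follows. This is only a sketch: it assumes every filled cell is a plain string and the missing cell is np.nan, and it includes only the four columns shown (the elided ones are left out).

```python
import numpy as np
import pandas as pd

# Only the four columns shown in the question; the elided ones are left out.
df = pd.DataFrame({
    "id": [0, 1],
    "c1": ["s:1,g:B,r:2", np.nan],
    "c2": ["s:2,g:A,r:3", "s:2;g:E,r:4"],
    "c3": ["s:1,g:C,r:4", "s:3;g:C,r:3"],
    "c4": ["s:3,g:D,r:2", "s:3;g:F,r:3"],
})
print(df)
```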

I want to rearrange the dataframe like this:

id|c|s|g|r
0|c1|1|B|2
0|c2|2|A|3
0|c3|1|C|4
0|c4|3|D|2
1|c1|NaN|NaN|NaN
1|c2|2|E|4
1|c3|3|C|3
1|c4|3|F|3

I have tried the following:

df.melt()

If possible, please share a code snippet.

– The Guy, Commented Jan 17, 2020 at 12:11

jezrael · Accepted Answer · 2020-01-17 13:45:08Z

5

The idea is to set id as the index with DataFrame.set_index, replace missing values with the bare column names s,g,r (so those rows are kept), and reshape with DataFrame.stack. Then split with Series.str.split on ; or , and reshape again with stack, split on : and finally reshape with Series.unstack:

df1 = (df.set_index('id')
         .fillna('s,g,r')
         .stack()
         .str.split(',|;', expand=True)
         .stack()
         .str.split(':', expand=True)
         .reset_index(level=2, drop=True)
         .set_index(0, append=True)[1]
         .unstack()
         .rename_axis(('id','c'))
         .rename_axis(None, axis=1)
         .reset_index()
         )
print (df1)
   id   c     g     r     s
0   0  c1     B     2     1
1   0  c2     A     3     2
2   0  c3     C     4     1
3   0  c4     D     2     3
4   1  c1  None  None  None
5   1  c2     E     4     2
6   1  c3     C     3     3
7   1  c4     F     3     3
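
The value columns come out as g, r, s, while the question asks for s, g, r. A minimal sketch of putting them in the requested order by selecting the columns explicitly; the sample frame is rebuilt here from the four columns shown in the question, and the name ordered is only illustrative:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed: strings plus one NaN cell).
df = pd.DataFrame({
    "id": [0, 1],
    "c1": ["s:1,g:B,r:2", np.nan],
    "c2": ["s:2,g:A,r:3", "s:2;g:E,r:4"],
    "c3": ["s:1,g:C,r:4", "s:3;g:C,r:3"],
    "c4": ["s:3,g:D,r:2", "s:3;g:F,r:3"],
})

# Same chain as in the answer above.
df1 = (df.set_index("id")
         .fillna("s,g,r")
         .stack()
         .str.split(",|;", expand=True)
         .stack()
         .str.split(":", expand=True)
         .reset_index(level=2, drop=True)
         .set_index(0, append=True)[1]
         .unstack()
         .rename_axis(["id", "c"])
         .rename_axis(None, axis=1)
         .reset_index())

# Select the columns in the order requested in the question.
ordered = df1[["id", "c", "s", "g", "r"]]
print(ordered)
```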

EDIT: The first step is to set id as the index, fill the missing values and reshape with stack:

print (df.set_index('id')
         .fillna('s,g,r')
         .stack())
id    
0   c1    s:1,g:B,r:2
    c2    s:2,g:A,r:3
    c3    s:1,g:C,r:4
    c4    s:3,g:D,r:2
1   c1          s,g,r
    c2    s:2;g:E,r:4
    c3    s:3;g:C,r:3
    c4    s:3;g:F,r:3
dtype: object

The next step is to split by the separators and reshape again with stack:

print (df.set_index('id')
         .fillna('s,g,r')
         .stack()
         .str.split(',|;', expand=True)
         .stack())
id       
0   c1  0    s:1
        1    g:B
        2    r:2
    c2  0    s:2
        1    g:A
        2    r:3
    c3  0    s:1
        1    g:C
        2    r:4
    c4  0    s:3
        1    g:D
        2    r:2
1   c1  0      s
        1      g
        2      r
    c2  0    s:2
        1    g:E
        2    r:4
    c3  0    s:3
        1    g:C
        2    r:3
    c4  0    s:3
        1    g:F
        2    r:3
dtype: object

Then split by : into 2 columns and move the first column into the last level of the MultiIndex:

print (df.set_index('id')
         .fillna('s,g,r')
         .stack()
         .str.split(',|;', expand=True)
         .stack()
         .str.split(':', expand=True)
         .reset_index(level=2, drop=True)
         .set_index(0, append=True)[1])
id      0
0   c1  s       1
        g       B
        r       2
    c2  s       2
        g       A
        r       3
    c3  s       1
        g       C
        r       4
    c4  s       3
        g       D
        r       2
1   c1  s    None
        g    None
        r    None
    c2  s       2
        g       E
        r       4
    c3  s       3
        g       C
        r       3
    c4  s       3
        g       F
        r       3

The last step is to reshape with unstack:

print (df.set_index('id')
         .fillna('s,g,r')
         .stack()
         .str.split(',|;', expand=True)
         .stack()
         .str.split(':', expand=True)
         .reset_index(level=2, drop=True)
         .set_index(0, append=True)[1]
         .unstack())
0         g     r     s
id                     
0  c1     B     2     1
   c2     A     3     2
   c3     C     4     1
   c4     D     2     3
1  c1  None  None  None
   c2     E     4     2
   c3     C     3     3
   c4     F     3     3
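
Note that every value produced by str.split is a string, so s and r hold '1', '2' and so on rather than numbers. If numeric columns are wanted, one possible follow-up is pd.to_numeric. The sketch below uses a small stand-in frame (wide is a hypothetical name, not part of the answer) rather than the full result:

```python
import pandas as pd

# Stand-in for two rows of the unstacked result: strings and missing values.
wide = pd.DataFrame({"g": ["B", None], "r": ["2", None], "s": ["1", None]})

# Convert the two numeric-looking columns; missing values stay missing.
wide = wide.assign(s=pd.to_numeric(wide["s"]), r=pd.to_numeric(wide["r"]))
print(wide.dtypes)
```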

edited Jan 17, 2020 at 13:45

answered Jan 17, 2020 at 12:15

jezrael


3 Comments

Sri Test Over a year ago

It worked, but I am getting both nan and None. What do the two of them mean? For example, array(['2', '1', None, '3', '4', 'Other', nan], dtype=object). Instead I want them all as nan.

jezrael Over a year ago

@SriTest - I think it is no problem to have both None and NaN, pandas works with them the same way. But if you want to replace None with NaN, use df = df.mask(df.isna(), np.nan) after my solution.

Sri Test Over a year ago

That's a new one for me. Great! It works fine. But could you explain the approach you have taken, step by step? That would be really helpful.
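
On the None versus NaN point raised in these comments, here is a small sketch on a toy object Series (s and normalized are illustrative names). It shows that isna flags both kinds of missing value and that the mask call suggested above leaves only NaN behind:

```python
import numpy as np
import pandas as pd

# Toy object Series mixing both kinds of missing value.
s = pd.Series(["2", "1", None, "3", np.nan], dtype=object)

# isna() treats None and NaN alike.
print(s.isna().tolist())

# Replace every missing entry with NaN, as suggested in the comment.
normalized = s.mask(s.isna(), np.nan)
print(normalized.tolist())
```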

anky · Accepted Answer · 2020-01-17 12:40:39Z

2

Using explode and stack with Series.str.split:

df = df.set_index('id')
(df.stack(dropna=False).str.split(',|;').explode().str.split(':',expand=True)
.set_index(0,append=True)[1].unstack().dropna(how='all',axis=1)
.rename_axis(['id','C']).reset_index())

0  id   C    g    r    s
0   0  c1    B    2    1
1   0  c2    A    3    2
2   0  c3    C    4    1
3   0  c4    D    2    3
4   1  c1  NaN  NaN  NaN
5   1  c2    E    4    2
6   1  c3    C    3    3
7   1  c4    F    3    3
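
To see what explode contributes here, a toy sketch (the Series and the names parts and long_form are illustrative only): str.split without expand leaves a list in each cell, and explode turns every list element into its own row while repeating the index label.

```python
import pandas as pd

# Two cells using the two separators from the question.
s = pd.Series(["s:1,g:B", "s:2;g:E"], index=["c1", "c2"])

parts = s.str.split(",|;")   # each cell becomes a list of 'key:value' items
long_form = parts.explode()  # one row per item, index label repeated
print(long_form)
```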

edited Jan 17, 2020 at 12:40

answered Jan 17, 2020 at 12:19

anky


2 Comments

Sri Test Over a year ago

AttributeError: 'Series' object has no attribute 'explode'. I get this exception.

anky Over a year ago

@SriTest explode is available from pandas version 0.25 and above. You have to upgrade pandas for explode to be available.

BENY · Accepted Answer · 2020-01-17 12:38:19Z

1

I would suggest:

s=df.melt('id')
s.loc[s.value.notna(),'value']=[dict(item.split(":") for item in x.replace(';',',').split(",")) for x in s.value.dropna()]
s=s.join(pd.DataFrame(s.value.dropna().tolist(),index=s.dropna().index))

answered Jan 17, 2020 at 12:38

BENY


1 Comment

Sri Test Over a year ago

Throws an exception like AttributeError: 'int' object has no attribute 'replace'
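
The error reported here presumably means some cells in the real data are not strings. A more defensive variant of the same melt-and-parse idea is sketched below. parse_cell is a hypothetical helper, not part of the answer above, and it assumes every item in a string cell has the form key:value; anything that is not a string is treated as missing.

```python
import numpy as np
import pandas as pd

def parse_cell(cell):
    """Hypothetical helper: turn 's:1,g:B,r:2' into {'s': '1', 'g': 'B', 'r': '2'}."""
    if not isinstance(cell, str):
        return {}  # NaN, ints, etc. are treated as missing
    items = cell.replace(";", ",").split(",")
    return dict(item.split(":", 1) for item in items)

# Sample data rebuilt from the question (four columns shown there).
df = pd.DataFrame({
    "id": [0, 1],
    "c1": ["s:1,g:B,r:2", np.nan],
    "c2": ["s:2,g:A,r:3", "s:2;g:E,r:4"],
    "c3": ["s:1,g:C,r:4", "s:3;g:C,r:3"],
    "c4": ["s:3,g:D,r:2", "s:3;g:F,r:3"],
})

long_df = df.melt("id", var_name="c")
parsed = pd.DataFrame([parse_cell(v) for v in long_df["value"]], index=long_df.index)

out = (long_df[["id", "c"]].join(parsed)
       .sort_values(["id", "c"])
       .reset_index(drop=True))
out = out[["id", "c", "s", "g", "r"]]  # column order requested in the question
print(out)
```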
