2

Input DF:

Index  Parameters   A      B       C   
1      Apple        1      2       3   
2      Banana       2      4       5   
3      Potato       3      5       2   
4      Tomato       1 x 4  1 x 6   2 x 12

Output DF

Index  Parameters   A      B       C   
1      Apple        1      2       3   
2      Banana       2      4       5   
3      Potato       3      5       2   
4      Tomato_P     1      1       2
5      Tomato_Q     4      6      12

Problem Statement:

I want convert a row of data into multiple rows based on particular column value (Tomato) and with split parameter as x

Code/Findings:

I have a code which works well if I transpose this data set and then apply the answer from here or here and then re-transpose the same.

Looking for a solution which can directly work on the given dataframe

2 Answers 2

1

Solution if always only one x values in data - first Series.str.split by columns in list, then Series.explode, added all another columns by DataFrame.join and set _P with _Q with Series.duplicated and numpy.select:

cols = ['A','B','C']
df[cols] = df[cols].apply(lambda x : x.str.split(' x '))

df1 = pd.concat([df[x].explode() for x in cols],axis=1)
#print (df1)

df = df[df.columns.difference(cols)].join(df1)
df['Parameters'] += np.select([df.index.duplicated(keep='last'), 
                               df.index.duplicated()], 
                               ['_P','_Q'], 
                               default='')

df = df.reset_index(drop=True)
print (df)
  Parameters  A  B   C
0      Apple  1  2   3
1     Banana  2  4   5
2     Potato  3  5   2
3   Tomato_P  1  1   2
4   Tomato_Q  4  6  12

EDIT:

Answer with no explode:

cols = df.columns[1:]

df1 = (pd.concat([df[x].str.split(' x ', expand=True).stack() for x in cols],axis=1, keys=cols)
         .reset_index(level=1, drop=True))
print (df1)
       A  B   C
Index          
1      1  2   3
2      2  4   5
3      3  5   2
4      1  1   2
4      4  6  12

df = df.iloc[:, [0]].join(df1)
df['Parameters'] += np.select([df.index.duplicated(keep='last'), 
                               df.index.duplicated()], 
                               ['_P','_Q'], 
                               default='')

df = df.reset_index(drop=True)
print (df)

  Parameters  A  B   C
0      Apple  1  2   3
1     Banana  2  4   5
2     Potato  3  5   2
3   Tomato_P  1  1   2
4   Tomato_Q  4  6   1
Sign up to request clarification or add additional context in comments.

6 Comments

df1 = pd.concat([df[x].explode() for x in cols],axis=1) at this line..i am getting error ValueError: cannot reindex from a duplicate axis
@RahulAgarwal - Can you use df = df.reset_index(drop=True) before my solution?
done same error!! I have made one change to solution...instead of using cols, i have done cols = df.columns[1:].tolist()
@RahulAgarwal - Not possible duplicated columns names?
I randf = df.loc[:,~df.columns.duplicated()] command also to remove if any duplicated columns, but didn't get any and still getting same error!!
|
0

This is more like a explode problem , available after pandas 0.25

df[['A','B','C']]=df[['A','B','C']].apply(lambda x : x.str.split(' x '))
df
   Index Parameters       A       B        C
0      1      Apple     [1]     [2]      [3]
1      2     Banana     [2]     [4]      [5]
2      3     Potato     [3]     [5]      [2]
3      4     Tomato  [1, 4]  [1, 6]  [2, 12]
df.set_index(['Index','Parameters'],inplace=True)
pd.concat([df[x].explode() for x in ['A','B','C']],axis=1)
                  A  B   C
Index Parameters          
1     Apple       1  2   3
2     Banana      2  4   5
3     Potato      3  5   2
4     Tomato      1  1   2
      Tomato      4  6  12

1 Comment

Thanks for the answer!! Can we not use column names as I have too many column names!! Also, how do i rename column two Tomato..to Tomato_P and Tomato_Q

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.