I am looking to perform the following operation on a DataFrame efficiently. The DataFrame has a special column, containing strings, where some rows have a formatting problem. Namely, in my case a + sign separates what should be entries of two separate rows.
In particular, consider:
import pandas as pd
df = pd.DataFrame([['a', 0, 1], ['b+c', 2, 3],
                   ['d+e', 4, 5], ['f', 6, 7]])
print(df)
which prints:
     0  1  2
0    a  0  1
1  b+c  2  3
2  d+e  4  5
3    f  6  7
I want to transform this into:
   0  1  2
0  a  0  1
1  b  2  3
2  c  2  3
3  d  4  5
4  e  4  5
5  f  6  7
That is, I want to "spread out" each row that contains a + sign into one row per piece, duplicating the values in the other columns. This can be done by looping over the rows, splitting with a regex, and assigning to a new DataFrame, but I am looking for a simpler and more efficient way.
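For reference, a minimal sketch of the loop-based approach mentioned above might look like the following. It assumes the problematic strings live in the column labelled 0; the helper name `spread_rows_loop` and its parameters are hypothetical and only serve as an illustration of the baseline I would like to improve on.

```python
import re

import pandas as pd


def spread_rows_loop(df, col=0, sep_pattern=r"\+"):
    """Hypothetical baseline: one output row per piece of df[col]."""
    col_pos = df.columns.get_loc(col)  # positional index of the string column
    rows = []
    for row in df.itertuples(index=False):
        values = list(row)
        # Split the string on the separator; a string without "+" yields one piece.
        for part in re.split(sep_pattern, values[col_pos]):
            # Copy the row, replacing only the string column with this piece.
            rows.append(values[:col_pos] + [part] + values[col_pos + 1:])
    return pd.DataFrame(rows, columns=df.columns)


df = pd.DataFrame([['a', 0, 1], ['b+c', 2, 3],
                   ['d+e', 4, 5], ['f', 6, 7]])
result = spread_rows_loop(df)
print(result)
```

This works, but it builds the result one Python-level row at a time, which is the part I expect to be slow on a large DataFrame.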
Edit: Ideally, the solution would also handle multiple separators (+ signs) in a single entry. That is, it should also transform
import pandas as pd
pd.DataFrame([['a', 0, 1], ['b+c', 2, 3],
              ['d+e+f', 4, 5], ['g', 6, 7]])
into
   0  1  2
0  a  0  1
1  b  2  3
2  c  2  3
3  d  4  5
4  e  4  5
5  f  4  5
6  g  6  7
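One direction that might avoid the explicit loop is to split the strings into lists and then expand each list into rows. The sketch below assumes pandas 0.25 or later, where `DataFrame.explode` is available, and again assumes the strings are in the column labelled 0. It is only an illustration of the idea, not a benchmarked solution.

```python
import pandas as pd

df = pd.DataFrame([['a', 0, 1], ['b+c', 2, 3],
                   ['d+e+f', 4, 5], ['g', 6, 7]])

out = df.copy()
# Turn each string into a list of pieces; a single-character separator
# such as '+' is treated literally by str.split, not as a regex.
out[0] = out[0].str.split('+')
# explode emits one row per list element and repeats the other columns;
# reset_index renumbers the rows, since explode keeps the original labels.
out = out.explode(0).reset_index(drop=True)
print(out)
```

Because the split produces a list of any length, entries with several + signs such as 'd+e+f' are handled the same way as entries with one.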

