I have a DataFrame in which multiple columns contain comma-separated string values. I want to convert each of these strings into a list of its comma-separated items. I have a way to achieve this, but I am looking for a better one.
import pandas as pd
df = pd.DataFrame({"A": ["test1, test2, test3, test4", "check1, check2, check3, check4", "test1, test2, test3, check4", "test1, test2, test3, check5"], "B": ["a,b,c,d", "e,f,g,h", "i,j,k,l", "m,n,o,p"], "C": ["mtest, mtest1, mtest2, mtest3", "c,d,e,f", "g,h,i,j", "k,l,m,n"]})
>>> df
                                A        B                              C
0      test1, test2, test3, test4  a,b,c,d  mtest, mtest1, mtest2, mtest3
1  check1, check2, check3, check4  e,f,g,h                        c,d,e,f
2     test1, test2, test3, check4  i,j,k,l                        g,h,i,j
3     test1, test2, test3, check5  m,n,o,p                        k,l,m,n
The output I want is:
>>> df
                                  A             B                                C
0      [test1, test2, test3, test4]  [a, b, c, d]  [mtest, mtest1, mtest2, mtest3]
1  [check1, check2, check3, check4]  [e, f, g, h]                     [c, d, e, f]
2     [test1, test2, test3, check4]  [i, j, k, l]                     [g, h, i, j]
3     [test1, test2, test3, check5]  [m, n, o, p]                     [k, l, m, n]
My current approach is:
>>> df["A"] = df["A"].str.split(',')
>>> df["B"] = df["B"].str.split(',')
>>> df["C"] = df["C"].str.split(',')
I would like a single DataFrame operation that does this in one line, instead of applying str.split to each column separately (with more than 10 columns, I would have to write a str.split statement for every one of them). A lambda could be used to achieve this, but it might be slower. Is there a better way?
Try df.stack().str.split(',').unstack() and see how it goes.
stack + unstack might be slower than using apply.
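For comparison, here is a minimal sketch of the two one-liners mentioned in the comments, run on a shortened copy of the question's data. The names via_apply, via_stack and via_regex are only illustrative. The last variant is not something the question asked for; it assumes you also want to drop the space after each comma, so it splits on a comma followed by optional whitespace.

```python
import pandas as pd

# Shortened copy of the question's data (two rows instead of four).
df = pd.DataFrame({
    "A": ["test1, test2, test3, test4", "check1, check2, check3, check4"],
    "B": ["a,b,c,d", "e,f,g,h"],
    "C": ["mtest, mtest1, mtest2, mtest3", "c,d,e,f"],
})

# Option 1: apply str.split to every column. The lambda runs once per
# column, not once per cell, so the split itself stays vectorised.
via_apply = df.apply(lambda col: col.str.split(","))

# Option 2: stack all columns into one long Series, split once,
# then unstack back into the original shape.
via_stack = df.stack().str.split(",").unstack()

# Assumed variant: split on a comma plus optional whitespace so that
# items such as " test2" do not keep their leading space.
via_regex = df.apply(lambda col: col.str.split(r",\s*"))

print(via_apply)
print(via_regex)
```

Both options avoid writing one statement per column. Which one is faster likely depends on the shape of the data, so it is worth timing them on your real DataFrame rather than assuming.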