I have a column in a DataFrame (read from a CSV file) whose entries are comma-separated values. I'd like to split this column into multiple columns.
The problem is an old one and has been discussed here before, but there is one peculiarity: each entry may contain anywhere from 0 to n comma-separated values. An example:
df.head():
i: vals | sth_else
---------------------
1: a,b,c | ba
2: a,d | be
3: | bi
4: e,a,c | bo
5: e | bu
I'd like the following output (or something similar, e.g. True/False instead of 1/0):
i : a | b | c | d | e | sth_else
-----------------------------------
1: 1 | 1 | 1 | 0 | 0 | ba
2: 1 | 0 | 0 | 1 | 0 | be
3: 0 | 0 | 0 | 0 | 0 | bi
4: 1 | 0 | 1 | 0 | 1 | bo
5: 0 | 0 | 0 | 0 | 1 | bu
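One way to get exactly this shape is pandas' `Series.str.get_dummies`, which splits each string on a separator and returns one 0/1 indicator column per distinct token. A minimal sketch, assuming the example frame is rebuilt by hand (the index 1..5 and the empty string for row 3 are my assumptions about how the data was loaded):

```python
import pandas as pd

# Hypothetical reconstruction of the example frame; row 3 has no values.
df = pd.DataFrame(
    {"vals": ["a,b,c", "a,d", "", "e,a,c", "e"],
     "sth_else": ["ba", "be", "bi", "bo", "bu"]},
    index=[1, 2, 3, 4, 5],
)

# Split each string on "," and build one 0/1 column per distinct token
# (columns come out sorted alphabetically: a, b, c, d, e).
dummies = df["vals"].str.get_dummies(sep=",")

# Attach the remaining column next to the indicators.
result = dummies.join(df["sth_else"])
print(result)
```

Empty or missing entries simply produce a row of zeros, so the varying 0-to-n length never has to be handled explicitly.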
I'm currently experimenting with Series.str.split followed by Series.to_dict, but without any satisfactory results; it always ends in ValueError: arrays must all be same length. :)
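The ValueError probably comes from the lists returned by `str.split` having different lengths, which a DataFrame cannot take as equal-length columns. A sketch that keeps `str.split` but sidesteps the mismatch: `explode` the lists into one row per token (repeating the original index), one-hot encode those rows, then collapse them back per original row. The frame is the same hypothetical reconstruction of the example, and the variable names are illustrative only:

```python
import pandas as pd

# Hypothetical reconstruction of the example frame; row 3 has no values.
df = pd.DataFrame(
    {"vals": ["a,b,c", "a,d", "", "e,a,c", "e"],
     "sth_else": ["ba", "be", "bi", "bo", "bu"]},
    index=[1, 2, 3, 4, 5],
)

# str.split gives lists of varying length: ["a", "b", "c"], ["a", "d"], [""], ...
# explode turns every list element into its own row, repeating the original index.
tokens = df["vals"].str.split(",").explode()

# Drop the empty token produced by the row without values.
tokens = tokens[tokens != ""]

# One-hot encode each token, then collapse back to one row per original index;
# max() keeps the result at 0/1 even if a token repeats within one entry.
indicators = pd.get_dummies(tokens).groupby(level=0).max().astype(int)

# Rows with no tokens were filtered out above; reindex restores them as zeros.
indicators = indicators.reindex(df.index, fill_value=0)

result = indicators.join(df["sth_else"])
print(result)
```

`Series.explode` requires pandas 0.25 or later.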
Also, I always try to find elegant solutions that are still easy to understand when I look at them again a couple of months later ;). In any case, suggestions are highly appreciated!
Here is dummy.csv for testing:
vals;sth_else
a,b,c;ba
a,d;be
;bi
e,a,c;bo
e;bu
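Because dummy.csv uses `;` as the field separator, passing `sep=";"` to `read_csv` keeps each comma-separated list in a single `vals` field, and the empty field in the `;bi` row is read as NaN. A sketch that inlines the file through `io.StringIO` (with the real file you would pass the path `"dummy.csv"` instead) and produces the True/False variant:

```python
import io

import pandas as pd

# Contents of dummy.csv, inlined so the example runs on its own.
CSV_TEXT = """vals;sth_else
a,b,c;ba
a,d;be
;bi
e,a,c;bo
e;bu
"""

# sep=";" splits fields on semicolons only, so "a,b,c" stays one value;
# the empty field in the ";bi" row becomes NaN.
df = pd.read_csv(io.StringIO(CSV_TEXT), sep=";")

# str.get_dummies treats NaN as "no tokens"; astype(bool) gives True/False.
flags = df["vals"].str.get_dummies(sep=",").astype(bool)

# Put the indicator columns first, followed by the remaining columns.
result = flags.join(df.drop(columns="vals"))
print(result)
```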