I have the following problem: I want to append columns to a dataframe. These columns are the unique values found in another column of this dataframe, each filled with the number of occurrences of that value in the corresponding row. It looks like this:
df:
   Column1 Column2
0        1   a,b,c
1        2     a,e
2        3       a
3        4     c,f
4        5     c,f
What I am trying to get is:
   Column1 Column2  a  b  c  e  f
0        1   a,b,c  1  1  1
1        2     a,e  1        1
2        3       a  1
3        4     c,f        1     1
4        5     c,f        1     1
(the empty spaces can be nan or 0, it matters not.)
I have now written some code to achieve this, but instead of appending columns, it appends rows, so that my output looks like this:
  Column1 Column2
0       1   a,b,c
1       2     a,e
2       3       a
3       4     c,f
4       5     c,f
a       1       1
b       1       1
c       1       1
e       1       1
f       1       1
The code looks like this:
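def NewCols(x):
    # Loop over every entry of Column2 (.items() replaces the removed .iteritems())
    for i, value in df['Column2'].items():
        listi = value.split(',')
        for value in listi:
            string = value
            # Count occurrences in listi (the original had list.count, a typo)
            x[string] = listi.count(string)
    return x

df1 = df.apply(NewCols)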
What I am trying to do here is to iterate through each row of the dataframe and split the string (e.g. a,b,c) contained in Column2 at the commas, so the variable listi is then a list containing the separated string values. For each of these values I then want to make a new column and fill it with the number of occurrences of that value in listi. I am confused why the code appends rows instead of columns. Does somebody know why and how I can correct that?
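
For reference, a likely cause: df.apply defaults to axis=0, so NewCols receives each column as a Series, and assigning to a new label on a Series adds an index entry, which shows up as a row. Below is a minimal sketch of one way to get new columns instead, assuming each value appears at most once per row (as in the sample data), so a 0/1 indicator equals the count. It swaps the loop for pandas' Series.str.get_dummies; the name indicators is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5],
    "Column2": ["a,b,c", "a,e", "a", "c,f", "c,f"],
})

# One 0/1 column per unique value in Column2, split at commas.
# Assumption: no value repeats within a row, so presence == count.
indicators = df["Column2"].str.get_dummies(sep=",")

# Append the new columns to the original frame, aligned on the row index
df1 = df.join(indicators)
print(df1)
```

If values can repeat within a row and true counts are needed, this indicator approach would not be sufficient and a per-row count would be required instead.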