
I've asked this question before, but the answer I got didn't work out as I thought it had, so here I am again.

Previous question: Defining a function for changing column values and creating new datasets

I am trying to define a function that takes a dataframe and changes the values in one column to create multiple new dataframes.

As an example, from df1 looking like:

  df1:

  class    colB    colC
0   1      1b      1c
1   2      2b      2c
2   3      3b      3c
3   1      4b      4c
4   2      5b      5c

I am trying to create multiple binary classes to implement one-vs-all classification. So the function would create...

df2:
  class    colB    colC
0    1      1b      1c
1   -1      2b      2c
2   -1      3b      3c
3    1      4b      4c
4   -1      5b      5c

df3:
  class    colB    colC
0   -1      1b      1c
1    1      2b      2c
2   -1      3b      3c
3   -1      4b      4c
4    1      5b      5c

df4:
  class    colB    colC
0   -1      1b      1c
1   -1      2b      2c
2    1      3b      3c
3   -1      4b      4c
4   -1      5b      5c

and so on. The unique values in the class column are incremental, ranging from 1 to 120.

The problem with the previous answer given (np.identity) was that it created dataframes that treated every single row as its own class, marked 1 or -1, instead of putting identical values into the same class.

Thanks

Comment: Mind double-checking your input for df4? I think only the 2nd row should be 1. (Commented Aug 19, 2018 at 3:51)

2 Answers


A similar idea using np.where and unique (again renaming your class column to class_, since class is a reserved keyword in Python and can't be passed as a keyword argument to assign):

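import numpy as np

df1 = df1.rename(columns={'class': 'class_'})  # `class` is a reserved word

# one dataframe per unique label: 1 where it matches, -1 elsewhere
dfs = [
    df1.assign(class_=np.where(df1['class_'].eq(i), 1, -1))
    for i in df1['class_'].unique()
]

for d in dfs:
    print(d, end='\n\n')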

   class_ colB colC
0       1   1b   1c
1      -1   2b   2c
2      -1   3b   3c
3       1   4b   4c
4      -1   5b   5c

   class_ colB colC
0      -1   1b   1c
1       1   2b   2c
2      -1   3b   3c
3      -1   4b   4c
4       1   5b   5c

   class_ colB colC
0      -1   1b   1c
1      -1   2b   2c
2       1   3b   3c
3      -1   4b   4c
4      -1   5b   5c
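
Since the question asks for a function, here is one way the list comprehension above could be wrapped up. This is only a sketch: the names one_vs_all_frames and label_col are made up for illustration, and returning a dict keyed by the class label (rather than a list) is an assumption about what is convenient with 120 classes.

```python
import numpy as np
import pandas as pd


def one_vs_all_frames(df, label_col="class"):
    """Hypothetical helper: return {label: copy of df with label_col recoded to 1 / -1}."""
    frames = {}
    for label in df[label_col].unique():
        out = df.copy()  # leave the input dataframe untouched
        # 1 for rows belonging to this label, -1 for every other row
        out[label_col] = np.where(df[label_col].eq(label), 1, -1)
        frames[label] = out
    return frames


# the example data from the question
df1 = pd.DataFrame({
    "class": [1, 2, 3, 1, 2],
    "colB": ["1b", "2b", "3b", "4b", "5b"],
    "colC": ["1c", "2c", "3c", "4c", "5c"],
})

frames = one_vs_all_frames(df1)
print(frames[2])  # the "class 2 vs. the rest" dataframe
```

Assigning with out[label_col] = ... instead of assign(...) means the column can keep its original name, so no rename is needed in this version.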

1 Comment

You are great. Thanks heaps :D

In a similar vein to @user3483203, but using mask and fillna:

[
    df.assign(**{'class': df['class'].mask(df['class'].ne(cls)).fillna(-1)})
    for cls in df['class'].unique()
]

[   class colB colC
 0    1.0   1b   1c
 1   -1.0   2b   2c
 2   -1.0   3b   3c
 3    1.0   4b   4c
 4   -1.0   5b   5c,    class colB colC
 0   -1.0   1b   1c
 1    2.0   2b   2c
 2   -1.0   3b   3c
 3   -1.0   4b   4c
 4    2.0   5b   5c,    class colB colC
 0   -1.0   1b   1c
 1   -1.0   2b   2c
 2    3.0   3b   3c
 3   -1.0   4b   4c
 4   -1.0   5b   5c]
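
Note that the output above keeps the original label (1.0, 2.0, 3.0) on the matching rows, and the column becomes float because mask introduces NaN before fillna runs. If integer 1 / -1 labels are wanted, as in the question, one possible variation is sketched below. It swaps mask/fillna for eq plus map, so it is not the code from this answer, and the dataframe is rebuilt here from the question's example only to keep the snippet self-contained.

```python
import pandas as pd

# the example data from the question
df = pd.DataFrame({
    "class": [1, 2, 3, 1, 2],
    "colB": ["1b", "2b", "3b", "4b", "5b"],
    "colC": ["1c", "2c", "3c", "4c", "5c"],
})

dfs = [
    # eq(cls) gives a boolean column; map turns True into 1 and False into -1
    df.assign(**{"class": df["class"].eq(cls).map({True: 1, False: -1})})
    for cls in df["class"].unique()
]

for d in dfs:
    print(d, end="\n\n")
```

The **{...} unpacking is kept from the answer so that the column can still be called class without a rename.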
