
I want to create a new column in my dataframe that holds the name of a column when that column is the only one with a value of 8 in the row; otherwise the new column's value for that row should be "NONE". For the dataframe df below, the result would be df["New_Column"] = ["NONE", "NONE", "A", "NONE"].

import pandas as pd
df = pd.DataFrame({"A": [1, 2, 8, 3], "B": [0, 2, 4, 8], "C": [0, 0, 7, 8]})
  • What is the expected outcome when two or more columns have a value of 8 in the same row? Both column names (i.e. "BC")? Only the first/last? Commented Nov 5, 2018 at 2:15
  • @DYZ are you implying one should replace the value 8 with the name of the column and the rest of rows with None? If that's the case, I got a bit confused by creating a new column instead of, say, modifying a column. Commented Nov 5, 2018 at 2:18
  • @TomasFarias I think the description of the problem is pretty clear (at least for a new contributor). A new column shall be created. Commented Nov 5, 2018 at 2:20

3 Answers


Cool problem.

  1. Find the 8-fields in each row: df==8
  2. Count them: (df==8).sum(axis=1)
  3. Find the rows where the count is 1: (df==8).sum(axis=1)==1
  4. Select just those rows from the original dataframe: df[(df==8).sum(axis=1)==1]
  5. Find the 8-fields again: df[(df==8).sum(axis=1)==1]==8
  6. Find the columns that hold the True values with idxmax (because True>False): (df[(df==8).sum(axis=1)==1]==8).idxmax(axis=1)
  7. Fill in the gaps with "NONE"
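
The steps above can be sketched one at a time, which makes the intermediate results easier to inspect. This is only an illustration on the question's dataframe; the variable names (is_eight, count, exactly_one, and so on) are made up for this sketch, and the final reindex is one assumed way to line the result up with the original rows before filling the gaps.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 8, 3], "B": [0, 2, 4, 8], "C": [0, 0, 7, 8]})

is_eight = df == 8               # step 1: boolean frame marking the 8-fields
count = is_eight.sum(axis=1)     # step 2: number of 8s in each row
exactly_one = count == 1         # step 3: rows that contain exactly one 8
selected = df[exactly_one]       # step 4: keep only those rows
hits = selected == 8             # step 5: mark the 8-fields again
names = hits.idxmax(axis=1)      # step 6: column label of the single True

# Step 7: restore the rows that were filtered out and fill them with "NONE".
new_column = names.reindex(df.index).fillna("NONE")
print(new_column.tolist())
```

Assigning names directly to df["New_Column"] has the same effect as the reindex, because pandas aligns on the index and leaves the missing rows as NaN.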

To summarize:

df["New_Column"] = (df[(df==8).sum(axis=1)==1]==8).idxmax(axis=1)
df["New_Column"] = df["New_Column"].fillna("NONE")
#   A  B  C New_Column
#0  1  0  0       NONE
#1  2  2  0       NONE
#2  8  4  7          A
#3  3  8  8       NONE
# I added another line as a proof of concept
#4  0  8  0          B

You can accomplish this using idxmax and a mask:

out = (df==8).idxmax(axis=1)
m = ~(df==8).any(axis=1) | ((df==8).sum(axis=1) > 1)

df.assign(col=out.mask(m))

   A  B  C  col
0  1  0  0  NaN
1  2  2  0  NaN
2  8  4  7    A
3  3  8  8  NaN
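
The output above leaves NaN where the question asked for "NONE". A minimal variant, assuming the same dataframe, passes the replacement value straight to mask. It also writes the mask as "the count of 8s is not exactly 1", which is equivalent to "no 8 at all, or more than one". The names is_eight and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 8, 3], "B": [0, 2, 4, 8], "C": [0, 0, 7, 8]})

is_eight = df == 8
out = is_eight.idxmax(axis=1)      # first column holding an 8 (or the first column if none)
m = is_eight.sum(axis=1) != 1      # True for rows with zero or several 8s

# Replace the masked rows with "NONE" instead of NaN.
result = df.assign(New_Column=out.mask(m, "NONE"))
print(result)
```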


Or do:

df2 = df[(df == 8)]
df['New_Column'] = (df2[(df2 != df2.dropna(thresh=2).values[0]).all(axis=1)].dropna(how='all')).idxmax(axis=1)
df['New_Column'] = df['New_Column'].fillna('NONE')
print(df)

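dropna + dropna again + idxmax + fillna. That's all you need for this. Note that df2.dropna(thresh=2).values[0] picks the first row containing two or more 8s, so this relies on the data having at least one such row, and only rows that differ from that row in every column are kept.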

Output:

   A  B  C New_Column
0  1  0  0       NONE
1  2  2  0       NONE
2  8  4  7          A
3  3  8  8       NONE
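
For data where several rows contain more than one 8 in different column combinations, or where no row does, a count-based check may be more robust. The following is a sketch only: single_match_column is a hypothetical helper written for this illustration, not a pandas function, and the small dataframe is invented to exercise those cases.

```python
import pandas as pd

def single_match_column(frame, value=8, fill="NONE"):
    """Hypothetical helper: label each row with the only column equal to value."""
    hits = frame == value
    names = hits.idxmax(axis=1)                        # first matching column per row
    return names.where(hits.sum(axis=1) == 1, fill)    # keep only rows with exactly one match

# Illustrative data: rows 0 and 3 hold two 8s in different column pairs.
demo = pd.DataFrame({"A": [8, 8, 0, 1], "B": [8, 0, 8, 8], "C": [0, 0, 0, 8]})
labels = single_match_column(demo)
print(labels.tolist())
```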
