1

I have searched all around the internet and tried many methods before making this post. I have a dataframe where I want to:

  • Replace NaN values in TGT_COLUMN_SCALE with 0 if TGT_COLUMN_DATA_TYPE equals NUMERIC.

my dataframe

Kindly help me out with this issue.

I tried this code but it's not working:

df["TGT_COLUMN_SCALE"] = np.where(df["TGT_COLUMN_DATA_TYPE"] == "NUMERIC", 'NaN', 0)
  • df.loc[(df.TGT_COLUMN_DATA_TYPE == "NUMERIC") & (df.TGT_COLUMN_SCALE.isnull()), "TGT_COLUMN_SCALE"] = 0 Commented Jul 28, 2022 at 9:12

3 Answers

2

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "TGT_COLUMN_DATA_TYPE": ["DATE", "NUMERIC", "STRING", "NUMERIC"],
    "TGT_COLUMN_SCALE": [np.nan, np.nan, 4.0, 5.0],  # np.nan, since the np.NaN alias was removed in NumPy 2.0
})

Replace

df.loc[(df.TGT_COLUMN_DATA_TYPE == "NUMERIC") & (df.TGT_COLUMN_SCALE.isnull()), "TGT_COLUMN_SCALE"] = 0

Result:

  TGT_COLUMN_DATA_TYPE  TGT_COLUMN_SCALE
0                 DATE               NaN
1              NUMERIC               0.0
2               STRING               4.0
3              NUMERIC               5.0
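
To check the behaviour above independently, the same boolean-mask assignment can be wrapped in a small function. This is only a sketch: the helper name `zero_fill_numeric_scale` is hypothetical and not part of the original answer, and it works on a copy so the input frame is left untouched.

```python
import numpy as np
import pandas as pd


def zero_fill_numeric_scale(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which NaN scale values are set to 0 for NUMERIC rows only."""
    out = frame.copy()
    # Rows that are NUMERIC *and* currently have no scale.
    mask = (out["TGT_COLUMN_DATA_TYPE"] == "NUMERIC") & out["TGT_COLUMN_SCALE"].isna()
    out.loc[mask, "TGT_COLUMN_SCALE"] = 0
    return out


df = pd.DataFrame({
    "TGT_COLUMN_DATA_TYPE": ["DATE", "NUMERIC", "STRING", "NUMERIC"],
    "TGT_COLUMN_SCALE": [np.nan, np.nan, 4.0, 5.0],
})
result = zero_fill_numeric_scale(df)
print(result)
```

Only the row that is both NUMERIC and NaN changes; the DATE row keeps its NaN and the existing 4.0 and 5.0 values are preserved.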

9 Comments

It replaced NaN with zero and all other values with NaN. That's not what I need.
It didn't. It replaces the value only when TGT_COLUMN_DATA_TYPE is NUMERIC and TGT_COLUMN_SCALE is NaN.
Look at index 3: TGT_COLUMN_DATA_TYPE is set to NUMERIC and the value is 5.0. It has not changed; only index 1 has changed.
It created a new column and then did this: i.imgur.com/zAqOcVV.png. That's not what I need.
Updated the column names in the answer as well.
0

You just need to use loc to select the rows where the data type is NUMERIC, and then use fillna to replace the NaN values in the scale column:

df.loc[df.TGT_COLUMN_DATA_TYPE == "NUMERIC",
       "TGT_COLUMN_SCALE"] = df.loc[df.TGT_COLUMN_DATA_TYPE == "NUMERIC", "TGT_COLUMN_SCALE"].fillna(0)

Full code

import numpy as np
import pandas as pd

TGT_COLUMN_DATA_TYPE = ('DATE', 'TIMESTAMP', 'NUMERIC', 'NUMERIC')
TGT_COLUMN_SCALE = (np.nan, np.nan, np.nan, np.nan)
df = pd.DataFrame(list(zip(TGT_COLUMN_DATA_TYPE, TGT_COLUMN_SCALE)),
                  columns=['TGT_COLUMN_DATA_TYPE', 'TGT_COLUMN_SCALE'])
# Fill NaN with 0 only in the rows whose data type is NUMERIC
df.loc[df.TGT_COLUMN_DATA_TYPE == "NUMERIC",
       "TGT_COLUMN_SCALE"] = df.loc[df.TGT_COLUMN_DATA_TYPE == "NUMERIC", "TGT_COLUMN_SCALE"].fillna(0)

4 Comments

Why are you including Date, Timestamp in this? I just want to change Numeric value of NaN with Zero. Can you provide any simpler method?
You just have to copy the last line. As you can see by running the full code, it works.
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
It's just a warning; you can suppress it with: pd.options.mode.chained_assignment = None
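
A SettingWithCopyWarning like the one quoted above commonly appears when the DataFrame being modified was itself produced by slicing or filtering another DataFrame. Whether that is the case here is an assumption, since the asker's surrounding code is not shown. Under that assumption, a minimal sketch of an alternative to silencing the warning is to take an explicit copy before assigning. The `raw` frame and its `ACTIVE` column are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "TGT_COLUMN_DATA_TYPE": ["DATE", "NUMERIC", "STRING", "NUMERIC"],
    "TGT_COLUMN_SCALE": [np.nan, np.nan, 4.0, 5.0],
    "ACTIVE": [True, True, True, False],  # hypothetical column used only to filter
})

# Filtering returns an object that pandas may treat as a slice of `raw`;
# an explicit .copy() makes `df` an independent DataFrame.
df = raw[raw["ACTIVE"]].copy()

mask = (df["TGT_COLUMN_DATA_TYPE"] == "NUMERIC") & df["TGT_COLUMN_SCALE"].isna()
df.loc[mask, "TGT_COLUMN_SCALE"] = 0
print(df)
```

With the copy in place, the assignment changes `df` only and `raw` keeps its original NaN values.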
0

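np.where takes the first value where the condition is true and the second value otherwise, so you need to swap the order of NaN and 0. To keep the existing non-NaN values, also check for null in the condition and pass the original column as the fallback: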

df["TGT_COLUMN_SCALE"] = np.where((df["TGT_COLUMN_DATA_TYPE"] == "NUMERIC") & (df["TGT_COLUMN_SCALE"].isnull()), 0, df["TGT_COLUMN_SCALE"])
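
For comparison, here is a small runnable sketch of the np.where call next to pandas' own Series.mask, which expresses the same replacement. The sample data is borrowed from the other answer in this thread, and the variable names `cond`, `via_where` and `via_mask` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "TGT_COLUMN_DATA_TYPE": ["DATE", "NUMERIC", "STRING", "NUMERIC"],
    "TGT_COLUMN_SCALE": [np.nan, np.nan, 4.0, 5.0],
})

cond = (df["TGT_COLUMN_DATA_TYPE"] == "NUMERIC") & df["TGT_COLUMN_SCALE"].isna()

# np.where(cond, a, b): take a where cond is True, otherwise b.
via_where = np.where(cond, 0, df["TGT_COLUMN_SCALE"])

# Series.mask(cond, other): replace values where cond is True, keep the rest.
via_mask = df["TGT_COLUMN_SCALE"].mask(cond, 0)

# Either result can be assigned back to the column.
df["TGT_COLUMN_SCALE"] = via_mask
print(df)
```

Both forms leave rows outside the condition untouched, because the original column is used as the fallback value.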

6 Comments

It's not working now. It's still showing NaN.
@AviThour you need to assign the returned value; I have updated the answer.
It replaced all the other values which were not NaN with 0 as well.
This is what your code did: i.imgur.com/IaBpjZI.png. It also replaced non-NaN values with zero.
@AviThour but this is what you asked for: "Replace NaN value of TGT_COLUMN_SCALE to 0 If TGT_COLUMN_DATA_TYPE is equals to NUMERIC"