
I have a DataFrame that looks like this:

index | in | out | time
   7  |  8 |  8  |  232
  11  |  3 |  0  |    0
  79  |  0 |  8  |   12

I want to create a new DataFrame from this one in which every non-zero in/out value is set to 1 (they are all positive). The index and time values should stay the same:

index | in | out | time
   7  |  1 |  1  |  232
  11  |  1 |  0  |    0
  79  |  0 |  1  |   12

I think there should be a faster way than how I am doing it:

import pandas as pd

rows = []
for index, row in df.iterrows():
    if row["in"] == 0:
        in_val = 0
    else:
        in_val = 1
    if row["out"] == 0:
        out_val = 0
    else:
        out_val = 1
    time = row["time"]
    # DataFrame.append was removed in pandas 2.0, so collect plain dicts instead
    rows.append({"index": index, "in": in_val, "out": out_val, "time": time})
df2 = pd.DataFrame(rows)

Can I use a lambda function or something like a list comprehension to convert the DataFrame faster?
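To try the approaches in this thread, the sample frame can be rebuilt in code. This is a minimal sketch that assumes `index` is an ordinary column (as in the printed output of the first answer) rather than the DataFrame's own index; `expected` is a hypothetical name for the target frame.

```python
import pandas as pd

# Input from the question; "index" is stored as a regular column here.
df = pd.DataFrame({
    "index": [7, 11, 79],
    "in": [8, 3, 0],
    "out": [8, 0, 8],
    "time": [232, 0, 12],
})

# Target: non-zero in/out values become 1, index and time are unchanged.
expected = pd.DataFrame({
    "index": [7, 11, 79],
    "in": [1, 1, 0],
    "out": [1, 0, 1],
    "time": [232, 0, 12],
})

print(df)
print(expected)
```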

Comment (Sep 10, 2019 at 6:27): use np.where to set every non-zero value to 1.

5 Answers

4

Use numpy.where on the columns selected by a list:

import numpy as np

cols = ['in', 'out']
df[cols] = np.where(df[cols].eq(0), 0, 1)
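A self-contained version of the `np.where` approach on the question's data, for reference (the sample values are copied from the question). `np.where` returns a plain NumPy array with no labels, so it is written back into `df[cols]` by position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"index": [7, 11, 79], "in": [8, 3, 0],
                   "out": [8, 0, 8], "time": [232, 0, 12]})

cols = ["in", "out"]
# eq(0) gives a boolean mask; np.where picks 0 where it is True, 1 elsewhere.
result = np.where(df[cols].eq(0), 0, 1)
print(type(result), result.shape)   # a NumPy array with shape (3, 2)

# The array carries no labels, so it is assigned back by position.
df[cols] = result
print(df)
```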

Or cast the boolean mask from ne (not equal to 0) to integers:

df[cols] = df[cols].ne(0).astype(int)
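Both forms give the same 0/1 values; one practical difference is that `ne(0).astype(int)` stays a labelled DataFrame, so it can be inspected before assigning. A quick check on the question's sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"in": [8, 3, 0], "out": [8, 0, 8], "time": [232, 0, 12]})
cols = ["in", "out"]

via_where = np.where(df[cols].eq(0), 0, 1)   # NumPy array, no labels
via_ne = df[cols].ne(0).astype(int)          # DataFrame with the same labels

# Compare the raw values; both approaches flag every non-zero entry with 1.
print((via_ne.to_numpy() == via_where).all())  # True
```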

If there are no negative values (and the non-zero values are whole numbers, so nothing lies strictly between 0 and 1), use DataFrame.clip:

df[cols] = df[cols].clip(upper=1)
print(df)
   index  in  out  time
0      7   1    1   232
1     11   1    0     0
2     79   0    1    12
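`clip(upper=1)` only caps values; it never raises anything to 1. That is fine for the question's non-negative integers, but a hypothetical column with fractional or negative entries (not from the question) behaves differently from the `ne(0)` version, as this small sketch shows:

```python
import pandas as pd

# Hypothetical data with a fraction and a negative value.
s = pd.Series([0, 0.5, 3, -2])

print(s.clip(upper=1).tolist())      # [0.0, 0.5, 1.0, -2.0]: only 3 is changed
print(s.ne(0).astype(int).tolist())  # [0, 1, 1, 1]: every non-zero becomes 1
```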


1

Alternatively, you can use astype to convert to boolean and multiply by 1:

cols = ['in', 'out']
df[cols] = df[cols].astype(bool) * 1

   index  in  out  time
0      7   1    1   232
1     11   1    0     0
2     79   0    1    12
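Multiplying a boolean frame by 1 upcasts it to an integer dtype, so the result is numerically the same as `.astype(int)`. A short check using the question's sample values (the exact integer width can vary by platform, so only "integer" is checked here):

```python
import pandas as pd

df = pd.DataFrame({"in": [8, 3, 0], "out": [8, 0, 8], "time": [232, 0, 12]})
cols = ["in", "out"]

flags = df[cols].astype(bool)   # non-zero -> True, zero -> False
df[cols] = flags * 1            # True/False -> 1/0 as integers

print(flags.dtypes.tolist())
print(df[cols].dtypes.tolist())
print(df)
```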


0

Use np.where():

import numpy as np
import pandas as pd

df = pd.DataFrame(data={"in": [8, 3, 0],
                        "out": [8, 0, 8],
                        "time": [232, 0, 12]})

df[['in', 'out']] = np.where(df[['in', 'out']] == 0, 0, 1)
print(df)
   in  out  time
0   1    1   232
1   1    0     0
2   0    1    12


0

So you have a DataFrame like this:

   index  in  out  time
0      7   8    8   232
1     11   3    0     0
2     79   0    8    12

Use np.where to get the desired result:

df['in'] = np.where(df['in'] > 0, 1, 0)
df['out'] = np.where(df['out'] > 0, 1, 0)
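Testing `> 0` only matches "non-zero" because the question's values are all non-negative. With a hypothetical negative entry (the question's data has none) the two conditions diverge:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a negative value.
s = pd.Series([-2, 0, 5])

print(np.where(s > 0, 1, 0).tolist())   # [0, 0, 1]: negatives become 0
print(np.where(s != 0, 1, 0).tolist())  # [1, 0, 1]: any non-zero becomes 1
```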


0

You can try a list comprehension for each column:

df['in'] = [1 if i > 0 else 0 for i in df['in']]
df['out'] = [1 if i > 0 else 0 for i in df['out']]
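A list comprehension works, but it runs a Python-level loop over every value, whereas `ne(0)` is vectorized. Below is a rough way to compare the two yourself with `timeit`; the frame size and the helper names are arbitrary assumptions for illustration, and no timings are quoted because they depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary synthetic data: 100,000 rows of small non-negative integers.
rng = np.random.default_rng(0)
df = pd.DataFrame({"in": rng.integers(0, 10, 100_000),
                   "out": rng.integers(0, 10, 100_000)})

def with_list_comprehension(frame):
    # Python-level loop over each value of the column.
    return [1 if i > 0 else 0 for i in frame["in"]]

def with_vectorized(frame):
    # One vectorized comparison, cast to int, then converted to a list.
    return frame["in"].ne(0).astype(int).tolist()

# Both produce the same 0/1 list for non-negative input.
assert with_list_comprehension(df) == with_vectorized(df)

for fn in (with_list_comprehension, with_vectorized):
    seconds = timeit.timeit(lambda: fn(df), number=10)
    print(f"{fn.__name__}: {seconds:.3f} s for 10 runs")
```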

