I have a dataframe like the one shown below:

import numpy as np
import pandas as pd

f = pd.DataFrame({'person_id': [101,101,101,201,201,201,203],
                  'test_id': [123,123,124,321,321,321,456],
                  'los_24': [0.3,0.7,0.6,1.01,2,1,2],
                  'los_48': [1,0.2,0.4,0.7,11,2,3],
                  'in_24': [21,24,0.3,2.3,0.8,23,1.001],
                  'in_48': [11.3,202.0,0.2,0.3,41.0,47,2],
                  'test': ['A','B','C','D','E','F','G']})

I would like to replace every value less than 1 with 1 in the columns los_24, los_48, in_24 and in_48.

I tried the following:

f['los_24'] = np.where((f.los_24 < 1.0),1,f.los_24)
f['los_48'] = np.where((f.los_48 < 1.0),1,f.los_48)
f['in_24'] = np.where((f.in_24 < 1.0),1,f.in_24)
f['in_48'] = np.where((f.in_48 < 1.0),1,f.in_48)

But as you can see, I am writing the same line of code multiple times with different column names.

In my real data, I have more than 10 columns whose values need replacing. Is there a more efficient and elegant way to write this?

I expect my output to look like the one shown below:

[image: expected output]

3 Answers

9

You can clip:

cols = ["los_24", "los_48", "in_24", "in_48"]

f[cols] = f[cols].clip(lower=1)

to get

   person_id  test_id  los_24  los_48   in_24  in_48 test
0        101      123    1.00     1.0  21.000   11.3    A
1        101      123    1.00     1.0  24.000  202.0    B
2        101      124    1.00     1.0   1.000    1.0    C
3        201      321    1.01     1.0   2.300    1.0    D
4        201      321    2.00    11.0   1.000   41.0    E
5        201      321    1.00     2.0  23.000   47.0    F
6        203      456    2.00     3.0   1.001    2.0    G
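
Since the real data has more than 10 such columns, the list does not have to be typed by hand. A minimal sketch, assuming every target column is named los_<number> or in_<number> (that naming pattern is my assumption based on the sample, so adjust the regex to your real column names):

```python
import pandas as pd

f = pd.DataFrame({'person_id': [101, 101, 101, 201, 201, 201, 203],
                  'test_id': [123, 123, 124, 321, 321, 321, 456],
                  'los_24': [0.3, 0.7, 0.6, 1.01, 2, 1, 2],
                  'los_48': [1, 0.2, 0.4, 0.7, 11, 2, 3],
                  'in_24': [21, 24, 0.3, 2.3, 0.8, 23, 1.001],
                  'in_48': [11.3, 202.0, 0.2, 0.3, 41.0, 47, 2],
                  'test': ['A', 'B', 'C', 'D', 'E', 'F', 'G']})

# Assumption: target columns look like "los_24" or "in_48".
# DataFrame.filter(regex=...) keeps the columns whose label matches.
cols = f.filter(regex=r'^(los|in)_\d+$').columns

# Raise everything below 1 up to 1 in the selected columns only.
f[cols] = f[cols].clip(lower=1)
```

person_id, test_id and test do not match the pattern, so they are left untouched.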

3

You can put all the columns to process in a list and call numpy.where only once on the selected columns:

cols = ['los_24','los_48','in_24','in_48']

f[cols] = np.where((f[cols] < 1.0),1,f[cols])

Or with DataFrame.mask:

f[cols] = f[cols].mask((f[cols] < 1.0),1)

   person_id  test_id  los_24  los_48   in_24  in_48 test
0        101      123    1.00     1.0  21.000   11.3    A
1        101      123    1.00     1.0  24.000  202.0    B
2        101      124    1.00     1.0   1.000    1.0    C
3        201      321    1.01     1.0   2.300    1.0    D
4        201      321    2.00    11.0   1.000   41.0    E
5        201      321    1.00     2.0  23.000   47.0    F
6        203      456    2.00     3.0   1.001    2.0    G
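
If you want to convince yourself that these variants agree on your own data, a quick check along these lines should work. This is only a sketch on the sample frame from the question; the names clipped, masked and via_where are mine, and clip is included purely for comparison:

```python
import numpy as np
import pandas as pd

f = pd.DataFrame({'person_id': [101, 101, 101, 201, 201, 201, 203],
                  'test_id': [123, 123, 124, 321, 321, 321, 456],
                  'los_24': [0.3, 0.7, 0.6, 1.01, 2, 1, 2],
                  'los_48': [1, 0.2, 0.4, 0.7, 11, 2, 3],
                  'in_24': [21, 24, 0.3, 2.3, 0.8, 23, 1.001],
                  'in_48': [11.3, 202.0, 0.2, 0.3, 41.0, 47, 2],
                  'test': ['A', 'B', 'C', 'D', 'E', 'F', 'G']})

cols = ['los_24', 'los_48', 'in_24', 'in_48']

# Compute each variant on a copy of the selection, without modifying f.
clipped = f[cols].clip(lower=1)
masked = f[cols].mask(f[cols] < 1.0, 1)
via_where = np.where(f[cols] < 1.0, 1, f[cols])  # plain NumPy array

# All three should hold the same values.
assert np.array_equal(clipped.to_numpy(), masked.to_numpy())
assert np.array_equal(clipped.to_numpy(), via_where)
```

Note that np.where returns a NumPy array rather than a DataFrame, which is why it is assigned back through f[cols] in the answer.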

1

Wow, there are so many ways to skin a cat. You could also use apply with a lambda function:

cols = ['los_24','los_48','in_24','in_48']
for col in cols:
    f[col] = f[col].apply(lambda x: 1 if x<1 else x)

Same output :-)
