Change Column value based on part of another column using pandas

Question

I want to change the value of one column based on part of another column.

For example, I have the following data frame:

**DATE**    **TIME**    **VALUE**
20060103    02:01:00    54
20060103    03:02:00    12
20060103    05:03:00    21
20060103    08:05:00    54
20060103    06:06:00    87
20060103    02:07:00    79
20060103    02:08:00    46

I want to set the VALUE column to 30, but only in rows where the hour part of the TIME column is 02.

So the desired data frame would be:

**DATE**    **TIME**    **VALUE**
20060103    02:01:00    30
20060103    03:02:00    12
20060103    05:03:00    21
20060103    08:05:00    54
20060103    06:06:00    87
20060103    02:07:00    30
20060103    02:08:00    30

Notice that in rows 1, 6 and 7 VALUE changed to 30, because the hour in the TIME column is 02.

I tried the simple way of going over each row and setting the value:

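import pandas as pd

df = pd.read_csv('file.csv')

# Loop over (index, value) pairs so only the matching row is updated;
# assigning df["VALUE"] = 30 inside the loop would overwrite the whole column.
for i, a in df['TIME'].items():
    if a[:2] == '02':
        df.at[i, 'VALUE'] = 30

df.to_csv("file.csv", index=False)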

Unfortunately, this file has tens of millions of lines, and this method will take forever. I would appreciate it if anyone has a more creative and efficient method.

Thanks!
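If the whole file is too large to load comfortably, one option is to read it in chunks and apply a vectorized mask to each chunk, writing the results to a separate output file. This is a minimal sketch, assuming a comma-separated file with DATE, TIME and VALUE columns where TIME is a zero-padded HH:MM:SS string; the function name set_value_for_hour_02, the file names and the chunk size are hypothetical.

```python
import os
import tempfile

import pandas as pd


def set_value_for_hour_02(src_path, dst_path, chunksize=1_000_000):
    """Hypothetical helper: stream src_path in chunks, set VALUE to 30 where
    TIME starts with '02', and write each processed chunk to dst_path."""
    # Read TIME as a string so that str[:2] sees the leading zero.
    reader = pd.read_csv(src_path, chunksize=chunksize, dtype={"TIME": str})
    for i, chunk in enumerate(reader):
        # Vectorized boolean mask: no Python-level loop over rows.
        chunk.loc[chunk["TIME"].str[:2] == "02", "VALUE"] = 30
        # Overwrite on the first chunk (with header), append afterwards.
        chunk.to_csv(dst_path, mode="w" if i == 0 else "a",
                     header=(i == 0), index=False)


# Small demo with a temporary file standing in for the real 'file.csv'.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "file.csv")
dst = os.path.join(tmpdir, "file_out.csv")
with open(src, "w") as f:
    f.write("DATE,TIME,VALUE\n"
            "20060103,02:01:00,54\n"
            "20060103,03:02:00,12\n"
            "20060103,02:07:00,79\n")

set_value_for_hour_02(src, dst, chunksize=2)
result = pd.read_csv(dst)
print(result["VALUE"].tolist())  # [30, 12, 30]
```

Writing to a separate file avoids overwriting the source while it is still being read; if the original name must be kept, the output could be moved over it afterwards (for example with os.replace).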

U13-Forward · Accepted Answer · 2021-10-03 02:12:37Z

3

Try loc assignment:

df.loc[pd.to_datetime(df['TIME'], format='%H:%M:%S').dt.hour == 2, 'VALUE'] = 30

Or:

df.loc[df['TIME'].str[:2] == '02', 'VALUE'] = 30
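A self-contained check of both masks on the question's sample data. The DataFrame is built inline instead of being read from file.csv, and it assumes TIME holds zero-padded HH:MM:SS strings.

```python
import pandas as pd

# Sample data from the question; TIME is kept as strings like '02:01:00'.
df = pd.DataFrame({
    "DATE": [20060103] * 7,
    "TIME": ["02:01:00", "03:02:00", "05:03:00", "08:05:00",
             "06:06:00", "02:07:00", "02:08:00"],
    "VALUE": [54, 12, 21, 54, 87, 79, 46],
})

# Variant 1: parse TIME with an explicit format and compare the hour as an int.
hour_is_2 = pd.to_datetime(df["TIME"], format="%H:%M:%S").dt.hour == 2

# Variant 2: compare the first two characters of the string.
starts_with_02 = df["TIME"].str[:2] == "02"

# For zero-padded HH:MM:SS strings both masks select the same rows.
assert (hour_is_2 == starts_with_02).all()

df.loc[starts_with_02, "VALUE"] = 30
print(df["VALUE"].tolist())  # [30, 12, 21, 54, 87, 30, 30]
```

The string comparison skips datetime parsing entirely, while the to_datetime version also validates that each TIME value is a well-formed time.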


Ade_1 · Accepted Answer · 2021-10-03 02:05:05Z

1

You can try the apply method to iterate through each row:

df['VALUE'] = df.apply(lambda x: 30 if x['TIME'][:2]=='02' else x['VALUE'], axis='columns')
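A quick runnable check of the apply approach on a few rows of the sample data (built inline here). Note that the lambda is still called once per row in Python, so on tens of millions of rows it is typically much slower than the vectorized loc or np.where answers.

```python
import pandas as pd

df = pd.DataFrame({
    "TIME": ["02:01:00", "03:02:00", "02:07:00"],
    "VALUE": [54, 12, 79],
})

# Row-wise apply: each row arrives as a Series, and the lambda returns
# either 30 or the row's existing VALUE.
df["VALUE"] = df.apply(
    lambda row: 30 if row["TIME"][:2] == "02" else row["VALUE"],
    axis="columns",
)
print(df["VALUE"].tolist())  # [30, 12, 30]
```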


Jonathan Leon · Accepted Answer · 2021-10-03 02:12:34Z

1

import io

import pandas as pd

data = '''DATE    TIME    VALUE
20060103    02:01:00    54
20060103    03:02:00    12
20060103    05:03:00    21
20060103    08:05:00    54
20060103    06:06:00    87
20060103    02:07:00    79
20060103    02:08:00    46'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+', engine='python')

df.loc[df['TIME'].str[:2]=='02', 'VALUE'] =30
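As an alternative to slicing with str[:2], pandas' str.startswith builds the same boolean mask. This is a swapped-in variant, not part of the original answer, shown on a shortened copy of the sample data and assuming TIME is read as a string.

```python
import io

import pandas as pd

data = """DATE    TIME    VALUE
20060103    02:01:00    54
20060103    03:02:00    12
20060103    02:07:00    79"""
# Raw string so '\s+' is passed to pandas as a regex, not a Python escape.
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# str.startswith('02') is equivalent to str[:2] == '02' for these strings.
mask = df["TIME"].str.startswith("02")
df.loc[mask, "VALUE"] = 30
print(df)
```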


yudhiesh · Accepted Answer · 2021-10-03 03:44:24Z

1

You can achieve this using np.where(), which should be a bit faster:

import numpy as np 

In [67]: df['VALUE'] = np.where(df['TIME'].str[:2]=='02', 30, df['VALUE'])

In [68]: df
Out[68]: 
       DATE      TIME  VALUE
0  20060103  02:01:00     30
1  20060103  03:02:00     12
2  20060103  05:03:00     21
3  20060103  08:05:00     54
4  20060103  06:06:00     87
5  20060103  02:07:00     30
6  20060103  02:08:00     30
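Series.mask is a pandas-native equivalent of this np.where call: it replaces values where the condition is True and keeps the result as a Series. It is swapped in here as an alternative, not taken from the original answer, and the sketch uses a few rows of the sample data built inline.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "TIME": ["02:01:00", "03:02:00", "05:03:00", "02:08:00"],
    "VALUE": [54, 12, 21, 46],
})

is_hour_02 = df["TIME"].str[:2] == "02"

# np.where returns a NumPy array: 30 where the condition holds, else the old value.
with_where = np.where(is_hour_02, 30, df["VALUE"])

# Series.mask(cond, other) replaces values where cond is True with other.
with_mask = df["VALUE"].mask(is_hour_02, 30)

assert (with_where == with_mask.to_numpy()).all()
df["VALUE"] = with_mask
print(df["VALUE"].tolist())  # [30, 12, 21, 30]
```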
