4

I want to change the value in one column when part of another column matches a given string.

For example, I have the following data frame:

**DATE**    **TIME**    **VALUE**
20060103    02:01:00    54
20060103    03:02:00    12
20060103    05:03:00    21
20060103    08:05:00    54
20060103    06:06:00    87
20060103    02:07:00    79
20060103    02:08:00    46

I want to set VALUE to 30, but only in rows where the hour part of the TIME column is 02.

So the desired data frame would be:

**DATE**    **TIME**    **VALUE**
20060103    02:01:00    30
20060103    03:02:00    12
20060103    05:03:00    21
20060103    08:05:00    54
20060103    06:06:00    87
20060103    02:07:00    30
20060103    02:08:00    30

Notice that in rows 1, 6 and 7 the VALUE changed to 30, because the TIME value in those rows starts with 02.

I tried to do it the simple way and go over each row and set the value:

import pandas as pd

df = pd.read_csv('file.csv')

for i, a in df['TIME'].items():
    if a[:2] == '02':
        df.at[i, 'VALUE'] = 30  # set only this row, not the whole column

df.to_csv("file.csv", index=False)

But unfortunately the file has tens of millions of lines, and this method would take forever. I would appreciate a faster, more efficient approach.

Thanks!

4 Answers

3

Try loc assignment with a boolean mask:

df.loc[pd.to_datetime(df['TIME'], format='%H:%M:%S').dt.hour == 2, 'VALUE'] = 30

Or, comparing the first two characters of the string directly:

df.loc[df['TIME'].str[:2] == '02', 'VALUE'] = 30
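
To make the loc assignment easier to follow, here is a minimal sketch that builds the boolean mask as a separate step. It assumes the column names from the question (DATE, TIME, VALUE) and rebuilds the sample rows by hand; the variable name mask is only for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "DATE": [20060103] * 7,
    "TIME": ["02:01:00", "03:02:00", "05:03:00", "08:05:00",
             "06:06:00", "02:07:00", "02:08:00"],
    "VALUE": [54, 12, 21, 54, 87, 79, 46],
})

# Boolean Series: True where the first two characters of TIME are "02".
mask = df["TIME"].str[:2] == "02"

# Assign 30 only to the rows selected by the mask; other rows are untouched.
df.loc[mask, "VALUE"] = 30
print(df)
```

The mask is computed for the whole column at once, so there is no Python-level loop over rows.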

1

You can try the apply method, which calls a function on each row:

df['VALUE'] = df.apply(lambda x: 30 if x['TIME'][:2]=='02' else x['VALUE'], axis='columns')
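
Since apply with axis='columns' still runs a Python function once per row, it may be slow on tens of millions of rows. As a rough sketch of a vectorized alternative (not the method from this answer), Series.mask replaces values where a condition is True. The names by_apply and by_mask are only for illustration, and the sample data is the one from the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "DATE": [20060103] * 7,
    "TIME": ["02:01:00", "03:02:00", "05:03:00", "08:05:00",
             "06:06:00", "02:07:00", "02:08:00"],
    "VALUE": [54, 12, 21, 54, 87, 79, 46],
})

# Row-wise version: the lambda is called once for every row.
by_apply = df.apply(
    lambda x: 30 if x["TIME"][:2] == "02" else x["VALUE"],
    axis="columns",
)

# Vectorized version: replace VALUE with 30 wherever TIME starts with "02".
by_mask = df["VALUE"].mask(df["TIME"].str.startswith("02"), 30)

# Both approaches should produce the same values.
print(by_apply.tolist() == by_mask.tolist())
```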

1
import io
import pandas as pd

data = '''DATE    TIME    VALUE
20060103    02:01:00    54
20060103    03:02:00    12
20060103    05:03:00    21
20060103    08:05:00    54
20060103    06:06:00    87
20060103    02:07:00    79
20060103    02:08:00    46'''
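# Split columns on runs of whitespace (raw string avoids an invalid-escape warning).
df = pd.read_csv(io.StringIO(data), sep=r'\s+', engine='python')

# Set VALUE to 30 where the hour part of TIME is '02'.
df.loc[df['TIME'].str[:2] == '02', 'VALUE'] = 30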

1

You can also do this with np.where(), which is vectorized and should be considerably faster than a row-by-row loop.

import numpy as np

In [67]: df['VALUE'] = np.where(df['TIME'].str[:2]=='02', 30, df['VALUE'])

In [68]: df
Out[68]: 
       DATE      TIME  VALUE
0  20060103  02:01:00     30
1  20060103  03:02:00     12
2  20060103  05:03:00     21
3  20060103  08:05:00     54
4  20060103  06:06:00     87
5  20060103  02:07:00     30
6  20060103  02:08:00     30
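
If the file is too large to load comfortably in one go, the same np.where call could be applied chunk by chunk. The following is only a sketch under assumptions: replace_hour_values is a hypothetical helper, the input is assumed to be comma-separated with a header row, and the chunk size is an arbitrary choice. It writes to a separate output handle rather than overwriting the input.

```python
import io

import numpy as np
import pandas as pd


def replace_hour_values(src, dst, chunksize=1_000_000):
    """Hypothetical helper: copy CSV from src to dst, setting VALUE to 30
    in rows where TIME starts with '02'. dst is an open text handle."""
    first = True
    for chunk in pd.read_csv(src, chunksize=chunksize):
        chunk["VALUE"] = np.where(
            chunk["TIME"].str[:2] == "02", 30, chunk["VALUE"]
        )
        # Write the header only for the first chunk.
        chunk.to_csv(dst, index=False, header=first)
        first = False


# Small in-memory demo; with real files, pass a path as src and an
# open file handle (opened with newline="") as dst.
src = io.StringIO(
    "DATE,TIME,VALUE\n"
    "20060103,02:01:00,54\n"
    "20060103,03:02:00,12\n"
    "20060103,02:07:00,79\n"
)
dst = io.StringIO()
replace_hour_values(src, dst, chunksize=2)
print(dst.getvalue())
```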
