
This question may be super basic; apologies for that.

I am trying to create a for loop that enters a value of 1 or 0 into a pandas DataFrame based on a condition.

import pandas as pd

def checkHour6(time):
    val = 0
    if  time == 6:
        val = 1 
    return val

def checkHour7(time):
    val = 0
    if  time == 7:
        val = 1 
    return val

def checkHour8(time):
    val = 0
    if  time == 8:
        val = 1 
    return val

def checkHour9(time):
    val = 0
    if  time == 9:
        val = 1 
    return val

def checkHour10(time):
    val = 0
    if  time == 10:
        val = 1 
    return val

The for loop below counts from 0 to 23. I am trying to build a pandas DataFrame inside the loop that enters a 1 or 0 as appropriate, but I am missing something basic, because the final df is an empty DataFrame.

Create empty df:

df = pd.DataFrame({'hour_6':[], 'hour_7':[], 'hour_8':[], 'hour_9':[], 'hour_10':[]})

For Loop:

hour = -1

for i in range(24):
    stuff = []
    hour = hour + 1
    stuff.append(checkHour6(hour))
    stuff.append(checkHour7(hour))
    stuff.append(checkHour8(hour))
    stuff.append(checkHour9(hour))
    stuff.append(checkHour10(hour))
    df.append(stuff)
  • Try not to use loops with pandas; pandas has methods to do this. Commented Mar 13, 2020 at 20:37
  • Why use 0/1 instead of proper boolean values? Commented Mar 14, 2020 at 2:35
  • I am trying to create a DataFrame to be used in a machine learning process. But maybe boolean values would work as well? Commented Mar 16, 2020 at 15:13

5 Answers


I would suggest the following:

  • use only one checkHour() function with a parameter for hour,
  • according to the pandas.DataFrame.append() documentation, the other parameter has to be a DataFrame or Series/dict-like object, or a list of these, so a plain list of values cannot be used,
  • if you want to build a data frame by appending new rows to the existing one, you have to assign the result back, because append() returns a new DataFrame instead of modifying df in place.
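A minimal sketch of the last point, using pd.concat() as a stand-in for DataFrame.append() (which was deprecated in pandas 1.4 and removed in 2.0). The two-column frame and the new_row name are just for illustration:

```python
import pandas as pd

# empty frame with integer columns, similar to the empty df in the question
df = pd.DataFrame(columns=['hour_6', 'hour_7'], dtype=int)
new_row = pd.DataFrame([{'hour_6': 1, 'hour_7': 0}])

# concat returns a NEW DataFrame; without assignment df itself stays empty
pd.concat([df, new_row], ignore_index=True)
print(len(df))  # 0

# reassigning the result is what actually grows df
df = pd.concat([df, new_row], ignore_index=True)
print(len(df))  # 1
```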

The code can look like this:

def checkHour(time, hour):
    val = 0
    if time == hour:
        val = 1 
    return val

df = pd.DataFrame({'hour_6':[], 'hour_7':[], 'hour_8':[], 'hour_9':[], 'hour_10':[]})

for hour in range(24):
    stuff = {}
    stuff['hour_6'] = checkHour(hour, 6)
    stuff['hour_7'] = checkHour(hour, 7)
    stuff['hour_8'] = checkHour(hour, 8)
    stuff['hour_9'] = checkHour(hour, 9)
    stuff['hour_10'] = checkHour(hour, 10)
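    # append() returns a new DataFrame, so the result must be reassigned
    # (note: DataFrame.append() was removed in pandas 2.0)
    df = df.append(stuff, ignore_index=True)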

The result is the following:

>>> print(df)
    hour_6  hour_7  hour_8  hour_9  hour_10
0      0.0     0.0     0.0     0.0      0.0
1      0.0     0.0     0.0     0.0      0.0
2      0.0     0.0     0.0     0.0      0.0
3      0.0     0.0     0.0     0.0      0.0
4      0.0     0.0     0.0     0.0      0.0
5      0.0     0.0     0.0     0.0      0.0
6      1.0     0.0     0.0     0.0      0.0
7      0.0     1.0     0.0     0.0      0.0
8      0.0     0.0     1.0     0.0      0.0
9      0.0     0.0     0.0     1.0      0.0
10     0.0     0.0     0.0     0.0      1.0
11     0.0     0.0     0.0     0.0      0.0
12     0.0     0.0     0.0     0.0      0.0
13     0.0     0.0     0.0     0.0      0.0
14     0.0     0.0     0.0     0.0      0.0
15     0.0     0.0     0.0     0.0      0.0
16     0.0     0.0     0.0     0.0      0.0
17     0.0     0.0     0.0     0.0      0.0
18     0.0     0.0     0.0     0.0      0.0
19     0.0     0.0     0.0     0.0      0.0
20     0.0     0.0     0.0     0.0      0.0
21     0.0     0.0     0.0     0.0      0.0
22     0.0     0.0     0.0     0.0      0.0
23     0.0     0.0     0.0     0.0      0.0

EDIT:

As @Parfait mentioned, it is not good to use pandas.DataFrame.append() in a for loop, because it leads to quadratic copying (and DataFrame.append() was removed entirely in pandas 2.0). To avoid that, you can build a list of dictionaries (the future data frame rows) and then call pd.DataFrame() once to make a data frame out of it. The code looks like this:

def checkHour(time, hour):
    val = 0
    if time == hour:
        val = 1 
    return val

data = []

for hour in range(24):
    stuff = {}
    stuff['hour_6'] = checkHour(hour, 6)
    stuff['hour_7'] = checkHour(hour, 7)
    stuff['hour_8'] = checkHour(hour, 8)
    stuff['hour_9'] = checkHour(hour, 9)
    stuff['hour_10'] = checkHour(hour, 10)
    data.append(stuff)

df = pd.DataFrame(data)

And the result is the following:

>>> print(df)
    hour_6  hour_7  hour_8  hour_9  hour_10
0        0       0       0       0        0
1        0       0       0       0        0
2        0       0       0       0        0
3        0       0       0       0        0
4        0       0       0       0        0
5        0       0       0       0        0
6        1       0       0       0        0
7        0       1       0       0        0
8        0       0       1       0        0
9        0       0       0       1        0
10       0       0       0       0        1
11       0       0       0       0        0
12       0       0       0       0        0
13       0       0       0       0        0
14       0       0       0       0        0
15       0       0       0       0        0
16       0       0       0       0        0
17       0       0       0       0        0
18       0       0       0       0        0
19       0       0       0       0        0
20       0       0       0       0        0
21       0       0       0       0        0
22       0       0       0       0        0
23       0       0       0       0        0
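The same list-of-dicts approach can be condensed with a comprehension. A sketch, where TARGET_HOURS is an illustrative name rather than anything from the original code:

```python
import pandas as pd

TARGET_HOURS = [6, 7, 8, 9, 10]  # hours that get their own indicator column

# one dict per hour of the day; int(hour == h) is 1 when the hour matches, else 0
data = [{f'hour_{h}': int(hour == h) for h in TARGET_HOURS} for hour in range(24)]

# build the DataFrame once from the finished list instead of appending row by row
df = pd.DataFrame(data)
print(df.loc[5:7])
```

Because every row is built before pd.DataFrame() is called, the frame is allocated only once.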

3 Comments

OK, thanks for the tips.
stuff = {} is a dictionary

Another really simple way to create your data frame is to use the pandas.get_dummies() function, like this:

df = pd.DataFrame({'hour': range(24)})
df = pd.get_dummies(df.hour, prefix='hour', dtype=int)  # dtype=int gives 0/1 (bool is the default since pandas 2.0)
df = df[['hour_6', 'hour_7', 'hour_8', 'hour_9', 'hour_10']]

2 Comments

Would that start at hour 0? I think I need hour to be 0 through 23.
@HenryHub, yes, it would. range(24) starts at 0 and ends at 23.
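A quick sketch to check the row range. It passes dtype=int because, from pandas 2.0 onward, get_dummies() returns bool columns by default; the hours and dummies names are illustrative:

```python
import pandas as pd

hours = pd.Series(range(24), name='hour')  # range(24) yields 0, 1, ..., 23
print(hours.min(), hours.max())  # 0 23

# one indicator column per distinct hour, named hour_0 ... hour_23
dummies = pd.get_dummies(hours, prefix='hour', dtype=int)
df = dummies[['hour_6', 'hour_7', 'hour_8', 'hour_9', 'hour_10']]
print(df.shape)  # (24, 5)
```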

At a quick glance, for the empty-DataFrame issue I'd say:

hour = -1
stuff = []

for i in range(24):    
    hour = hour + 1
    stuff.append(checkHour6(hour))
    stuff.append(checkHour7(hour))
    stuff.append(checkHour8(hour))
    stuff.append(checkHour9(hour))
    stuff.append(checkHour10(hour))

df.append(stuff)

There may be a better solution to the whole process, though.

1 Comment

Thanks for the help, but that appears to create a DataFrame with 120 rows. I was hoping for a df with 24 rows (representing the 24 hours in a day) where each column is either 1 or 0 depending on the value of hour.
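The 120 rows come from appending all 24 * 5 values to one flat list. A sketch (not from the original answer) that groups the flat list back into one row of five values per hour; cols and rows are illustrative names:

```python
import pandas as pd

cols = ['hour_6', 'hour_7', 'hour_8', 'hour_9', 'hour_10']

# same flat list the answer builds: 5 values appended for each of 24 hours
flat = []
for hour in range(24):
    for target in (6, 7, 8, 9, 10):
        flat.append(int(hour == target))

# flat has 24 * 5 = 120 values; slice it into consecutive chunks of 5, one per hour
rows = [flat[i:i + len(cols)] for i in range(0, len(flat), len(cols))]
df = pd.DataFrame(rows, columns=cols)
print(len(flat), df.shape)  # 120 (24, 5)
```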

Start off with a data column (which hour it is); all the other comparisons can then be derived from it.

import pandas as pd
df = pd.DataFrame(range(24), columns= ['data'])
for time in range(6, 11):
    # bool column: True only in the row whose hour equals time
    df[f'hour_{time}'] = df['data'] % 24 == time

df = df.astype(int)

If you want, you can drop the data column afterwards.
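On the earlier question about booleans: the comparisons already produce bool columns, and astype(int) only turns them into 0/1. A sketch that keeps both versions and drops the helper column; whether your machine learning library accepts bool columns directly is an assumption to check for your setup:

```python
import pandas as pd

df = pd.DataFrame({'data': range(24)})
for time in range(6, 11):
    # element-wise comparison gives a bool column: True only in the row for that hour
    df[f'hour_{time}'] = df['data'] == time

bool_df = df.drop(columns='data')  # keep only the indicator columns
int_df = bool_df.astype(int)       # same information as 0/1 integers
print(bool_df.dtypes.unique(), int_df.dtypes.unique())
```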

    data  hour_6  hour_7  hour_8  hour_9  hour_10
0      0       0       0       0       0        0
1      1       0       0       0       0        0
2      2       0       0       0       0        0
3      3       0       0       0       0        0
4      4       0       0       0       0        0
5      5       0       0       0       0        0
6      6       1       0       0       0        0
7      7       0       1       0       0        0
8      8       0       0       1       0        0
9      9       0       0       0       1        0
10    10       0       0       0       0        1
11    11       0       0       0       0        0
12    12       0       0       0       0        0
13    13       0       0       0       0        0
14    14       0       0       0       0        0
15    15       0       0       0       0        0
16    16       0       0       0       0        0
17    17       0       0       0       0        0
18    18       0       0       0       0        0
19    19       0       0       0       0        0
20    20       0       0       0       0        0
21    21       0       0       0       0        0
22    22       0       0       0       0        0
23    23       0       0       0       0        0


Because the object model in numpy and pandas differs from that of general Python, avoid building these objects in a loop the way you would with simpler containers like list or dict.
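To illustrate building the whole object in one call instead of row by row, here is a sketch that is not part of the original answer: row i of a 24x24 identity matrix has its single 1 in column i, so columns 6 through 10 are exactly the five hour indicators.

```python
import numpy as np
import pandas as pd

# 24x24 identity matrix: row i has a 1 in column i and 0 elsewhere
identity = np.eye(24, dtype=int)

# columns 6..10 are the indicators for hours 6..10; rows stay indexed 0..23
df = pd.DataFrame(identity[:, 6:11], columns=[f'hour_{h}' for h in range(6, 11)])
print(df.loc[5:11])
```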

In fact, your setup can be handled simply with DataFrame.pivot on a column of 24 sequential integers, without any function or loop! You can also easily return more hour columns (i.e., hour_0 to hour_23) or reindex to the five columns you need:

Data

df = (pd.DataFrame({'hour': ['hour' for _ in range(24)]})
        .assign(hour = lambda x: x['hour'] + '_' + pd.Series(range(24)).astype('str'),
                num = 1)
     )

df.head(5)
#      hour  num
# 0  hour_0    1
# 1  hour_1    1
# 2  hour_2    1
# 3  hour_3    1
# 4  hour_4    1

Pivot

pvt_df = (df.pivot(columns='hour', values='num')
            .fillna(0)
            .reindex(['hour_6', 'hour_7', 'hour_8', 'hour_9', 'hour_10'], axis='columns')
         )

pvt_df
# hour  hour_6  hour_7  hour_8  hour_9  hour_10
# 0        0.0     0.0     0.0     0.0      0.0
# 1        0.0     0.0     0.0     0.0      0.0
# 2        0.0     0.0     0.0     0.0      0.0
# 3        0.0     0.0     0.0     0.0      0.0
# 4        0.0     0.0     0.0     0.0      0.0
# 5        0.0     0.0     0.0     0.0      0.0
# 6        1.0     0.0     0.0     0.0      0.0
# 7        0.0     1.0     0.0     0.0      0.0
# 8        0.0     0.0     1.0     0.0      0.0
# 9        0.0     0.0     0.0     1.0      0.0
# 10       0.0     0.0     0.0     0.0      1.0
# 11       0.0     0.0     0.0     0.0      0.0
# 12       0.0     0.0     0.0     0.0      0.0
# 13       0.0     0.0     0.0     0.0      0.0
# 14       0.0     0.0     0.0     0.0      0.0
# 15       0.0     0.0     0.0     0.0      0.0
# 16       0.0     0.0     0.0     0.0      0.0
# 17       0.0     0.0     0.0     0.0      0.0
# 18       0.0     0.0     0.0     0.0      0.0
# 19       0.0     0.0     0.0     0.0      0.0
# 20       0.0     0.0     0.0     0.0      0.0
# 21       0.0     0.0     0.0     0.0      0.0
# 22       0.0     0.0     0.0     0.0      0.0
# 23       0.0     0.0     0.0     0.0      0.0
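If you want all 24 hour columns rather than five, the pivoted labels are strings, so their order is not guaranteed to be numeric. A sketch that reindexes them into hour_0 ... hour_23 order and converts the 0.0/1.0 floats to integers; all_hours and pvt_all are illustrative names:

```python
import pandas as pd

# same long-format data as above: one row per hour with a marker value of 1
df = pd.DataFrame({'hour': [f'hour_{h}' for h in range(24)], 'num': 1})

all_hours = [f'hour_{h}' for h in range(24)]  # desired numeric column order
pvt_all = (df.pivot(columns='hour', values='num')
             .reindex(all_hours, axis='columns')  # force hour_0 ... hour_23 order
             .fillna(0)
             .astype(int))                        # 0.0/1.0 floats -> 0/1 ints
print(pvt_all.shape)  # (24, 24)
```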

3 Comments

@Parfait_ Would you be able to help me with this SO question? stackoverflow.com/questions/60759277/…
Interesting that you remark on the pivot table solution but do not acknowledge that it works for you!
Sorry, I'm still learning! I'm still investigating what a pivot table is and how it works. I think I can relate it to Microsoft Excel.
