I have two dataframes df and times, representing maintenance records and monthly times, respectively. I'd like to append a column to times based on the data in df:

import pandas as pd

# df represents car maintenance records
data = {"07-18-2012": ["replaced wheels", 45, 200], "09-12-2014": ["changed oil", 30, 40], "09-18-2015": ["fixed dent", 92, 0]}
df = pd.DataFrame.from_dict(data, orient = "index")
df.index = pd.to_datetime(df.index)
df.sort_index(inplace = True)
df.columns = ["description", "mins_spent", "cost"]

#times represents monthly periods
rng = pd.date_range(start = '12/31/2013', end = '1/1/2015', freq='M')
ts = pd.Series(rng)
times = ts.to_frame(name = "months")

I'm trying to add a new column called days_since_maintenance to times that holds, for each month, the number of days since the most recent maintenance record in df.

I've tried using df.ix[], iterating with a for loop, and searchsorted().

df:

                description  mins_spent  cost
2012-07-18  replaced wheels          45   200
2014-09-12      changed oil          30    40
2015-09-18       fixed dent          92     0

times:

   months
0  2013-12-31
1  2014-01-31
2  2014-02-28
3  2014-03-31
4  2014-04-30
5  2014-05-31
6  2014-06-30
7  2014-07-31
8  2014-08-31
9  2014-09-30
10 2014-10-31
11 2014-11-30
12 2014-12-31

Desired DataFrame:

   months       days_since_maintenance
0  2013-12-31   531 days
1  2014-01-31   562 days
2  2014-02-28   ...
3  2014-03-31   ...
4  2014-04-30   ...
5  2014-05-31   ...
6  2014-06-30   ...
7  2014-07-31   ...
8  2014-08-31   774 days
9  2014-09-30   18 days
10 2014-10-31   ...
11 2014-11-30   ...
12 2014-12-31   ...

3 Answers

df['dates'] = df.index

def days_from_closest(x, df):
    # .ix was removed in pandas 1.0; .iloc[-1] takes the last (most recent) prior record
    closest = df[df['dates'] < x].iloc[-1]
    return x - closest.dates

times['days_since_maintenance'] = times['months'].apply(lambda x: days_from_closest(x, df))

       months  days_since_maintenance
0  2013-12-31                531 days
1  2014-01-31                562 days
2  2014-02-28                590 days
3  2014-03-31                621 days
4  2014-04-30                651 days
5  2014-05-31                682 days
6  2014-06-30                712 days
7  2014-07-31                743 days
8  2014-08-31                774 days
9  2014-09-30                 18 days
10 2014-10-31                 49 days
11 2014-11-30                 79 days
12 2014-12-31                110 days

[13 rows x 2 columns]

3 Comments

oh i see you need to actually look up the value. i can modify. but apply might be your friend here
this works -- not sure whether it's more efficient or less efficient than MaxU answer, which also works. thanks!
It likely depends on the length of the list of dates. My assumption would be apply would be more efficient on a longer list.... maybe someone else knows though

apply works, but you can use the index directly rather than adding a new column for the dates:

def days_since_x(row, df):
    '''returns days between the row date
     and the most recent maintenance date in df'''

    #filter records
    all_maint_prior = df[(df.index <= row)]

    if all_maint_prior.empty:
        return float('NaN')

    else:
        #get last row of filtered results
        most_recent = all_maint_prior.iloc[-1]

        #return difference in dates
        return row-most_recent.name

times["days_since_maintenance"] = times["months"].apply(lambda row: days_since_x(row, df))
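
Since the question mentions searchsorted(), here is a rough sketch of how the same index-based lookup might be done without apply. It assumes df.index is sorted (as it is in the question) and that df is not empty; the names maint_dates, pos and last_maint are just placeholders for illustration. Months with no earlier maintenance end up as NaT, which mirrors the NaN branch above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {
        "description": ["replaced wheels", "changed oil", "fixed dent"],
        "mins_spent": [45, 30, 92],
        "cost": [200, 40, 0],
    },
    index=pd.to_datetime(["2012-07-18", "2014-09-12", "2015-09-18"]),
).sort_index()

# 13 month-end dates, 2013-12-31 through 2014-12-31.
times = pd.DataFrame(
    {"months": pd.date_range("2013-12-31", periods=13, freq=pd.offsets.MonthEnd())}
)

maint_dates = df.index.values  # sorted maintenance dates (hypothetical name)

# side="right" counts the maintenance dates <= each month;
# subtracting 1 gives the position of the most recent one (-1 if none).
pos = np.searchsorted(maint_dates, times["months"].values, side="right") - 1

# Clip so the lookup never uses -1, then blank those rows out afterwards.
last_maint = pd.Series(maint_dates[pos.clip(min=0)], index=times.index)
times["days_since_maintenance"] = (times["months"] - last_maint).where(pos >= 0)
```

Whether this is actually faster than apply probably depends on how many dates you have, so it is worth timing on your own data.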

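Well, it's definitely not the best solution, because it loops through your df.index. It relies on df.index being sorted, so each later maintenance date overwrites the rows it applies to: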

for d in df.index:
    # .ix was removed in pandas 1.0; .loc does the same label/boolean-mask assignment
    times.loc[times['months'] >= d, 'days_since_maintenance'] = times['months'] - d

In [123]: times
Out[123]:
       months  days_since_maintenance
0  2013-12-31                531 days
1  2014-01-31                562 days
2  2014-02-28                590 days
3  2014-03-31                621 days
4  2014-04-30                651 days
5  2014-05-31                682 days
6  2014-06-30                712 days
7  2014-07-31                743 days
8  2014-08-31                774 days
9  2014-09-30                 18 days
10 2014-10-31                 49 days
11 2014-11-30                 79 days
12 2014-12-31                110 days
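
If the loop bothers you, one possible alternative is pd.merge_asof, which does a "most recent earlier row" join in one call. This is a different technique from the loop above, offered only as a sketch: the maint frame and the last_maintenance column name are made up for the example, and both key columns need to be sorted and share the same dtype.

```python
import pandas as pd

# Maintenance dates from the question, as a column (hypothetical frame/column names).
maint = pd.DataFrame(
    {"last_maintenance": pd.to_datetime(["2012-07-18", "2014-09-12", "2015-09-18"])}
)

# 13 month-end dates, 2013-12-31 through 2014-12-31.
times = pd.DataFrame(
    {"months": pd.date_range("2013-12-31", periods=13, freq=pd.offsets.MonthEnd())}
)

# merge_asof requires matching key dtypes, so pin both to the same one.
maint["last_maintenance"] = maint["last_maintenance"].astype("datetime64[ns]")
times["months"] = times["months"].astype("datetime64[ns]")

# direction="backward" picks, for each month, the last maintenance date <= that month.
merged = pd.merge_asof(
    times,
    maint,
    left_on="months",
    right_on="last_maintenance",
    direction="backward",
)
merged["days_since_maintenance"] = merged["months"] - merged["last_maintenance"]
```

Months that fall before the first maintenance date would get NaT in both new columns.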

1 Comment

this works -- not sure whether it's more efficient or less efficient than GMarsh answer, which also works. thanks!!
