Setting a date column in python pandas

Question

Currently I have a table that looks like this:

I'm trying to give each month in the year column a year value (e.g. September = 09-2011, December = 12-2011, March = 03-2012, etc.). I'm completely stumped on how to do this, as I'm new to working with pandas. Does anyone have any pointers on how to manage this using pandas?

smj · Accepted Answer · 2018-09-17 01:31:00Z

1

Firstly, when you ask a question, please don't include images of a dataframe; instead, include reproducible data. Take a look at this to get some pointers about how to write a good question.

To your question: first, look at the source of your table. Is it in Excel, for example? Could you fix the problem there?

If you do need to fix the problem using pandas, here is one way:

First some sample data, with years and months mixed up in the same column.

import pandas as pd
import numpy as np

data = pd.DataFrame({
    'key': ['2017', 'November', 'December', '2018', 'January']
})

The first step is to extract the entries that are years into a new column, and then "forward fill" to broadcast those values forward. In one line:

# Series.ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas versions
data['years'] = pd.Series([i if i.isnumeric() else np.nan for i in data['key']]).ffill()

Now, drop the rows that are years. In your data, it appears these rows have no data associated with them.

data = data[~data['key'].str.isnumeric()]

Giving us:

        key years
1  November  2017
2  December  2017
4   January  2018
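The question asks for values like 09-2011, so one possible follow-up is to combine the month and year columns into real dates. The sketch below is only an illustration under a few assumptions: the month names are full English names (so `%B` parses them), and the column names `date` and `label` are made up for this example. It also uses `Series.where` as an alternative way to do the same forward fill.

```python
import pandas as pd

# Same sample data as above: year rows followed by their month rows.
data = pd.DataFrame({"key": ["2017", "November", "December", "2018", "January"]})

# Keep the year strings, blank out the month names, then forward fill.
is_year = data["key"].str.isnumeric()
data["years"] = data["key"].where(is_year).ffill()

# Drop the year-only rows (copy so new columns can be added safely).
months = data[~is_year].copy()

# Parse strings such as "November 2017" into timestamps.
# Assumes full English month names; the day defaults to the 1st.
months["date"] = pd.to_datetime(months["key"] + " " + months["years"], format="%B %Y")

# Render as MM-YYYY text, the format used in the question.
months["label"] = months["date"].dt.strftime("%m-%Y")

print(months["label"].tolist())  # ['11-2017', '12-2017', '01-2018']
```

Keeping the `date` column as a datetime (rather than only the text label) makes later sorting and resampling by month straightforward.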

answered Sep 17, 2018 at 1:31

smj


1 Comment

Abjilla22 Over a year ago

This worked perfectly, thank you very much! Apologies for the question formatting, noted for next time :)

Community · Accepted Answer · 2020-06-20 09:12:55Z

0

Setup (reproducible example)

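import numpy as np
import pandas as pd

# Year labels (ints) and month labels (strings) are mixed in the index
df = pd.DataFrame(
    {'col1': [np.nan, 2, 3, np.nan, 5, 6, 7], 'col2': [np.nan, 20, 30, np.nan, 50, 60, 70]},
    index=[2011, 'September', 'December', 2012, 'March', 'June', 'April'],
)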

            col1    col2
2011        NaN     NaN
September   2.0     20.0
December    3.0     30.0
2012        NaN     NaN
March       5.0     50.0
June        6.0     60.0
April       7.0     70.0

You can do:

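# Keep the year labels as strings, mark month labels as NaN
m = pd.Series([str(x) if isinstance(x, int) else np.nan for x in df.index])
# Forward fill the years and prefix each index label with its year
df.index = m.ffill().astype(str) + ' ' + df.index.astype(str)

# Drop the rows where every column is NaN (the year-only rows)
df.loc[~df.isnull().all(axis=1), :]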

                col1    col2
2011 September  2.0     20.0
2011 December   3.0     30.0
2012 March      5.0     50.0
2012 June       6.0     60.0
2012 April      7.0     70.0
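If actual dates are wanted rather than combined text labels, the same idea can be taken one step further. This is a rough sketch, not part of the original answer: it assumes full English month names, and the names `labels`, `dates` and `result` are introduced only for illustration. Note that it drops the year rows by their label rather than by checking for all-NaN rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"col1": [np.nan, 2, 3, np.nan, 5, 6, 7],
     "col2": [np.nan, 20, 30, np.nan, 50, 60, 70]},
    index=[2011, "September", "December", 2012, "March", "June", "April"],
)

# Work with the index labels as strings: "2011", "September", ...
labels = pd.Series(df.index.astype(str))
is_year = labels.str.isnumeric()

# Forward fill each year over the month rows that follow it.
years = labels.where(is_year).ffill()

# Parse "September 2011" style strings for the month rows only.
dates = pd.to_datetime(labels[~is_year] + " " + years[~is_year], format="%B %Y")

# Keep the month rows and replace their index with the parsed dates.
result = df[~is_year.to_numpy()].copy()
result.index = pd.DatetimeIndex(dates)

print(list(result.index.strftime("%m-%Y")))
# ['09-2011', '12-2011', '03-2012', '06-2012', '04-2012']
```

With a `DatetimeIndex` in place, `result.sort_index()` would put the rows in calendar order if that matters for the data.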

edited Jun 20, 2020 at 9:12

CommunityBot


answered Sep 17, 2018 at 1:36

rafaelc

