
Currently I have a table that looks like this:

I'm trying to give each month in the year column a year value (e.g. September = 09-2011, December = 12-2011, March = 03-2012, etc.). I'm completely stumped on how to do this, as I'm new to working with pandas. Does anyone have any pointers on how to do this in pandas?

2 Answers


Firstly, when you ask a question, please don't include images of a dataframe; include reproducible data instead. Take a look at this to get some pointers on how to write a good question.

As for your question: first, look at the source of your table. Is it in Excel, for example? Could you fix the problem there?

If you do need to fix the problem using pandas, here is one way:

First some sample data, with years and months mixed up in the same column.

import pandas as pd
import numpy as np

data = pd.DataFrame({
    'key': ['2017', 'November', 'December', '2018', 'January']
})

The first step is to extract the values that are years into a new column, and then "forward fill" that column so each year is broadcast down to the month rows beneath it. In one line:

# Keep the year strings, set everything else to NaN, then forward-fill the years downwards
data['years'] = pd.Series([i if i.isnumeric() else np.nan for i in data['key']], index=data.index).ffill()
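A slightly more vectorized variant of the same forward-fill idea, offered as a sketch rather than a replacement: `Series.where` keeps only the cells whose `key` is numeric and turns the rest into NaN, and `ffill()` then propagates each year down to the month rows. It assumes, as in the sample data, that the year cells are plain digit strings.

```python
import pandas as pd

data = pd.DataFrame({
    'key': ['2017', 'November', 'December', '2018', 'January']
})

# where() keeps cells where the condition is True and puts NaN elsewhere,
# so only the year strings survive; ffill() copies each year downwards.
data['years'] = data['key'].where(data['key'].str.isnumeric()).ffill()

print(data)
```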

Now, drop the rows that are years. In your data, these rows appear to have no associated values.

data = data[~data['key'].str.isnumeric()]

Giving us:

        key years
1  November  2017
2  December  2017
4   January  2018
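The question asks for labels such as 09-2011. Assuming the month names are full English names (which `%B` parsing expects in the default locale), one way to get there from the result above is to join month and year, parse the result with `pd.to_datetime`, and format it with `strftime`. The `month_year` column name is only illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'key': ['2017', 'November', 'December', '2018', 'January']
})
data['years'] = data['key'].where(data['key'].str.isnumeric()).ffill()
data = data[~data['key'].str.isnumeric()].copy()

# Parse strings like "November 2017": %B = full month name, %Y = 4-digit year.
dates = pd.to_datetime(data['key'] + ' ' + data['years'], format='%B %Y')

# Format as zero-padded month, a dash, then the year, e.g. "11-2017".
data['month_year'] = dates.dt.strftime('%m-%Y')
print(data)
```

Going through real datetimes (rather than a hand-written month-to-number mapping) also gives you a column you can sort or filter chronologically if you keep `dates` around.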

1 Comment

This worked perfectly, thank you very much! Apologies for the question formatting, noted for next time :)

Setup (reproducible example)

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': [np.nan,2,3,np.nan,5,6,7], 'col2': [np.nan,20,30,np.nan,50,60,70]}, index=[2011,'September', 'December', 2012, 'March','June','April'])

            col1    col2
2011        NaN     NaN
September   2.0     20.0
December    3.0     30.0
2012        NaN     NaN
March       5.0     50.0
June        6.0     60.0
April       7.0     70.0

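You can then do:

# Years become strings, month labels become NaN; forward-fill the years and prefix each label with its year
m = pd.Series([str(x) if isinstance(x, int) else np.nan for x in df.index])
df.index = m.ffill().astype(str) + ' ' + df.index.astype(str)

# Drop the former year rows, which contain only NaN
df = df.loc[~df.isnull().all(axis=1), :]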

                col1    col2
2011 September  2.0     20.0
2011 December   3.0     30.0
2012 March      5.0     50.0
2012 June       6.0     60.0
2012 April      7.0     70.0
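If you later need real dates rather than text labels, one possible follow-up is to parse the combined "2011 September" labels into a monthly `PeriodIndex`, which sorts and compares chronologically and can still be rendered as 09-2011. This is a sketch assuming full English month names; the `labels` variable is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'col1': [np.nan, 2, 3, np.nan, 5, 6, 7],
     'col2': [np.nan, 20, 30, np.nan, 50, 60, 70]},
    index=[2011, 'September', 'December', 2012, 'March', 'June', 'April'],
)

# Same steps as above: prefix each label with its forward-filled year,
# then drop the all-NaN year rows.
m = pd.Series([str(x) if isinstance(x, int) else np.nan for x in df.index])
df.index = m.ffill().astype(str) + ' ' + df.index.astype(str)
df = df.loc[~df.isnull().all(axis=1), :]

# Parse "2011 September" (%Y = year, %B = full month name) into monthly periods.
df.index = pd.to_datetime(df.index, format='%Y %B').to_period('M')

# Render the periods in the MM-YYYY style the question asks for.
labels = df.index.strftime('%m-%Y')
print(df)
print(list(labels))
```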
