
I have 15 CSV files, each of which has a column representing the year. The problem is that this column is named 'year' in some files and 'year_' in the others. The two columns hold the same information, but because each file has only one of the two names, any row with a value in 'year' has NaN in 'year_' and vice versa. I want to combine the two columns so that I can get rid of the NaNs. What is the best way to do this?

Before

       year     year_
 1     NaN      1999
 2     2002     NaN
 3     2000     NaN
 .
 .
 .
 N     NaN      2004

I want this to be

After

       year
 1     1999
 2     2002
 3     2000
 .
 .
 .
 N     2004
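
For reference, a minimal DataFrame that reproduces the 'Before' state (values taken from the table above, purely for illustration) can be built like this:

import numpy as np
import pandas as pd

# Toy data mirroring the 'Before' table; only a few rows for illustration.
df = pd.DataFrame({
    'year':  [np.nan, 2002, 2000, np.nan],
    'year_': [1999, np.nan, np.nan, 2004],
}, index=[1, 2, 3, 4])
print(df)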

3 Answers


You can use the combine_first function:

df['YEAR'] = df['year'].combine_first(df['year_'])

where df['year'] will be the default and df['year_'] will be used to fill its null values.
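
For context, here is a sketch of how this could be applied across all 15 files after concatenating them; the glob pattern and file locations are assumptions, not part of the question:

import glob
import pandas as pd

# Hypothetical file layout -- adjust the glob pattern to wherever the 15 CSVs live.
combined = pd.concat((pd.read_csv(p) for p in glob.glob('data/*.csv')),
                     ignore_index=True)

# Coalesce: keep 'year' where it is set, fall back to 'year_' for the NaN rows.
combined['year'] = combined['year'].combine_first(combined['year_'])
combined = combined.drop(columns='year_')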


Comments

Seems to be faster than the sum solution.

Given that only one of the two columns has a valid value in each row, you can simply sum them along axis 1:

year_cols = df.columns[df.columns.str.contains('year')]
df['year'] = df[year_cols].sum(axis=1)
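
Applied to the toy df built under the question above. Note that sum() skips NaN, so each row keeps its single valid year; a row where both columns were NaN would come out as 0, and passing min_count=1 (an extra, not part of the answer above) keeps such rows as NaN instead:

# Select every column whose name contains 'year' and collapse them row-wise.
year_cols = df.columns[df.columns.str.contains('year')]
df['year'] = df[year_cols].sum(axis=1, min_count=1)
df = df.drop(columns='year_')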



Same idea as @Vaishali: you can just sum the year columns; use filter to select the columns:

df.filter(like='year').sum(axis=1)
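
For reference, a quick way to check which columns filter picks up (assuming df still has both year columns):

# filter(like='year') keeps every column whose label contains the substring 'year'.
print(df.filter(like='year').columns.tolist())   # e.g. ['year', 'year_']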

