How to fill a pandas dataframe column using a value from another dataframe column

Question

First, import some packages that might be useful:

import pandas as pd
import datetime

Say I have a dataframe with date, name and age columns.

df1 = pd.DataFrame({'date': ['10-04-2020', '04-07-2019', '12-05-2015'], 'name': ['john', 'tim', 'sam'], 'age': [20, 22, 27]})

Now say I have another dataframe with some random columns

df2 = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

Question:

How can I take the age value from the row of df1 that matches a given date (a date I can choose) and populate a whole new column in df2 with this value? Ideally the method should generalise to any number of rows in df2.

Tried

The following is what I have tried (on a similar example), but it doesn't work: most entries in the new column are NaN, and only a few seem to be populated at random.

y = datetime.datetime(2015, 5, 12)
df2['new'] = df1[(df1['date'] == y)].age

Expected Output

Since the date I filtered on corresponds to the row with Sam's name, I would like the new column added to df2 to have his age in every entry (in this case 27 repeated 3 times).

df2 = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6], 'new': [27, 27, 27]})

What if there are two Sams, one aged 27 and the other 25?
– Joe Ferndz, Commented Mar 30, 2021 at 8:28
Ah, sorry, I only mentioned Sam's name to make it easier (it seems to have complicated things). Ignore that; think of it as filtering by the date, which will always be unique, and then selecting the age based on that. So ideally, if I change the date specified, it should pick out the age (from df1) and then populate a new column (in df2) with that value.
– Curious, Commented Mar 30, 2021 at 8:32

Laurent · Accepted Answer · 2021-03-30 09:12:58Z

1

Try:

y = datetime.datetime(2015, 5, 12).strftime('%d-%m-%Y')
df2.loc[:, 'new'] = df1.loc[df1['date'] == y, "age"].item()

# Output
   a  b  new
0  1  4   27
1  2  5   27
2  3  6   27
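
The NaN values produced by the attempt in the question are most likely a result of index alignment: when a Series is assigned to a column, pandas matches rows by index label rather than by position. The sketch below reuses the example data from the question; the names `filtered`, `attempt` and `fixed` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['10-04-2020', '04-07-2019', '12-05-2015'],
                    'name': ['john', 'tim', 'sam'],
                    'age': [20, 22, 27]})
df2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Filtering keeps the original row label, so this Series has index [2].
filtered = df1.loc[df1['date'] == '12-05-2015', 'age']

# Assigning a Series aligns on index labels: only row 2 receives a value,
# rows 0 and 1 have no matching label and become NaN.
attempt = df2.copy()
attempt['new'] = filtered

# Extracting the scalar first lets pandas broadcast it to every row.
fixed = df2.copy()
fixed['new'] = filtered.item()
```

With a scalar on the right-hand side there is nothing to align, which is why `.item()` fills the whole column.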

edited Mar 30, 2021 at 9:12

answered Mar 30, 2021 at 8:25

Laurent


6 Comments

Curious Over a year ago

Not sure if that works, seems to throw an error.

Laurent Over a year ago

That's because dates in df1 are strings, whereas y is a datetime.datetime object. Try again with the code in my updated answer.

Curious Over a year ago

Would you happen to know why this works and my original method wouldn't? Like if I changed y to a string and then tried using df2['new'] = df1[(df1['date'] == y)].age is there something wrong with this?

Vishnudev Krishnadas Over a year ago

Your date column is not of datetime type; it is just string type. So when you compare strings to a datetime object nothing matches, the selection is empty, and extracting a value from it fails @nishcs. This answer does a string-to-string comparison, hence it works.

Curious Over a year ago

but say I turn my datetime into a string using y = datetime.datetime(2015, 5, 12).strftime('%d-%m-%Y') then is my command df2['new'] = df1[(df1['date'] == y)].age still a valid solution?


Sure13 · Accepted Answer · 2021-03-30 09:09:01Z

1

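Convert y to a string in the same format as the date column, then use the df.loc method:

y = datetime.datetime(2015, 5, 12)

# Match the 'dd-mm-yyyy' string format used in df1['date']
y = y.strftime('%d-%m-%Y')
# .item() extracts the single matching age as a Python scalar
df2['new'] = df1.loc[df1['date'] == y, 'age'].item()
df2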

answered Mar 30, 2021 at 9:09

Sure13


Vishnudev Krishnadas · Accepted Answer · 2021-03-30 09:24:39Z

1

Convert df1 date column to datetime type

df1['date'] = pd.to_datetime(df1.date, format='%d-%m-%Y')

Filter dataframe and get the age

req_date = '2015-05-12'
age_for_date = df1.query('date == @req_date').age.iloc[0]

NOTE: This assumes that there is only one age per date (As explained by OP in comments)
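
If the one-age-per-date assumption needs to be enforced rather than trusted, a small check can be added before taking the value. This is a minimal sketch; `age_on_date` is a hypothetical helper name, not part of pandas, and it assumes the date column has already been converted to datetime as in the first step.

```python
import pandas as pd


def age_on_date(df, req_date):
    """Return the single age recorded for req_date (hypothetical helper)."""
    matches = df.loc[df['date'] == pd.Timestamp(req_date), 'age']
    if len(matches) != 1:
        # Zero or several rows would make the result ambiguous.
        raise ValueError(
            f"expected exactly one row for {req_date}, found {len(matches)}"
        )
    return matches.iloc[0]


df1 = pd.DataFrame({'date': ['10-04-2020', '04-07-2019', '12-05-2015'],
                    'name': ['john', 'tim', 'sam'],
                    'age': [20, 22, 27]})
df1['date'] = pd.to_datetime(df1['date'], format='%d-%m-%Y')

age_for_date = age_on_date(df1, '2015-05-12')
```

Raising on zero or multiple matches makes the failure explicit, whereas `.iloc[0]` alone would silently take the first of several rows.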

Create a new column

df2 = df2.assign(new=age_for_date)

Output

   a  b  new
0  1  4   27
1  2  5   27
2  3  6   27
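
Once the date column is of datetime type, the datetime object from the question can also be compared against it directly, without `strftime`. The following is a sketch of that variant using a boolean mask instead of `query`, with the same example data.

```python
import datetime
import pandas as pd

df1 = pd.DataFrame({'date': ['10-04-2020', '04-07-2019', '12-05-2015'],
                    'name': ['john', 'tim', 'sam'],
                    'age': [20, 22, 27]})
df2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Parse the 'dd-mm-yyyy' strings into real datetime values.
df1['date'] = pd.to_datetime(df1['date'], format='%d-%m-%Y')

# A datetime column can be compared with a datetime.datetime object.
y = datetime.datetime(2015, 5, 12)
age_for_date = df1.loc[df1['date'] == y, 'age'].iloc[0]

# A scalar is broadcast to every row of df2.
df2['new'] = age_for_date
```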

answered Mar 30, 2021 at 9:24

Vishnudev Krishnadas


2 Comments

Curious Over a year ago

Is the assign method better for creating new columns than direct assignment (df2['new'] = ...)?

Vishnudev Krishnadas Over a year ago

Good question @nishcs. I use assign because of its structural simplicity and its flexibility in assigning multiple columns at once from a dictionary. In terms of time efficiency, assign is a tiny bit slower because it works on a copy of the dataframe rather than modifying it in place. For multiple column creation and readability, I prefer it over simple assignment.
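
The difference between the two styles can be sketched as follows. The `doubled` column is a made-up example added only to show several columns being created in one call.

```python
import pandas as pd

df2 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# assign returns a new DataFrame; several columns can be passed at once,
# and a callable is evaluated against the DataFrame being built.
with_cols = df2.assign(new=27, doubled=lambda d: d['a'] * 2)

# The original is untouched until the result is bound back to a name.
columns_before = list(df2.columns)

# Direct assignment modifies df2 in place.
df2['new'] = 27
columns_after = list(df2.columns)
```

Because `assign` leaves its input unchanged, it chains well with other method calls, while direct assignment avoids the extra copy.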
