
I start with two DataFrames and one variable:

import pandas as pd

id = 1
df1 = pd.DataFrame({'id': [1, 2], 'col0': [3, 4]})
df2 = pd.DataFrame({'col1': [13, 14, 15], 'col2': [23, 24, 25]})

I need to add the id variable and the corresponding col0 value from df1 to every row of df2. This is the code I came up with:

df2.insert(0, "id", id)
df2.insert(1, "col0", df1[df1['id']==id]['col0'])

It seems to me that the code should work correctly, but unfortunately I get NaN values in the col0 column:

   id  col0  col1  col2
0   1   3.0    13    23
1   1   NaN    14    24
2   1   NaN    15    25

The expected result was:

   id  col0  col1  col2
0   1   3.0    13    23
1   1   3.0    14    24
2   1   3.0    15    25

I've spent over an hour and can't figure out why I'm getting this kind of result. If possible, could you, please:

  1. explain briefly why I am getting the error
  2. fix my mistake in the code
  • Could you please try filling or deleting those NaN values? Commented Jul 2, 2021 at 18:28
  • Is the problem that 3 should be filled in for the NaNs? Could you please post the expected result? Commented Jul 2, 2021 at 18:32
  • @Uttam I can: the statements df2.at[1,'col0']=3 and df2.at[2,'col0']=3 fill the NaN values and work correctly. Commented Jul 2, 2021 at 18:36
  • Awesome. Hopefully my answer helps you out. Commented Jul 2, 2021 at 18:42
  • @Ivan7 Please let me know whether the following answer helps. Commented Jul 2, 2021 at 18:54

3 Answers


The mistake is in the expression df1[df1['id']==id]['col0']. It returns a Series, not a single value. It holds only one value, but it is still a Series with its own index, so insert aligns it with df2 by index and fills only the matching row.

The fix is simple: take the first item of the Series by position, like this: df1[df1['id']==id]['col0'].iloc[0]

With that adjustment, your code looks like this:

import pandas as pd

id=1
df1 = pd.DataFrame({'id': [1, 2], 'col0': [3, 4]})
df2 = pd.DataFrame({'col1': [13, 14, 15],'col2': [23, 24, 25]})

df2.insert(0, "id", id)
# .iloc[0] takes the first matching value as a scalar, which insert broadcasts to every row
df2.insert(1, "col0", df1[df1['id']==id]['col0'].iloc[0])

print(df2)

Then your new df2 is like this:

   id  col0  col1  col2
0   1     3    13    23
1   1     3    14    24
2   1     3    15    25
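One caveat worth checking: a plain [0] on the filtered Series is a label lookup, so it only succeeds when the matching row happens to carry index label 0. Positional access with .iloc[0] does not depend on the label. A minimal sketch with the same frames, assuming a different lookup value (target_id and matches are hypothetical names, not from the question):

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'col0': [3, 4]})
df2 = pd.DataFrame({'col1': [13, 14, 15], 'col2': [23, 24, 25]})

target_id = 2  # hypothetical: matches the second row of df1, whose index label is 1

# Filtered Series; its only entry keeps the original index label 1
matches = df1.loc[df1['id'] == target_id, 'col0']

# matches[0] would raise KeyError here, because no entry is labelled 0.
value = matches.iloc[0]  # positional: first match, whatever its label

df2.insert(0, 'id', target_id)
df2.insert(1, 'col0', value)  # a scalar is broadcast to every row
print(df2)
```

If the filter can match nothing, .iloc[0] raises IndexError, so it may be worth checking matches.empty first.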

The issue is that df1[df1['id']==id]['col0'] returns a Series with a single entry, at index 0:

0    3
Name: col0, dtype: int64

When you insert it, pandas aligns it with df2 by index, so only the row with index 0 gets a value and the remaining rows become NaN.
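The alignment can be reproduced in isolation. This is only a sketch: reindex is used here as a stand-in that behaves roughly like what insert does with a Series, and one_row and aligned are hypothetical names.

```python
import pandas as pd

df2 = pd.DataFrame({'col1': [13, 14, 15], 'col2': [23, 24, 25]})  # index labels 0, 1, 2

# Same shape as the filtered result: one value, at index label 0
one_row = pd.Series([3], index=[0], name='col0')

# Aligning on df2's index: labels 1 and 2 have no match, so they become NaN
aligned = one_row.reindex(df2.index)
print(aligned)

# A scalar has no index to align on, so it is broadcast to every row instead
df2['col0'] = one_row.iloc[0]
print(df2)
```

The NaN entries also explain why col0 shows up as 3.0 rather than 3: an integer column cannot hold NaN, so pandas converts it to float.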

To fill 3 into every row, try adding .to_numpy()[0] to the end of df1[df1['id']==id]['col0']. This returns just the scalar 3, without the index, so insert broadcasts it to all rows and no NaN values appear.

3 Comments

@rhug123 I am sorry, but df2.insert(1, "col0", df1[df1['id'] == id]['col0'].to_numpy[0]) does not work for me: I get AttributeError: 'Series' object has no attribute 'to_numpy'. I have upgraded pandas (pip install --upgrade pandas) but I still get this error.
@rhug123 Could you please comment?
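A possible reason for that AttributeError, offered as an assumption rather than a diagnosis: Series.to_numpy() was added in pandas 0.24.0, so the interpreter running the script may still be importing an older pandas than the one pip upgraded. A minimal sketch that prints the version actually in use and falls back to .values[0], which also exists in older releases (id_ is a hypothetical rename that avoids shadowing the built-in id):

```python
import pandas as pd

# Shows which pandas this interpreter really imports; to_numpy() needs 0.24.0 or newer
print(pd.__version__)

id_ = 1
df1 = pd.DataFrame({'id': [1, 2], 'col0': [3, 4]})
df2 = pd.DataFrame({'col1': [13, 14, 15], 'col2': [23, 24, 25]})

# .values is the older attribute; [0] on the resulting array is positional
value = df1.loc[df1['id'] == id_, 'col0'].values[0]

df2.insert(0, 'id', id_)
df2.insert(1, 'col0', value)
print(df2)
```

If the printed version is older than expected, running the upgrade as python -m pip install --upgrade pandas with the same interpreter that runs the script may help.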

The reason seems to be that df1[df1['id'] == id]['col0'] is a Series with only one element (3), so it populates the first row and the rest come out as NaN. If you change the code as follows, you will get the expected result.

import pandas as pd

id=1
df1 = pd.DataFrame({'id': [1, 2], 'col0': [3, 4]})
df2 = pd.DataFrame({'col1': [13, 14, 15],'col2': [23, 24, 25]})
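print(df2.head(5))  # df2 before the new columns are added
df2.insert(0, "id", id)
col_ = df1[df1['id'] == id]['col0']  # one-element Series
# .iloc[0] extracts the scalar by position, so it does not depend on the index label
df2.insert(1, "col0", col_.iloc[0])
print(df2.head(5))  # df2 with id and col0 filled in every row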

Please let me know if it helps.
