
I have two dataframes, shown below.

I would like to add a column (Col_to_create) to the second table whose value is looked up from the first table: for each row, it is the value of b in the row of the first table where a equals Refer_to_A.

Table 2 has more than 800,000 rows, so I am looking for a fast way to do this.

First table:

a      b    
1     100
2     400
3     500

Second table:

id   Refer_to_A     Col_to_create
0        3               500
1        1               100
2        3               500
3        2               400
4        1               100
  • Are you supposed to optimize a join? Commented Dec 4, 2019 at 16:53
  • I didn't understand your question. Commented Dec 4, 2019 at 16:58

2 Answers


You can use the Series.map method, passing it a Series of b values indexed by a:

df2['Col_to_create'] = df2['Refer_to_A'].map(df1.set_index('a')['b'])

Output:

    Refer_to_A  Col_to_create
id                           
0            3            500
1            1            100
2            3            500
3            2            400
4            1            100
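As a self-contained sketch of the map lookup above, the following rebuilds the two tables from the question and adds one extra key (7) that does not exist in the first table. That extra row is an assumption added purely for illustration; it shows that unmatched keys come back as NaN rather than raising. The sketch also assumes the values in a are unique, since map needs an unambiguous lookup.

```python
import pandas as pd

# Tables from the question; the trailing 7 is a hypothetical key with no match in df1.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [100, 400, 500]})
df2 = pd.DataFrame({"Refer_to_A": [3, 1, 3, 2, 1, 7]})

# Build a lookup Series: index is the key column a, values are b.
lookup = df1.set_index("a")["b"]

# Vectorized lookup: each Refer_to_A value is replaced by the matching b.
df2["Col_to_create"] = df2["Refer_to_A"].map(lookup)

# Rows whose key was not found hold NaN and can be inspected or filled afterwards.
unmatched = df2[df2["Col_to_create"].isna()]
print(df2)
print(unmatched)
```

Because of the NaN, the new column is stored as floating point here; when every key matches, as in the question's data, the integer values are kept.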


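Another possible way is to apply a function to the lookup column to build the new column.

If your dataset is:

import pandas as pd

dataframe_a = pd.DataFrame({'a': [1,2,3], 'b': [100,400,500]})
dataframe_b = pd.DataFrame({'Refer_to_A': [3,1,3,2,1]})

You can try something like the following. Note that it fetches the row of dataframe_a whose index label is col-1, so it only works when dataframe_a has the default 0, 1, 2, ... index and the values of a are 1, 2, 3, ... in row order. It also calls the lambda once per row, which can be slow on a large table:

dataframe_b['Col_to_create'] = dataframe_b['Refer_to_A'].apply(lambda col: dataframe_a['b'][col-1])

Output: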

   Refer_to_A  Col_to_create
0           3            500
1           1            100
2           3            500
3           2            400
4           1            100
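The col-1 lookup above depends on the keys in a lining up with row positions. A key-based alternative, in the spirit of the join mentioned in the comments, is a left merge. The sketch below is not part of the original answer, and it deliberately uses hypothetical keys (10, 20, 30) that do not match row positions to show that the merge still finds the right values.

```python
import pandas as pd

# Hypothetical data: same b values as the question, but keys that are not 1..n.
dataframe_a = pd.DataFrame({"a": [10, 20, 30], "b": [100, 400, 500]})
dataframe_b = pd.DataFrame({"Refer_to_A": [30, 10, 30, 20, 10]})

# Rename the lookup table's columns so the key matches and the value
# arrives under the desired name, then left-join on the key.
lookup = dataframe_a.rename(columns={"a": "Refer_to_A", "b": "Col_to_create"})
merged = dataframe_b.merge(lookup, on="Refer_to_A", how="left")

print(merged)
```

A left merge keeps every row of dataframe_b in its original order. If a contained duplicate keys, the merge would produce one output row per match, so checking that a is unique first is a reasonable precaution.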
