vlookup in pandas between 2 dataframes to create third dataframe

Question

I want the equivalent of Excel's VLOOKUP to build a new dataframe. I have 2 dataframes and want to look up each value in df1's column A against columns A and B of df2, returning the matching value from df2's column A.

The cell beside it should hold the same lookup, returning the matching value from df2's column B.

Data looks like-

The data is in Columns A and B respectively for both data frames 1 and 2

Current output

    Dataframe 1            Dataframe 2

    AC1      AC2           AC10        AC20
    Bus      5             car         1
    car      3             helicopter  7
    Walking  2             running     5

Desired/Expected output

    Dataframe [New]

    NaN    NaN
    Car    1
    NaN    NaN

I have tried:

dfz = df1.insert(2, '2A2', df1['AC1'].map(df2.set_index('AC1')['2A2']))
print (dfz)

result = left.join(right, on=['AC2', 'AC1'], how='inner')
#left.join(right, lsuffix='_l', rsuffix='_r')

#df1.join(df1.set_index('AC2')['AC1'], on='AC2')

I have had some success with:

df8 = df1['AC3'] = df1.AC1.map(df2.AC10)
print (df8)


df8 = df1['AC4'] = df1.AC1.map(df2.AC20)
print (df8)

The output is all NaN, so it's not correct.

Example:

import datetime

import numpy as np
import pandas as pd

df1 = pd.read_excel('C:/Users/Desktop/zav.xlsx')

df2 = pd.read_excel('C:/Users/Desktop/zav2.xlsx')

#df3 = pd.merge(df, df2)
df3 = df1.join(df2)
print (df3)


todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')

df5 = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
                 columns=['a', 'b', 'c', 'd', 'e'])
print(df5)


df8 = df1['AC3'] = df1.AC1.map(df2.AC10)
print (df8)


df8 = df1['AC3'] = df1.AC1.map(df2.AC20)
print (df8)

@andrew_reece I included my desired/expected output in the question, but you had to scroll a long way to the right to see it. I will update it. Thanks
– user8950775, Commented Nov 17, 2017 at 4:57
Did you try a merge with how='left' instead? Also, is the case difference between car and Car intentional?
– hanego, Commented Nov 17, 2017 at 5:36
@angelwally That's a typo. I tried dfza = df1['AC3'] = df1.AC1.map(df2.AC10) followed by print(dfza). The same issue seems to be present.
– user8950775, Commented Nov 17, 2017 at 8:34
I've never used map before, but looking at the docs, I think the lookup keys have to be the index of the Series you pass to it. In your case, df2 should have AC10 as its index if you want to map it onto df1.AC1.
– hanego, Commented Nov 17, 2017 at 8:45
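
The NaN result follows from how Series.map treats a Series argument: it looks each value up in that Series's index, not in its values. df2.AC10 is indexed 0, 1, 2 by default, so none of the strings in df1.AC1 match. Below is a minimal sketch of the index-based fix suggested in the last comment. The sample frames are rebuilt from the tables in the question and are only assumed to mirror the Excel files. The approach also assumes the AC10 values are unique.

```python
import pandas as pd

# Sample data rebuilt from the question's tables (assumption: mirrors the Excel files).
df1 = pd.DataFrame({"AC1": ["Bus", "car", "Walking"], "AC2": [5, 3, 2]})
df2 = pd.DataFrame({"AC10": ["car", "helicopter", "running"], "AC20": [1, 7, 5]})

# Original attempt: the lookup keys are df2's default index (0, 1, 2), so nothing matches.
failed = df1["AC1"].map(df2["AC10"])

# Index-based fix: make AC10 the index so the strings in AC1 become the lookup keys.
lookup = df2.set_index("AC10")["AC20"]
matched = df1["AC1"].map(lookup)

print(failed.isna().all())
print(matched.tolist())
```

Only the car row finds a match here; Bus and Walking have no entry in df2 and stay NaN.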

hanego · Accepted Answer · 2017-11-17 10:57:31Z

1

The following code does the lookup with join (rather than map):

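import pandas as pd

df1 = pd.DataFrame([["Bus", 5], ["car", 3], ["Walking", 2]], columns=["AC1", "AC2"])
df2 = pd.DataFrame([["car", 1], ["helicopter", 7], ["running", 5]], columns=["AC10", "AC20"])

# Make AC10 the index, keeping the first entry for each key.
df2 = df2.groupby("AC10").first()

# Left join on AC1, then drop the column that is not needed.
df3 = df1.join(df2, on="AC1", how="left").drop("AC2", axis=1)
print(df3)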

It will output the following:

       AC1  AC20
0      Bus   NaN
1      car   1.0
2  Walking   NaN
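
If the matched key should appear alongside the value, as in the desired output in the question, a left merge keeps both lookup columns. This is a sketch on the same sample data. The drop_duplicates call is an addition here, on the assumption that only the first match per key is wanted, as with Excel's VLOOKUP.

```python
import pandas as pd

df1 = pd.DataFrame([["Bus", 5], ["car", 3], ["Walking", 2]], columns=["AC1", "AC2"])
df2 = pd.DataFrame([["car", 1], ["helicopter", 7], ["running", 5]], columns=["AC10", "AC20"])

# Keep the first row per key so each df1 row matches at most one df2 row.
first_match = df2.drop_duplicates("AC10")

# Left merge: every df1 row survives; unmatched rows get NaN in AC10 and AC20.
merged = df1.merge(first_match, left_on="AC1", right_on="AC10", how="left")

# Keep only the two looked-up columns.
df3 = merged[["AC10", "AC20"]]
print(df3)
```

Because the key columns have different names, left_on and right_on are used instead of on, and both AC1 and AC10 remain available in merged.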

edited Nov 17, 2017 at 10:57

answered Nov 17, 2017 at 8:57


5 Comments

user8950775 Over a year ago

Is there a way to adjust that for - df1 = pd.Series(["AC2"],index=["AC1"]) df2 = pd.Series(["AC20"],index=["AC10"]) I have about 100 cells of data and doing it that way might take a while. e.g pd.Series([usecols=1],index=[cols=2])

hanego Over a year ago

I'm not sure I understand what you are trying to do here. Could you upload your Excel file with the VLOOKUP somewhere, so I can see what you are trying to achieve?

user8950775 Over a year ago

Sure. I've uploaded an Excel equivalent of what I am after dropfile.to/OKBgRVd

hanego Over a year ago

Based on your file, I would first do df2 = df2.groupby("col1").first() to take the first entry for each key in df2. Then df3 = df1.join(df2, on="col1", how="left"). You can drop the original second column if you don't need it. Is that what you were looking for?

user8950775 Over a year ago

Nearly works. print (df2.columns.tolist()) dfzaa = df1.groupby("AC1").first() df3= df2.join(df1,on="AC20",how="left") print(df3). Hmmz...
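
The groupby-then-join recipe from the comments does not depend on how many rows the frames have, so it can be wrapped once and reused. Below is a minimal sketch. The helper name vlookup and its parameters are hypothetical, not a pandas API. Direction matters: the frame whose values are being looked up goes on the left, and the lookup table is the one that gets grouped.

```python
import pandas as pd


def vlookup(left, right, left_key, right_key):
    """Hypothetical helper: left-join `right` onto `left`, one match per key."""
    # groupby(...).first() makes right_key the index and keeps, for each key,
    # the first non-null value of every other column.
    table = right.groupby(right_key).first()
    # join(on=...) matches left[left_key] against the index of `table`.
    return left.join(table, on=left_key, how="left")


# Sample data rebuilt from the question's tables (assumption: mirrors the Excel files).
df1 = pd.DataFrame({"AC1": ["Bus", "car", "Walking"], "AC2": [5, 3, 2]})
df2 = pd.DataFrame({"AC10": ["car", "helicopter", "running"], "AC20": [1, 7, 5]})

df3 = vlookup(df1, df2, "AC1", "AC10")
print(df3)
```

Swapping the two frames, or grouping the frame on the left instead of the lookup table, changes which values are matched against which index.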
