I want to use the equivalent of Excel's VLOOKUP to build a new dataframe. I have two dataframes and am trying to look up each df1 column A value against df2 columns A and B, returning the matching value from df2 column A.

The cell beside it should be the same lookup of the df1 column A value against df2 columns A and B, but returning the matching value from df2 column B.

The data looks like this:

The data is in columns A and B, respectively, for both dataframes 1 and 2.

Current output

    Dataframe 1               Dataframe 2

    AC1        AC2            AC10          AC20
    Bus        5              car           1
    car        3              helicopter    7
    Walking    2              running       5

Desired/Expected output

    Dataframe [New]

    NaN        NaN
    Car        1
    NaN        NaN

I have tried:

dfz = df1.insert(2, '2A2', df1['AC1'].map(df2.set_index('AC1')['2A2']))
print (dfz)

result = left.join(right, on=['AC2', 'AC1'], how='inner')
#left.join(right, lsuffix='_l', rsuffix='_r')

#df1.join(df1.set_index('AC2')['AC1'], on='AC2')

I have had some success with:

df8 = df1['AC3'] = df1.AC1.map(df2.AC10)
print (df8)


df8 = df1['AC4'] = df1.AC1.map(df2.AC20)
print (df8)

The output is NaN for every row, so it is not correct.

Example:

import datetime

import numpy as np
import pandas as pd

df1 = pd.read_excel('C:/Users/Desktop/zav.xlsx')

df2 = pd.read_excel('C:/Users/Desktop/zav2.xlsx')

#df3 = pd.merge(df, df2)
df3 = df1.join(df2)
print (df3)


todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')

df5 = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
                 columns=['a', 'b', 'c', 'd', 'e'])
print(df5)


df8 = df1['AC3'] = df1.AC1.map(df2.AC10)
print (df8)


df8 = df1['AC3'] = df1.AC1.map(df2.AC20)
print (df8)
  • What is your expected output? Commented Nov 17, 2017 at 3:40
  • @andrew_reece I included my desired/expected output in the question but you had to scroll to the right a lot. I will update it. Thanks Commented Nov 17, 2017 at 4:57
  • Did you try a merge with how='left' instead? Also, is the case difference between car and Car intentional? Commented Nov 17, 2017 at 5:36
  • @angelwally That's a typo. I tried dfza = df1['AC3']=df1.AC1.map(df2.AC10) print(dfza) . Same issue seems to be present Commented Nov 17, 2017 at 8:34
  • I've never used map before, but looking at the docs, I think the Series you pass to it should be indexed by the lookup keys. In your case, df2 should have AC10 as its index if you want to match it against df1.AC1. Commented Nov 17, 2017 at 8:45
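
A minimal sketch of the point made in the last comment, using the sample data from the question (the variable names `unindexed`, `lookup`, and `matched` are illustrative, not from the original post). When `Series.map` is given another Series, it looks each value up in that Series' index, which would explain the all-NaN result above.

```python
import pandas as pd

df1 = pd.DataFrame({"AC1": ["Bus", "car", "Walking"], "AC2": [5, 3, 2]})
df2 = pd.DataFrame({"AC10": ["car", "helicopter", "running"],
                    "AC20": [1, 7, 5]})

# df2["AC10"] is indexed 0, 1, 2, so none of the strings in df1["AC1"]
# is found in that index and every row comes back as NaN.
unindexed = df1["AC1"].map(df2["AC10"])

# Indexing df2 by the key column turns AC20 into a lookup table
# (this assumes the AC10 keys are unique).
lookup = df2.set_index("AC10")["AC20"]
matched = df1["AC1"].map(lookup)

print(unindexed)
print(matched)
```

Only the row whose AC1 value appears in AC10 gets a value in `matched`; the other rows stay NaN.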

1 Answer

The following code works; it uses groupby and join rather than map:

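import pandas as pd

df1 = pd.DataFrame([["Bus", 5], ["car", 3], ["Walking", 2]], columns=["AC1", "AC2"])

df2 = pd.DataFrame([["car", 1], ["helicopter", 7], ["running", 5]], columns=["AC10", "AC20"])

# Index df2 by its key column, keeping the first row for each key
df2 = df2.groupby("AC10").first()

# Left join on df1.AC1 against df2's index, then drop the unneeded column
df3 = df1.join(df2, on="AC1", how="left").drop("AC2", axis=1)
print(df3)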

It will output the following:

       AC1  AC20
0      Bus   NaN
1      car   1.0
2  Walking   NaN
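
As an alternative, here is a hedged sketch of the left merge suggested in the question's comments. It is not the code from the answer above; `merged` and `result` are illustrative names, and `drop_duplicates` is an assumption standing in for `groupby(...).first()` to keep one row per key. Keeping both right-hand columns gives the two-column layout shown in the question's desired output.

```python
import pandas as pd

df1 = pd.DataFrame([["Bus", 5], ["car", 3], ["Walking", 2]],
                   columns=["AC1", "AC2"])
df2 = pd.DataFrame([["car", 1], ["helicopter", 7], ["running", 5]],
                   columns=["AC10", "AC20"])

# A left merge keeps every row of df1, in order; rows without a match
# in df2 get NaN in the columns that come from df2.
merged = df1.merge(
    df2.drop_duplicates("AC10"),  # keep the first row per key
    left_on="AC1",
    right_on="AC10",
    how="left",
)

# Keep only the looked-up key and value columns.
result = merged[["AC10", "AC20"]]
print(result)
```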

5 Comments

Is there a way to adjust that for df1 = pd.Series(["AC2"],index=["AC1"]) and df2 = pd.Series(["AC20"],index=["AC10"])? I have about 100 cells of data and doing it that way might take a while, e.g. pd.Series([usecols=1],index=[cols=2])
I am not sure I understand what you are trying to do here. Could you upload your Excel file somewhere, with the VLOOKUP in it, so I can see what you are trying to achieve?
Sure. I've uploaded an Excel equivalent of what I am after dropfile.to/OKBgRVd
Based on your file, I would first do df2 = df2.groupby("col1").first() to keep the first entry for each key in df2. Then, df3 = df1.join(df2,on="col1",how="left"). You can drop the original second column if you don't need it. Is that what you were looking for?
Nearly works. print (df2.columns.tolist()) dfzaa = df1.groupby("AC1").first() df3= df2.join(df1,on="AC20",how="left") print(df3). Hmmz...
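
A small sketch of how the steps in this thread might be wrapped up so that they do not depend on how the frames were built or how many rows they hold. The helper name `vlookup` and its parameters are hypothetical, not part of pandas. Note the direction of the join: the frame being looked up has to be indexed by its key, while the calling frame supplies the `on` column.

```python
import pandas as pd


def vlookup(left, right, left_key, right_key):
    """Hypothetical helper: left-join `right` onto `left`, first match wins."""
    # Index the lookup table by its key, keeping the first row per key.
    table = right.groupby(right_key).first()
    # join matches left[left_key] against the index of `table`.
    return left.join(table, on=left_key, how="left")


df1 = pd.DataFrame({"AC1": ["Bus", "car", "Walking"], "AC2": [5, 3, 2]})
df2 = pd.DataFrame({"AC10": ["car", "helicopter", "running"],
                    "AC20": [1, 7, 5]})

out = vlookup(df1, df2, "AC1", "AC10")
print(out)
```

Frames loaded with pd.read_excel could be passed in the same way, assuming their key columns hold comparable values (same case and spelling).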
