How to use VLOOKUP - using pandas, I guess - with python?

Question

I'm learning Python and I would like to populate one dataframe by getting data from another. In Excel I would use VLOOKUP; I know I can do this with pandas in Python, but I don't know how. Basically, I have two dataframes:

df1.csv

Time           07:03:52
EmployeeID     98766
EmployeeName   "John"

Time           08:03:52
EmployeeID     98765
EmployeeName   "Mary"

df2.csv (header only, no data rows)

Time   EmployeeID   EmployeeName

I would like to create a third dataframe from df2.csv, filling its columns with the data from df1.csv, like this:

df3.csv

EmployeeName  EmployeeID   Time
John          98766        07:03:52
Mary          98765        08:03:52

jezrael · Accepted Answer · 2018-01-20 16:30:51Z


I think you first need to reshape the key/value rows into columns with cumcount + set_index + unstack, and then, if you need to change the order of the columns, use reindex:

import pandas as pd

df1 = pd.read_csv('df1.csv', names=['a','b'])
print (df1)
              a         b
0          Time  07:03:52
1    EmployeeID     98766
2  EmployeeName      Joao
3          Time  08:03:52
4    EmployeeID     98765
5  EmployeeName      Mary

# column names taken from the header of df2.csv (with stray whitespace stripped)
df2 = pd.read_csv('df2.csv')
c = df2.columns.str.strip().tolist()
print (c)
['EmployeeID', 'EmployeeName', 'Time']

# or define them explicitly in a list
#c = ['Time', 'EmployeeID', 'EmployeeName']

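# number repeated keys: 0 for the first record, 1 for the second, ...
g = df1.groupby('a').cumcount()
# pivot the keys into columns (one row per record), then order the columns like c
df1 = df1.set_index([g,'a'])['b'].unstack().reindex(columns=c)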
print (df1)
a EmployeeID EmployeeName      Time
0      98766         Joao  07:03:52
1      98765         Mary  08:03:52
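Here is a self-contained sketch of the same reshape, run on the question's sample data embedded as strings so it needs no files. It assumes df1.csv is a comma-separated key,value file (as the names=['a','b'] call implies) and that df2.csv holds only a header row; the final column selection mirrors the df3 layout shown in the question.

```python
import io

import pandas as pd

# Assumed file contents: df1.csv as "key,value" rows, df2.csv as a header row only.
DF1_CSV = """Time,07:03:52
EmployeeID,98766
EmployeeName,John
Time,08:03:52
EmployeeID,98765
EmployeeName,Mary
"""
DF2_CSV = "Time,EmployeeID,EmployeeName\n"

df1 = pd.read_csv(io.StringIO(DF1_CSV), names=["a", "b"])
df2 = pd.read_csv(io.StringIO(DF2_CSV))

# Number each occurrence of a key: 0 for the first record, 1 for the second, ...
g = df1.groupby("a").cumcount()

# Use (record number, key) as a MultiIndex, then pivot the keys into columns.
wide = df1.set_index([g, "a"])["b"].unstack()

# Align to df2's header; a header name missing from df1 would become a NaN column.
df3 = wide.reindex(columns=df2.columns.str.strip())

# Reorder to the layout shown in the question and drop the leftover axis name.
df3 = df3[["EmployeeName", "EmployeeID", "Time"]].rename_axis(columns=None)

csv_text = df3.to_csv(index=False)
print(csv_text)
```

To write a real file instead of a string, df3.to_csv('df3.csv', index=False) can replace the last to_csv call.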

edited Jan 20, 2018 at 16:30

answered Jan 20, 2018 at 14:08

jezrael


Comments

rafasalo Over a year ago

Thanks a lot! It worked! The only problem is that df3 looks like this:

  EmployeeID EmployeeName      Time EmployeeName Time
0      98766         Joao  07:03:52          NaN  NaN
1      98765         Mary  08:03:52          NaN  NaN

jezrael Over a year ago

Can you explain more about how df3 is created from df2? The best would be to add some sample data for df2 and the expected output for df3. Or does df2 have no values, only a header?

rafasalo Over a year ago

Yes, df2 has only a header. Basically, I want to take the information from df1 and paste it into the matching columns of df2, creating a new file.csv (df3).

rafasalo Over a year ago

No, it returns this:

Traceback (most recent call last):
  File "test_url.py", line 372, in <module>
    g = df.groupby('a').cumcount()
NameError: name 'df' is not defined

rafasalo Over a year ago

It returned:

a EmployeeID EmployeeName Time
0      98766          NaN  NaN
1      98765          NaN  NaN

I mean: it's not getting the information for EmployeeName and Time.
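A plausible cause of those NaN columns (an assumption, since the real files aren't shown) is that some keys in df1's first column don't exactly match df2's header, for example because of stray spaces: reindex keeps only labels that match exactly and fills the rest with NaN. This sketch reproduces that with hypothetical data and then strips the keys; the helper name reshape is also hypothetical.

```python
import pandas as pd

# Hypothetical df1 whose keys carry stray whitespace (an assumption).
df1 = pd.DataFrame({
    "a": ["EmployeeID", " EmployeeName", "Time ", "EmployeeID", " EmployeeName", "Time "],
    "b": ["98766", "John", "07:03:52", "98765", "Mary", "08:03:52"],
})
c = ["EmployeeID", "EmployeeName", "Time"]


def reshape(frame):
    """Pivot key/value rows into one row per record, aligned to the columns in c."""
    g = frame.groupby("a").cumcount()
    return frame.set_index([g, "a"])["b"].unstack().reindex(columns=c)


# " EmployeeName" and "Time " do not equal the names in c, so those columns are NaN.
before = reshape(df1)

# Strip the keys so they match the header exactly, then reshape again.
df1["a"] = df1["a"].str.strip()
after = reshape(df1)
print(after)
```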
