I'm learning Python and I would like to populate one dataframe with data from another. In Excel I would use VLOOKUP; I know I can do this with pandas in Python, but I don't know how. Basically, I have two dataframes:

df1.csv

Time           07:03:52
EmployeeID     98766
EmployeeName   "John"

Time           08:03:52
EmployeeID     98765
EmployeeName   "Mary"

df2.csv

Time   EmployeeID   EmployeeName

I would like to create a third dataframe that has the columns of df2.csv filled with the values from df1.csv, like this:

df3.csv

EmployeeName  EmployeeID   Time
John          98766        07:03:52
Mary          98765        08:03:52

1 Answer

I think you first need to reshape the key/value rows into columns with groupby + cumcount, set_index and unstack. Then, if you need to change the order of the columns, use reindex:

import pandas as pd

# df1.csv has no header row, so name the two columns explicitly
df1 = pd.read_csv('df1.csv', names=['a','b'])
print(df1)
              a         b
0          Time  07:03:52
1    EmployeeID     98766
2  EmployeeName      Joao
3          Time  08:03:52
4    EmployeeID     98765
5  EmployeeName      Mary

# column names taken from the header of df2.csv
df2 = pd.read_csv('df2.csv')
c = df2.columns.str.strip().tolist()
print(c)
['EmployeeID', 'EmployeeName', 'Time']

# or define them explicitly in a list
# c = ['Time', 'EmployeeID', 'EmployeeName']

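# number each repetition of a key (0 for the first record, 1 for the second, ...)
g = df1.groupby('a').cumcount()
# pivot the keys into columns, then put the columns in the order given by c
df1 = df1.set_index([g,'a'])['b'].unstack().reindex(columns=c)
print(df1)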
a EmployeeID EmployeeName      Time
0      98766         Joao  07:03:52
1      98765         Mary  08:03:52
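
For anyone who wants to try the reshape without the CSV files, here is a minimal self-contained sketch. It assumes df1.csv is comma-separated with one key/value pair per line and that df2.csv contains only a header row. The inline strings, the `reshape_records` helper and the `df3` name are made up for illustration and do not come from the original files.

```python
import io

import pandas as pd

# Assumed file contents (hypothetical stand-ins for df1.csv and df2.csv).
DF1_TEXT = """Time,07:03:52
EmployeeID,98766
EmployeeName,Joao
Time,08:03:52
EmployeeID,98765
EmployeeName,Mary
"""
DF2_TEXT = "Time,EmployeeID,EmployeeName\n"


def reshape_records(long_df, columns):
    """Turn repeated key/value rows into one row per record."""
    # Number each occurrence of a key: the first 'Time' gets 0, the second 1, ...
    record = long_df.groupby('a').cumcount()
    # Move the keys from column 'a' into the column axis.
    wide = long_df.set_index([record, 'a'])['b'].unstack()
    # Put the columns in the requested order.
    return wide.reindex(columns=columns)


# dtype=str keeps the IDs and times as plain text.
df1 = pd.read_csv(io.StringIO(DF1_TEXT), names=['a', 'b'], dtype=str)
df2 = pd.read_csv(io.StringIO(DF2_TEXT))
df3 = reshape_records(df1, df2.columns.str.strip().tolist())

# Build the CSV text; pass a file name such as 'df3.csv' to to_csv to save it instead.
csv_text = df3.to_csv(index=False)
print(csv_text)
```

Here the column order of df3 follows the header of the stand-in df2, so changing that header is enough to reorder the output.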

10 Comments

Thanks a lot! It worked! The only problem is that df3 is like this:
   EmployeeID EmployeeName      Time EmployeeName  Time
0       98766         Joao  07:03:52          NaN   NaN
1       98765         Mary  08:03:52          NaN   NaN
Can you explain more about creating df3 from df2? The best would be to add some sample data to df2 and the expected output for df3. Or does df2 have no values, only a header?
Yes, df2 has only a header. Basically, I want to use the information from df1 and put it in the matching columns of df2, creating a new CSV file (df3).
No, it returns this:
Traceback (most recent call last):
  File "test_url.py", line 372, in <module>
    g = df.groupby('a').cumcount()
NameError: name 'df' is not defined
It returned:
a EmployeeID EmployeeName Time
0      98766          NaN  NaN
1      98765          NaN  NaN
I mean: it's not getting the information for EmployeeName and Time.
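
One possible cause of NaN columns like the ones reported in these comments is stray whitespace: reindex matches labels exactly, so 'EmployeeName ' (with a trailing space) is not the same label as 'EmployeeName'. Whether that is what happened in the asker's files is an assumption. The sketch below reproduces the symptom with made-up inline data and a hypothetical `to_wide` helper, then shows that stripping the keys in column a before reshaping makes the values line up.

```python
import io

import pandas as pd

# Hypothetical df1 contents: two of the keys carry a trailing space.
df1_text = (
    "Time ,07:03:52\n"
    "EmployeeID,98766\n"
    "EmployeeName ,Joao\n"
    "Time ,08:03:52\n"
    "EmployeeID,98765\n"
    "EmployeeName ,Mary\n"
)
columns = ['EmployeeID', 'EmployeeName', 'Time']


def to_wide(long_df, columns):
    """Same reshape as in the answer: cumcount + set_index + unstack + reindex."""
    record = long_df.groupby('a').cumcount()
    return long_df.set_index([record, 'a'])['b'].unstack().reindex(columns=columns)


df1 = pd.read_csv(io.StringIO(df1_text), names=['a', 'b'], dtype=str)

# Without cleaning, 'Time ' and 'EmployeeName ' do not match the requested
# labels, so reindex creates those columns filled with NaN.
broken = to_wide(df1, columns)
print(broken)

# Strip whitespace from keys and values first, then reshape again.
cleaned = df1.assign(a=df1['a'].str.strip(), b=df1['b'].str.strip())
fixed = to_wide(cleaned, columns)
print(fixed)
```

Printing `df1['a'].unique()` (or `repr` of the column names) is a quick way to see whether hidden spaces are present in real data.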