Python: Pandas Pivot DataFrame

Question

For the below DataFrame:

ID	Reg	Departure	Arrival	Date	Time
1	ABC	DUB	LCY	22/0/22	15:23
2	ABC	DUB	LCY	22/0/22	15:27
3	CBA	CPH	HEL	22/0/22	12:21
4	CBA	CPH	HE	22/0/22	12:19

I would like to pivot so that it so that the resulting DataFrame is:

ID	Reg	Departure	Arrival	Date	Time	ID2	Time2
1	ABC	DUB	LCY	22/0/22	15:23	2	15:27
3	CBA	CPH	HEL	22/0/22	12:21	4	12:19

I've been playing around with GroupBy & Pivot but haven't managed to crack it, any help is appreciated!

Please go into more detail about the assumptions we can make. What if for the same Reg the Departure would differ? Or the Arrival? What's fixed for combinations of certain columns and what's not? — timgeb
– timgeb, Commented Mar 1, 2022 at 10:59
Apologies I should have mentioned. The assumption is that Reg, Departure, Arrival & Date should have one duplicate row, so i'm trying to merge those two, and keep the ID & Time cols (as they won't be duplicates) as new columns. I hope this makes sense. — Ronnie_drew
– Ronnie_drew, Commented Mar 1, 2022 at 11:05

Cam · Accepted Answer · 2022-03-01 12:07:29Z

So this will work if you have just two of each Reg row. But if you have more duplicate rows you will need something more robust.

import data and pandas

import pandas as pd

output

    ID  Reg Departure   Arrival Date    Time
0   1   ABC DUB         LCY     22/0/22 15:23
1   2   ABC DUB         LCY     22/0/22 15:27
2   3   CBA CPH         HEL     22/0/22 12:21
3   4   CBA CPH         HE      22/0/22 12:19

make two frames with with 'firsts' and one with 'lasts'

final_df = df.drop_duplicates(keep='first', subset='Departure')
id_time_df = df.drop_duplicates(keep='last', subset='Departure')

Then merge the two frames

pd.merge(final_df, id_time_df[['ID', 'Reg', 'Time']], on='Reg')

output

    ID_x    Reg Departure   Arrival Date    Time_x  ID_y    Time_y
0   1       ABC DUB         LCY     22/0/22 15:23   2       15:27
1   3       CBA CPH         HEL     22/0/22 12:21   4       12:19

Denny Chen · Accepted Answer · 2022-03-02 02:05:08Z

Initialize the dataframe

import pandas as pd
import io
str = '''ID Reg Departure   Arrival Date    Time
1   ABC DUB LCY 22/0/22 15:23
2   ABC DUB LCY 22/0/22 15:27
3   CBA CPH HEL 22/0/22 12:21
4   CBA CPH HEL 22/0/22 12:19'''

df = pd.read_csv(io.StringIO(str), sep = '\s+')

We first create a sequence number by group for unstack() later .

df["grpseq"] = df.groupby(["Reg", "Departure", "Arrival", "Date"]).cumcount()

df_new = df.set_index(["Reg", "Departure", "Arrival", "Date", "grpseq"]).unstack(level = -1).reset_index()
df_new.columns = ["Reg", "Departure", "Arrival", "Date", "ID1", "ID2", "Time1", "Time2"]

df_new
    Reg  Departure  Arrival    Date    ID1  ID2 Time1   Time2
0   ABC      DUB       LCY     22/0/22  1   2   15:23   15:27
1   CBA      CPH       HEL     22/0/22  3   4   12:21   12:19

Where ID1 represent the ID column in your desire table. And a simple reorder of the columns will be done.

Collectives™ on Stack Overflow

Python: Pandas Pivot DataFrame

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related