1

For the below DataFrame:

ID Reg Departure Arrival Date Time
1 ABC DUB LCY 22/0/22 15:23
2 ABC DUB LCY 22/0/22 15:27
3 CBA CPH HEL 22/0/22 12:21
4 CBA CPH HE 22/0/22 12:19

I would like to pivot so that it so that the resulting DataFrame is:

ID Reg Departure Arrival Date Time ID2 Time2
1 ABC DUB LCY 22/0/22 15:23 2 15:27
3 CBA CPH HEL 22/0/22 12:21 4 12:19

I've been playing around with GroupBy & Pivot but haven't managed to crack it, any help is appreciated!

2
  • Please go into more detail about the assumptions we can make. What if for the same Reg the Departure would differ? Or the Arrival? What's fixed for combinations of certain columns and what's not? Commented Mar 1, 2022 at 10:59
  • Apologies I should have mentioned. The assumption is that Reg, Departure, Arrival & Date should have one duplicate row, so i'm trying to merge those two, and keep the ID & Time cols (as they won't be duplicates) as new columns. I hope this makes sense. Commented Mar 1, 2022 at 11:05

2 Answers 2

2

So this will work if you have just two of each Reg row. But if you have more duplicate rows you will need something more robust.

import data and pandas

import pandas as pd

output

    ID  Reg Departure   Arrival Date    Time
0   1   ABC DUB         LCY     22/0/22 15:23
1   2   ABC DUB         LCY     22/0/22 15:27
2   3   CBA CPH         HEL     22/0/22 12:21
3   4   CBA CPH         HE      22/0/22 12:19

make two frames with with 'firsts' and one with 'lasts'

final_df = df.drop_duplicates(keep='first', subset='Departure')
id_time_df = df.drop_duplicates(keep='last', subset='Departure')

Then merge the two frames

pd.merge(final_df, id_time_df[['ID', 'Reg', 'Time']], on='Reg')

output

    ID_x    Reg Departure   Arrival Date    Time_x  ID_y    Time_y
0   1       ABC DUB         LCY     22/0/22 15:23   2       15:27
1   3       CBA CPH         HEL     22/0/22 12:21   4       12:19
Sign up to request clarification or add additional context in comments.

Comments

1

Initialize the dataframe

import pandas as pd
import io
str = '''ID Reg Departure   Arrival Date    Time
1   ABC DUB LCY 22/0/22 15:23
2   ABC DUB LCY 22/0/22 15:27
3   CBA CPH HEL 22/0/22 12:21
4   CBA CPH HEL 22/0/22 12:19'''

df = pd.read_csv(io.StringIO(str), sep = '\s+')

We first create a sequence number by group for unstack() later .

df["grpseq"] = df.groupby(["Reg", "Departure", "Arrival", "Date"]).cumcount()

df_new = df.set_index(["Reg", "Departure", "Arrival", "Date", "grpseq"]).unstack(level = -1).reset_index()
df_new.columns = ["Reg", "Departure", "Arrival", "Date", "ID1", "ID2", "Time1", "Time2"]

df_new
    Reg  Departure  Arrival    Date    ID1  ID2 Time1   Time2
0   ABC      DUB       LCY     22/0/22  1   2   15:23   15:27
1   CBA      CPH       HEL     22/0/22  3   4   12:21   12:19

Where ID1 represent the ID column in your desire table. And a simple reorder of the columns will be done.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.