1

I want to reorganize some data in a Pandas DataFrame from an existing DataFrame such that I can have a subset of row values (in the minimal example below 'City' as columns, and remove rows not having either of the chosen values, below 'New York' and 'Amsterdam').

To illustrate what I mean by the above, I have a DataFrame that looks like this:

# Id |  Start_Time                       | End_Time                           | City       | Price
--------------------------------------
1 | 2022-01-01 00:00:00.0000000 +01:00 | 2022-01-01 01:00:00.0000000 +01:00 | New York   | 100
2 | 2022-01-01 01:00:00.0000000 +01:00 | 2022-01-01 02:00:00.0000000 +01:00 | New York   | 90
3 | 2022-01-01 02:00:00.0000000 +01:00 | 2022-01-01 03:00:00.0000000 +01:00 | New York   | 95
4 | 2022-01-01 01:00:00.0000000 +01:00 | 2022-01-01 02:00:00.0000000 +01:00 | Amsterdam  | 300
5 | 2022-01-01 02:00:00.0000000 +01:00 | 2022-01-01 03:00:00.0000000 +01:00 | Amsterdam  | 250

I would like to create a DataFrame that looks like this:

# Period                                                                | New York | Amsterdam | Difference
------------------------------------------------------------------------------------------
2022-01-01 00:00:00.0000000 +01:00 - 2022-01-01 01:00:00.0000000 +01:00 | 100      | NaN |  NaN          
2022-01-01 01:00:00.0000000 +01:00 - 2022-01-01 02:00:00.0000000 +01:00 | 90       | 300 | -210
2022-01-01 02:00:00.0000000 +01:00 - 2022-01-01 03:00:00.0000000 +01:00 | 95       | 250 | -155

Current code:

Here is my code so far:

import pandas as pd

data = [
    [1, '2022-01-01 00:00:00.0000000 +01:00', '2022-01-01 01:00:00.0000000 +01:00', 'New York', 100],
    [2, '2022-01-01 01:00:00.0000000 +01:00', '2022-01-01 02:00:00.0000000 +01:00', 'New York', 90], 
    [3, '2022-01-01 02:00:00.0000000 +01:00', '2022-01-01 03:00:00.0000000 +01:00', 'New York', 95],
    [4, '2022-01-01 01:00:00.0000000 +01:00', '2022-01-01 02:00:00.0000000 +01:00', 'Amsterdam', 300], 
    [5, '2022-01-01 02:00:00.0000000 +01:00', '2022-01-01 03:00:00.0000000 +01:00', 'Amsterdam', 250]
]

df = pd.DataFrame(data, columns=['Id', 'Start_Time', 'End_Time', 'City', 'Price'])

ny = df[df['City'] == 'New York']
amsterdam = df[df['Auction'] == 'Amsterdam']

# Here I naively try to create a new DataFrame with the price in the two cities as colums.
# I figure I can then add a new column with the difference with df_new['Difference'] = df_new['New York'] - df_new['Amsterdam']
df_new = pd.DataFrame().assign(ny=ny['Price'], amsterdam=amsterdam['Price'])

How do I go from a DataFrame that looks like the first one to one that looks like the latter?

5
  • 1
    please explain the logic in clear English and add your own effort - see minimal reproducible example Commented Jun 16, 2022 at 14:10
  • 1
    @Umar.H Completely fair. I've added my very naive code so far. Commented Jun 16, 2022 at 14:52
  • @Umar.H I hope the question is satisfactory now. Commented Jun 16, 2022 at 15:02
  • @Umar.H I would like to be able to specify the order of the columns, so that it is "Period, New York, Amsterdam, Difference". Is that what you meant? I would like to order the rows based on "Period (I realize I will have to perhaps create a "period" type for this). Commented Jun 16, 2022 at 15:31
  • @Umar.H Your answer seems to have gotten deleted for some reason. Commented Jun 16, 2022 at 16:33

1 Answer 1

1

As order is important we first need to create a categorical column based on City.

Then we can create a pivot using pd.pivot_table

df['period'] = df['Start_Time']+ " - " + df['End_time']
#assuming these are already strings, if not you'll need to cast them.
df['City'] = pd.Categorical(df['City'], 
           ordered=True, categories=['New York', 'Amsterdam'])


df1 = df.pivot_table(index='period', 
             columns='City', values='Price',aggfunc='first')

Finally, your difference column

df1['sub'] = df1.diff(axis=1).iloc[:,-1]


print(df1)

City                                                New York  Amsterdam    sub
period
2022-01-01 00:00:00.0000000 +01:00 - 2022-01-01...     100.0        NaN    NaN
2022-01-01 01:00:00.0000000 +01:00 - 2022-01-01...      90.0      300.0  210.0
2022-01-01 02:00:00.0000000 +01:00 - 2022-01-01...      95.0      250.0  155.0
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.