
I want to delete specific rows from a pandas DataFrame based on conditions involving other rows.

For example, I have several currency pairs with the same timestamp, and I want to keep only one pair per timestamp.

The pair to keep is chosen by its quote currency (the last three letters of the pair), in this order of priority: EUR, USD, GBP, CHF.

currency  timebuy              buyprice
CNHUSD    2021-01-05 08:30:00  0,00005073
CNHGBP    2021-01-05 08:30:00  1,588
ZARGBP    2021-01-07 05:15:00  0,2727
ZARUSD    2021-01-07 05:15:00  300
ZAREUR    2021-01-07 13:00:00  0,1936
ZARCHF    2021-01-07 13:00:00  0,0000052
JPYCHF    2021-01-13 06:00:00  0,0002222
JPYUSD    2021-01-13 06:00:00  8
JPYGBP    2021-01-13 06:00:00  8


The result should be:

currency  timebuy              buyprice
CNHUSD    2021-01-05 08:30:00  0,00005073
ZAREUR    2021-01-07 13:00:00  0,1936
JPYUSD    2021-01-13 06:00:00  8
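To make the sample input above easy to reproduce, here is a minimal sketch that builds it as a DataFrame. Two choices are assumptions about how the real data is loaded: timebuy is parsed with pd.to_datetime, and buyprice is kept as text because the table uses a comma as the decimal separator.

```python
import pandas as pd

# Rows copied from the sample input table.
# buyprice stays as text because of the comma decimal separator (assumption).
rows = [
    ('CNHUSD', '2021-01-05 08:30:00', '0,00005073'),
    ('CNHGBP', '2021-01-05 08:30:00', '1,588'),
    ('ZARGBP', '2021-01-07 05:15:00', '0,2727'),
    ('ZARUSD', '2021-01-07 05:15:00', '300'),
    ('ZAREUR', '2021-01-07 13:00:00', '0,1936'),
    ('ZARCHF', '2021-01-07 13:00:00', '0,0000052'),
    ('JPYCHF', '2021-01-13 06:00:00', '0,0002222'),
    ('JPYUSD', '2021-01-13 06:00:00', '8'),
    ('JPYGBP', '2021-01-13 06:00:00', '8'),
]
df = pd.DataFrame(rows, columns=['currency', 'timebuy', 'buyprice'])

# Parse the timestamps so grouping and sorting are chronological.
df['timebuy'] = pd.to_datetime(df['timebuy'])

print(df)
```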


  • Hi again, Fábio :) Will you please post your tables as text in the question? I can't copy/paste from images. Commented Feb 6, 2022 at 1:17
  • Sorry, there was a mistake; now it's correct! Thanks, Richardec. Commented Feb 6, 2022 at 1:29
  • So what's the logic here? Commented Feb 6, 2022 at 2:01
  • I have several currency pairs at the same timestamp. I just want one of them for each timestamp. It works like this: *EUR is my first choice; if there is none, I prefer *USD, then GBP, then CHF... Commented Feb 6, 2022 at 2:15
  • Should the output also include a row for ZARUSD, 2021-01-07 05:15:00, 300? Commented Feb 6, 2022 at 2:26

2 Answers


Using groupby and reindex:

# Hard-code your priority for the second currency in each pair
pri = ['EUR', 'USD', 'GBP', 'CHF']

# Create a new column for the second currency of each pair
df['2ndcurr'] = df['currency'].str[-3:]

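# Index by time and second currency,
# Reorder the inner level (1) of the resulting MultiIndex to match the priority
# (rows whose second currency is not in pri are dropped),
# Group by the outer level (0),
# Take the first row of each group (first() returns the first non-null value per column),
# Reset timebuy from index into its own column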

(df.set_index(['timebuy', '2ndcurr'])
   .reindex(pri, level=1)
   .groupby(level=0)
   .first()
   .reset_index())

               timebuy currency    buyprice
0  2021-01-05 08:30:00   CNHUSD  0,00005073
1  2021-01-07 05:15:00   ZARUSD         300
2  2021-01-07 13:00:00   ZAREUR      0,1936
3  2021-01-13 06:00:00   JPYUSD           8
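Here is a self-contained sketch of the same set_index / reindex / groupby chain run on the question's sample data, so it can be tried end to end. Building the DataFrame this way (timebuy parsed as datetimes, buyprice kept as text) is an assumption; the chain itself is the one above.

```python
import pandas as pd

rows = [
    ('CNHUSD', '2021-01-05 08:30:00', '0,00005073'),
    ('CNHGBP', '2021-01-05 08:30:00', '1,588'),
    ('ZARGBP', '2021-01-07 05:15:00', '0,2727'),
    ('ZARUSD', '2021-01-07 05:15:00', '300'),
    ('ZAREUR', '2021-01-07 13:00:00', '0,1936'),
    ('ZARCHF', '2021-01-07 13:00:00', '0,0000052'),
    ('JPYCHF', '2021-01-13 06:00:00', '0,0002222'),
    ('JPYUSD', '2021-01-13 06:00:00', '8'),
    ('JPYGBP', '2021-01-13 06:00:00', '8'),
]
df = pd.DataFrame(rows, columns=['currency', 'timebuy', 'buyprice'])
df['timebuy'] = pd.to_datetime(df['timebuy'])

pri = ['EUR', 'USD', 'GBP', 'CHF']

# Quote currency = last three letters of each pair.
df['2ndcurr'] = df['currency'].str[-3:]

result = (df.set_index(['timebuy', '2ndcurr'])
            # Reorder the inner level to follow pri within each timestamp;
            # rows whose quote currency is not in pri are dropped here.
            .reindex(pri, level=1)
            # One group per timestamp; first() keeps the first non-null value
            # of each column, i.e. the best-ranked pair when nothing is missing.
            .groupby(level=0)
            .first()
            .reset_index())

print(result)
```

Because this keeps one row per timestamp, ZARUSD at 2021-01-07 05:15:00 appears in the result, as in the output above. One caveat of reindex(pri, level=1): a timestamp whose pairs all have a quote currency outside pri drops out of the result entirely.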

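For a priority list like this, it's easiest to work with numbers. Map each quote currency (the last three letters of the pair) to its position in the priority list, so a lower number means a higher priority, then use idxmin to pick the best-ranked row in each group. Here the groups are the base currencies (the first three letters of the pair):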

priority = ['EUR', 'USD', 'GBP', 'CHF']
mapping = {p: i for i, p in enumerate(priority)}
indexes = df['currency'].str[-3:].map(mapping).groupby(df['currency'].str[:3]).idxmin().sort_values()
selected = df.loc[indexes]

Output:

>>> selected
  currency             timebuy    buyprice
0   CNHUSD 2021-01-05 08:30:00  0,00005073
4   ZAREUR 2021-01-07 13:00:00      0,1936
7   JPYUSD 2021-01-13 06:00:00           8
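To see what the numeric mapping does at each step, here is a self-contained run of the same code on the question's sample data (the DataFrame construction is an assumption). It prints the intermediate rank of every row before idxmin picks one label per base currency.

```python
import pandas as pd

rows = [
    ('CNHUSD', '2021-01-05 08:30:00', '0,00005073'),
    ('CNHGBP', '2021-01-05 08:30:00', '1,588'),
    ('ZARGBP', '2021-01-07 05:15:00', '0,2727'),
    ('ZARUSD', '2021-01-07 05:15:00', '300'),
    ('ZAREUR', '2021-01-07 13:00:00', '0,1936'),
    ('ZARCHF', '2021-01-07 13:00:00', '0,0000052'),
    ('JPYCHF', '2021-01-13 06:00:00', '0,0002222'),
    ('JPYUSD', '2021-01-13 06:00:00', '8'),
    ('JPYGBP', '2021-01-13 06:00:00', '8'),
]
df = pd.DataFrame(rows, columns=['currency', 'timebuy', 'buyprice'])
df['timebuy'] = pd.to_datetime(df['timebuy'])

priority = ['EUR', 'USD', 'GBP', 'CHF']
mapping = {p: i for i, p in enumerate(priority)}  # {'EUR': 0, 'USD': 1, 'GBP': 2, 'CHF': 3}

# Rank of each row's quote currency (last three letters); lower is better.
rank = df['currency'].str[-3:].map(mapping)
print(rank.tolist())  # [1, 2, 2, 1, 0, 3, 3, 1, 2]

# Within each base currency (first three letters), idxmin returns the index
# label of the best-ranked row; sort_values restores the original row order.
indexes = rank.groupby(df['currency'].str[:3]).idxmin().sort_values()
selected = df.loc[indexes]
print(selected)
```

Grouping by base currency happens to reproduce the question's expected output here, but it is not the same rule as "one row per timestamp".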

One-liner:

priority = ['EUR', 'USD', 'GBP', 'CHF']
filtered = df.loc[df['currency'].str[-3:].map({p: i for i, p in enumerate(priority)}).groupby(df['currency'].str[:3]).idxmin().sort_values()]

If you want to group by each timestamp instead of the first 3 letters of the currency, group by df['timebuy'] (the timestamp column in the question's data) instead of df['currency'].str[:3], i.e.:

indexes = df['currency'].str[-3:].map(mapping).groupby(df['timebuy']).idxmin().sort_values()
                                                     # ^^^^^^^^^^^^^
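Here is a self-contained sketch of the per-timestamp variant, assuming the timestamp column is named timebuy as in the question's table and the DataFrame is built as shown. Since it keeps one row per timestamp, ZARUSD at 2021-01-07 05:15:00 is selected too, which matches the output of the groupby/reindex answer. How idxmin treats a timestamp where no pair's quote currency is in the list (all ranks NaN) depends on the pandas version, so such rows are worth filtering out first if they can occur.

```python
import pandas as pd

rows = [
    ('CNHUSD', '2021-01-05 08:30:00', '0,00005073'),
    ('CNHGBP', '2021-01-05 08:30:00', '1,588'),
    ('ZARGBP', '2021-01-07 05:15:00', '0,2727'),
    ('ZARUSD', '2021-01-07 05:15:00', '300'),
    ('ZAREUR', '2021-01-07 13:00:00', '0,1936'),
    ('ZARCHF', '2021-01-07 13:00:00', '0,0000052'),
    ('JPYCHF', '2021-01-13 06:00:00', '0,0002222'),
    ('JPYUSD', '2021-01-13 06:00:00', '8'),
    ('JPYGBP', '2021-01-13 06:00:00', '8'),
]
df = pd.DataFrame(rows, columns=['currency', 'timebuy', 'buyprice'])
df['timebuy'] = pd.to_datetime(df['timebuy'])

priority = ['EUR', 'USD', 'GBP', 'CHF']
mapping = {p: i for i, p in enumerate(priority)}

# Rank each row by its quote currency (last three letters); lower is better.
rank = df['currency'].str[-3:].map(mapping)

# One group per timestamp; idxmin gives the index label of the best-ranked row.
indexes = rank.groupby(df['timebuy']).idxmin().sort_values()
selected = df.loc[indexes]
print(selected)
```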
