1

I am trying to write the following condition to map MARKET_STATES.OPENING_AUCTION, however I am getting KeyError: (False, False, False) when I run the following:

market_info_df.loc[market_info_df['issue_status' == '10', 'market_phase' == '0',
                                          'trading_state' == '  '].iloc[0]] = MARKET_STATES.OPENING_AUCTION

Two questions:

  • What is it that the query does? My understanding is that using iloc[0] it finds the first instance where the three columns at the values stated. But then it maps something to MARKET_STATES.OPENING_AUCTION or?
  • How can I write a condition like the above where if the 3 columns match the values, then for the first occurrence of this, map another columns (say trade_type) to MARKET_STATES.OPENING_AUCTION?

Here is an example:

e.g:

row  issue_status  market_phase  trading_state                  trade_type
 0        20           0              '  '         ->        
 1        10           0              '  '         ->        OPENING_AUCTION
 2        20           0              '  '         ->       
 3        10           0              '  '         ->       
 4        10           0              '  '         ->       
 5        10           0              '  '         ->       
2
  • 2
    Since you say your exact dataframe isn't relevant and too big for the question, you should create a toy dataframe as part of a minimal reproducible example. Commented Dec 6, 2022 at 20:05
  • Thanks Pranav- I found a way to add a simple example Commented Dec 6, 2022 at 20:11

1 Answer 1

0

I'll answer your second question first:

How can I write a condition like the above where if the 3 columns match the values, then for the first occurrence of this, map another columns (say trade_type) to MARKET_STATES.OPENING_AUCTION?

market_info_df = pd.DataFrame({'issue_status': {0: 20, 1: 10, 2: 20, 3: 10, 4: 10, 5: 10},
 'market_phase': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0},
 'trading_state': {0: '  ', 1: '  ', 2: '  ', 3: '  ', 4: '  ', 5: '  '}})

Any time you want to combine multiple conditions, remember that you need a logical and or logical or operation on those conditions.

When you have three conditions and you want to select rows where all of them are True, you need to and all the conditions. So in your case you'd need to do

mask = (market_info_df['issue_status'] == 10) & (market_info_df['market_phase'] == 0) & (market_info_df['trading_state'] == '  ')

To get a boolean mask that tells you which rows fulfill all those conditions. The & operator is overloaded by pandas to do the element-wise logical and for multiple Series objects. The parentheses are needed to override python's operator precedence which puts & before ==

Note: My dataframe contains integers in the issue_status and market_phase columns, which is why I compare against integers e.g. market_info_df['issue_status'] == 10. If your columns contain strings (as your code would indicate), compare against strings (market_info_df['issue_status'] == "10")

Next, you want to set those rows for the "trade_type" column, so you'd do:

market_info_df.loc[mask, "trade_type"] = "OPENING_AUCTION"

Which sets those only rows:

   issue_status  market_phase trading_state       trade_type
0            20             0                            NaN
1            10             0                OPENING_AUCTION
2            20             0                            NaN
3            10             0                OPENING_AUCTION
4            10             0                OPENING_AUCTION
5            10             0                OPENING_AUCTION

Now, we can use idxmax() to get the index of the max value of the mask. Since this mask only contains True and False values, the first occurrence of the True value is the one that is returned. If we do this instead of setting all values of mask, we get:

market_info_df.loc[mask.idxmax(), "trade_type"] = "OPENING_AUCTION"

which gives:

   issue_status  market_phase trading_state       trade_type
0            20             0                            NaN
1            10             0                OPENING_AUCTION
2            20             0                            NaN
3            10             0                            NaN
4            10             0                            NaN
5            10             0                            NaN

As for your first question:

What is it that the query does? My understanding is that using iloc[0] it finds the first instance where the three columns at the values stated. But then it maps something to MARKET_STATES.OPENING_AUCTION or?

I have no idea what that code is supposed to do. The code inside the brackets i.e.:

'issue_status' == '10', 'market_phase' == '0', 'trading_state' == '  '

compares the string "issue_status" to the string "10" (which unsurprisingly is False) and so on, and returns a tuple that contains three Falses.

Then, market_info_df[False, False, False] tries to find the element in the dataframe at the False, False, False key, and throws the error when it fails to do so.

Sign up to request clarification or add additional context in comments.

11 Comments

Amazing, thank you @Pranav ; what if I want to map the first occurance to OPENING_AUCTION as you have done above, but all subsequent occurances to CONTINUOUS_AUCTION?
@Patrick I just realized you wanted to change only the first occurrence. I've updated my answer.
And is it easy to map the rest of the instances to CONTINUOUS_AUCTION?
@PatrickChong yes! I'd suggest you set the entire column to market_info_df["trade_type"] = "CONTINUOUS_ACTION" first, and then overwrite the rows that you need with "OPENING_AUCTION" the way I showed above.
That's so clever, thank you
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.