How to make a numpy array with some conditions?

Question

I have a numpy structured array that looks like this:

  idx lvl start   end
   60  71  10.0   0.0
   60  72   0.0  25.0
   60  73   0.0  35.0
   61  73   5.0   0.0
   65  71   5.0   0.0
   67  72   5.0   0.0
   67  74   0.0  10.0
   ...
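If you want to reproduce this, the sample rows above can be rebuilt as a NumPy structured array and loaded into pandas. This is a minimal sketch. The field types (integer idx/lvl, float start/end) are an assumption, and the trailing `...` rows are left out.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; the field types are an assumption.
struct_arr = np.array(
    [(60, 71, 10.0, 0.0),
     (60, 72, 0.0, 25.0),
     (60, 73, 0.0, 35.0),
     (61, 73, 5.0, 0.0),
     (65, 71, 5.0, 0.0),
     (67, 72, 5.0, 0.0),
     (67, 74, 0.0, 10.0)],
    dtype=[('idx', 'i8'), ('lvl', 'i8'), ('start', 'f8'), ('end', 'f8')],
)

# pandas creates one column per field of the structured dtype
df = pd.DataFrame(struct_arr)
print(df.dtypes)
```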

I want to build a new array from it under the following conditions:

1) Only the rows of an idx that has at least one start value and at least one end value are used (the idx 60 and 67 rows in this example).

2) If there are multiple start and end values, only the level of the biggest end value and the level of the smallest start value are used (idx 60 will have 71 and 73).
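Condition 1 can be written as a per-group mask. A minimal sketch, assuming that a 0.0 in start or end means "no value":

```python
import numpy as np
import pandas as pd

struct_arr = np.array(
    [(60, 71, 10.0, 0.0), (60, 72, 0.0, 25.0), (60, 73, 0.0, 35.0),
     (61, 73, 5.0, 0.0), (65, 71, 5.0, 0.0),
     (67, 72, 5.0, 0.0), (67, 74, 0.0, 10.0)],
    dtype=[('idx', 'i8'), ('lvl', 'i8'), ('start', 'f8'), ('end', 'f8')],
)
df = pd.DataFrame(struct_arr)

# Broadcast "does any row of this idx have a nonzero start / end?"
# back onto every row of the group (assumption: 0.0 means no value).
has_start = df['start'].ne(0).groupby(df['idx']).transform('any')
has_end = df['end'].ne(0).groupby(df['idx']).transform('any')

# Keep only the rows of idx groups that satisfy both
kept = df[has_start & has_end]
print(kept)
```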

The result will look like this:

idx start_lvl end_lvl
 60        71      73
 67        72      74

I don't mind using pandas, but I'd like to avoid creating additional arrays or using loops. Is there a simple way to do this?
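For reference, a loop-free sketch in plain NumPy is possible, although it still creates intermediate arrays. It assumes 0.0 means "no value" and picks, per idx, the lvl of the largest start and the lvl of the largest end, which reproduces the expected output above. `lvl_of_group_max` is a hypothetical helper name.

```python
import numpy as np

struct_arr = np.array(
    [(60, 71, 10.0, 0.0), (60, 72, 0.0, 25.0), (60, 73, 0.0, 35.0),
     (61, 73, 5.0, 0.0), (65, 71, 5.0, 0.0),
     (67, 72, 5.0, 0.0), (67, 74, 0.0, 10.0)],
    dtype=[('idx', 'i8'), ('lvl', 'i8'), ('start', 'f8'), ('end', 'f8')],
)

def lvl_of_group_max(arr, field):
    """Hypothetical helper: per idx, the lvl of the row with the largest `field`."""
    # Stable sort by idx, then by field: the last row of every idx run
    # holds that group's maximum (ties resolve to the later row).
    order = np.lexsort((arr[field], arr['idx']))
    s = arr[order]
    is_last = np.append(s['idx'][1:] != s['idx'][:-1], True)
    return s['idx'][is_last], s['lvl'][is_last]

idx_u, start_lvl = lvl_of_group_max(struct_arr, 'start')
_, end_lvl = lvl_of_group_max(struct_arr, 'end')

# Condition 1 (assumption: 0.0 means no value): the idx needs at least
# one nonzero start and at least one nonzero end.
ok = (np.isin(idx_u, struct_arr['idx'][struct_arr['start'] != 0])
      & np.isin(idx_u, struct_arr['idx'][struct_arr['end'] != 0]))

result = np.rec.fromarrays(
    [idx_u[ok], start_lvl[ok], end_lvl[ok]],
    names='idx,start_lvl,end_lvl',
)
print(result)
```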

Is the expected output correct? I think the rule says: if there are multiple start and end values, only the biggest end value's level and the *smallest start value's* level are used. – jezrael, commented Aug 1, 2019 at 7:40
@jezrael Oh, I'm sorry! I was confused: what I meant was the biggest level value and the smallest level value themselves. However, I referred to your answer and have found the solution. Thank you for your help! – maynull, commented Aug 1, 2019 at 8:45
So IIUC, the accepted answer does not represent the correct solution but only helped you find a way to get what you needed...? And this correct solution cannot be found here? – SpghttCd, commented Aug 1, 2019 at 8:53

jezrael · Accepted Answer · 2019-08-01 08:10:09Z

First use Series.duplicated with keep=False to keep only the rows whose idx value occurs more than once, then set lvl as the index, so that DataFrameGroupBy.idxmax returns, for each idx group, the index label (the lvl) of the maximum of each column:

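import pandas as pd

# create DataFrame from structured array (struct_arr from the question), thanks @SpghttCd
df = pd.DataFrame(struct_arr)

# store the result in df1 so that df stays available for the next snippet
df1 = df[df['idx'].duplicated(keep=False)].set_index('lvl').groupby('idx').idxmax()
print(df1)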
     start  end
idx            
60      71   73
67      72   74
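The one-liner chains three steps, and splitting them up shows the intermediate frame. Note that `duplicated(keep=False)` only checks that an idx occurs more than once, not condition 1 itself; on the sample data the two happen to select the same rows (idx 60 and 67).

```python
import numpy as np
import pandas as pd

struct_arr = np.array(
    [(60, 71, 10.0, 0.0), (60, 72, 0.0, 25.0), (60, 73, 0.0, 35.0),
     (61, 73, 5.0, 0.0), (65, 71, 5.0, 0.0),
     (67, 72, 5.0, 0.0), (67, 74, 0.0, 10.0)],
    dtype=[('idx', 'i8'), ('lvl', 'i8'), ('start', 'f8'), ('end', 'f8')],
)
df = pd.DataFrame(struct_arr)

# Step 1: keep=False marks every member of a repeated idx, not just the repeats
dupes = df[df['idx'].duplicated(keep=False)]

# Step 2: move lvl into the index so idxmax reports lvl labels
by_lvl = dupes.set_index('lvl')
print(by_lvl)

# Step 3: per idx group, the lvl holding the largest start and the largest end
out = by_lvl.groupby('idx').idxmax()
print(out)
```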

According to the description, start needs idxmin instead (it returns the first minimum):

df2 = (df[df['idx'].duplicated(keep=False)]
           .set_index('lvl')
           .groupby('idx')
           .agg({'start':'idxmin', 'end':'idxmax'}))
print(df2)
     start  end
idx            
60      72   73
67      74   74
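In the comments, the asker says the intended rule was the smallest and biggest level values themselves. One possible reading, sketched here purely as an assumption, takes the smallest lvl among rows with a nonzero start and the largest lvl among rows with a nonzero end; on the sample it reproduces the expected output.

```python
import numpy as np
import pandas as pd

struct_arr = np.array(
    [(60, 71, 10.0, 0.0), (60, 72, 0.0, 25.0), (60, 73, 0.0, 35.0),
     (61, 73, 5.0, 0.0), (65, 71, 5.0, 0.0),
     (67, 72, 5.0, 0.0), (67, 74, 0.0, 10.0)],
    dtype=[('idx', 'i8'), ('lvl', 'i8'), ('start', 'f8'), ('end', 'f8')],
)
df = pd.DataFrame(struct_arr)

# Assumed reading: smallest lvl with a nonzero start,
# largest lvl with a nonzero end, per idx.
start_lvl = df.loc[df['start'].ne(0)].groupby('idx')['lvl'].min()
end_lvl = df.loc[df['end'].ne(0)].groupby('idx')['lvl'].max()

# The inner join keeps only idx values that have both (condition 1)
res = pd.concat({'start_lvl': start_lvl, 'end_lvl': end_lvl},
                axis=1, join='inner')
print(res)
```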

edited Aug 1, 2019 at 8:10

answered Aug 1, 2019 at 7:29

jezrael
