I have a numpy structured array that looks like this:

  idx lvl start   end
   60  71  10.0   0.0
   60  72   0.0  25.0
   60  73   0.0  35.0
   61  73   5.0   0.0
   65  71   5.0   0.0
   67  72   5.0   0.0
   67  74   0.0  10.0
   ...

I want to build a new array from it under the following conditions.

1) Only idx groups that have at least one nonzero start value and at least one nonzero end value are used (idx 60 and 67 in this example).

2) If there are multiple start and end values, only the level of the biggest end value and the level of the smallest start value are used (idx 60 will have 71 and 73).

The result will look like this:

idx start_lvl end_lvl
 60        71      73
 67        72      74

I don't mind using pandas, but I'd like to avoid making additional arrays or using loops. Is there a simple way to do this?

  • Is the expected output correct? The rule says: "If there are multiple start and end values, only the biggest end value's level and the *smallest start value's* level for the level are used". Commented Aug 1, 2019 at 7:40
  • @jezrael Oh, I'm sorry! I was confused. What I meant was the biggest level value and the smallest level value themselves. However, I referred to your answer and have found the solution. Thank you for your help! Commented Aug 1, 2019 at 8:45
  • So IIUC, the accepted answer does not represent the correct solution but only helped you find a way to get what you needed? And the correct solution cannot be found here? Commented Aug 1, 2019 at 8:53

1 Answer

First use Series.duplicated with keep=False to keep only the rows whose idx value occurs more than once. Then set the lvl column as the index, so that DataFrameGroupBy.idxmax returns, for each idx, the lvl at which each column reaches its maximum:

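import pandas as pd

# create a DataFrame from the structured array, thanks @SpghttCd
df = pd.DataFrame(struct_arr)

# store the result under a new name so that df still holds the original rows
df1 = df[df['idx'].duplicated(keep=False)].set_index('lvl').groupby('idx').idxmax()
print(df1)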
     start  end
idx            
60      71   73
67      72   74
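
Note that duplicated(keep=False) keeps every idx that occurs more than once. That matches condition 1 on the sample rows, but it is not the same test in general. Below is a minimal sketch of a stricter filter. It assumes that 0.0 means "no value" and that the array has the dtype shown (the question does not state one); the names nonzero, keep, strict and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; the dtype is an assumption.
struct_arr = np.array(
    [(60, 71, 10.0, 0.0),
     (60, 72, 0.0, 25.0),
     (60, 73, 0.0, 35.0),
     (61, 73, 5.0, 0.0),
     (65, 71, 5.0, 0.0),
     (67, 72, 5.0, 0.0),
     (67, 74, 0.0, 10.0)],
    dtype=[("idx", "i8"), ("lvl", "i8"), ("start", "f8"), ("end", "f8")],
)
df = pd.DataFrame(struct_arr)

# For every row: does its idx group contain any nonzero start / any nonzero end?
nonzero = df[["start", "end"]].ne(0).groupby(df["idx"]).transform("any")

# Keep only rows whose group has both a nonzero start and a nonzero end.
keep = nonzero.all(axis=1)

# Same idxmax step as above, applied to the stricter selection.
strict = df[keep].set_index("lvl").groupby("idx")[["start", "end"]].idxmax()
strict.columns = ["start_lvl", "end_lvl"]
print(strict)

# Optional: convert back to a NumPy record array.
result = strict.reset_index().to_records(index=False)
```

On the sample rows this selects idx 60 and 67, the same groups as the duplicated filter.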

According to the description, start needs idxmin instead, which returns the lvl of the first minimum in each group:

df2 = (df[df['idx'].duplicated(keep=False)]
           .set_index('lvl')
           .groupby('idx')
           .agg({'start':'idxmin', 'end':'idxmax'}))
print (df2)
     start  end
idx            
60      72   73
67      74   74
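
The asker's follow-up comment suggests that what was actually wanted is the smallest and the largest lvl per idx. The sketch below shows one possible reading of that comment, not the asker's confirmed solution. It rebuilds the sample rows as a plain DataFrame, and the names dupes and lvl_range are only illustrative.

```python
import pandas as pd

# Sample rows from the question, rebuilt as a plain DataFrame.
df = pd.DataFrame({
    "idx": [60, 60, 60, 61, 65, 67, 67],
    "lvl": [71, 72, 73, 73, 71, 72, 74],
    "start": [10.0, 0.0, 0.0, 5.0, 5.0, 5.0, 0.0],
    "end": [0.0, 25.0, 35.0, 0.0, 0.0, 0.0, 10.0],
})

# Same filter as in the answer: keep idx values that occur more than once.
dupes = df[df["idx"].duplicated(keep=False)]

# Smallest and largest lvl per idx, ignoring the start and end values.
lvl_range = (dupes.groupby("idx")["lvl"]
                  .agg(["min", "max"])
                  .rename(columns={"min": "start_lvl", "max": "end_lvl"}))
print(lvl_range)
```

On the sample rows this gives 71 and 73 for idx 60 and 72 and 74 for idx 67, which matches the expected output in the question.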