
I have a MultiIndexed pandas Series and am trying to plot each outer index level in its own subplot, but it runs very slowly.

To do the subplotting I loop over the outer level of the MultiIndex and plot the Series using the inner index level as the x coordinate.

def plot_series( data ):
    # create 16 subplots, corresponding to the 16 outer index levels
    fig, axs = plt.subplots( 4, 4 )

    for oi in data.index.get_level_values( 'outer_index' ):
        # calculate subplot to use
        row = int( oi/ 4 )
        col = int( oi - row* 4 )

        ax = axs[ row, col ]
        data.xs( oi ).plot( use_index = True, ax = ax )

    plt.show()

Each outer index level has 1000 data points, but the plotting takes several minutes to complete.

Is there a way to speed up the plotting?

Data

num_out = 16
num_in  = 1000

data = pd.Series( 
    data = np.random.rand( num_out* num_in ), 
    index = pd.MultiIndex.from_product( [ np.arange( num_out ), np.arange( num_in ) ], names = [ 'outer_index', 'inner_index' ] ) 
)
3 Comments

  • Hey, could you add an example of your data so that we could actually run the code? There also seems to be an error, with a variable not being defined: index = Commented Mar 7, 2019 at 11:01
  • Where does channel come from? Commented Mar 7, 2019 at 12:01
  • Sorry about the errors, and lacking data. I was trying to post this before a meeting and didn't make all the corrections in my rush. Thanks for the help :) Commented Mar 7, 2019 at 13:42

1 Answer

Rather than loop through data.index.get_level_values( 'outer_index' ), you could use data.groupby(level='outer_index') and iterate through the grouped object using:

for name, group in grouped:
   #do stuff 

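This avoids the real bottleneck in the original loop: data.index.get_level_values( 'outer_index' ) returns one label per row (16,000 here) rather than the 16 unique labels, so data.xs( oi ) and plot were being called 16,000 times. Iterating over the groups runs the loop body only once per outer index level.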
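To make the difference in loop counts concrete, here is a small check on the same data as in the question. It is only a sketch; the variable names labels_per_row and grouped are mine, not from the original post.

```python
import numpy as np
import pandas as pd

num_out, num_in = 16, 1000
data = pd.Series(
    np.random.rand(num_out * num_in),
    index=pd.MultiIndex.from_product(
        [np.arange(num_out), np.arange(num_in)],
        names=["outer_index", "inner_index"],
    ),
)

# get_level_values gives one label per row, so looping over it
# runs the loop body once for every data point.
labels_per_row = data.index.get_level_values("outer_index")
print(len(labels_per_row))           # 16000

# groupby yields one (name, group) pair per distinct outer label.
grouped = data.groupby(level="outer_index")
print(grouped.ngroups)               # 16

# Deduplicating the labels would also cut the original loop to 16 passes.
print(len(labels_per_row.unique()))  # 16
```

So looping over data.index.get_level_values( 'outer_index' ).unique() should also fix the original code, if you would rather keep data.xs( oi ).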

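import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_series(data):
    # one group per outer index level, so the loop body runs 16 times
    grouped = data.groupby(level='outer_index')

    # create 16 subplots, one per outer index level
    fig, axs = plt.subplots(4, 4)
    for name, group in grouped:
        # map the outer index value onto the 4x4 grid
        row, col = divmod(name, 4)
        ax = axs[row, col]
        # drop the outer level so the inner index is the x coordinate, as data.xs( oi ) did
        group.droplevel('outer_index').plot(use_index=True, ax=ax)

    # show the figure once, after every subplot has been drawn
    plt.show()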



num_out = 16
num_in  = 1000

data = pd.Series( 
    data = np.random.rand( num_out* num_in ), 
    index = pd.MultiIndex.from_product( [ np.arange( num_out ), np.arange( num_in ) ], names = [ 'outer_index', 'inner_index' ] ) 
)

plot_series(data)

Using %timeit, you can see this approach is much faster:

%timeit plot_series(data)
795 ms ± 252 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
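%timeit is an IPython magic, so it is not available in a plain script. Outside IPython, something like the following should give a comparable measurement with the standard timeit module. This is a rough sketch: plot_grouped and runs are names I made up, the Agg backend is assumed so that no window opens, and the figure is closed instead of shown so figures do not pile up between runs.

```python
import timeit

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so nothing blocks on a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_grouped(data):
    # Same idea as plot_series, but closes the figure instead of showing it.
    fig, axs = plt.subplots(4, 4)
    groups = data.groupby(level="outer_index")
    # axs.flat walks the 4x4 grid row by row, matching outer labels 0..15.
    for ax, (name, group) in zip(axs.flat, groups):
        group.droplevel("outer_index").plot(ax=ax)
    plt.close(fig)


num_out, num_in = 16, 1000
data = pd.Series(
    np.random.rand(num_out * num_in),
    index=pd.MultiIndex.from_product(
        [np.arange(num_out), np.arange(num_in)],
        names=["outer_index", "inner_index"],
    ),
)

runs = 3  # hypothetical repeat count; raise it for a steadier average
seconds = timeit.timeit(lambda: plot_grouped(data), number=runs)
print(f"{seconds / runs:.3f} s per call")
```

The number will depend on your machine and backend, so treat it as a relative comparison rather than something to match against the figure above.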

2 Comments

That worked beautifully. I'm surprised it was the slicing that was the bottleneck. What causes it to be so slow?
I think it is because groupby splits the data frame into groups using a mapper.
