I have a single CSV file containing multiple simulations, each introduced by a Simulation (Index) entry. Each simulation has one time line and 3 feature lines. The first column contains only the Simulation (Index) entries and nothing else, while the second column holds the row "header" of that simulation (Time, then Features 1..n), followed by numerical values only.

I want to load this into data frames or some sort of NumPy arrays in order to plot the graphs for each simulation and to get a better grip on the data.

As someone who is fairly new to these sorts of challenges, I resorted to pandas for a quick solution, but I am also open to any Python (NumPy/other libraries) implementation.

Example of Data Format:

Each feature line contains more than 500 samples.

Thank you

  • It would be helpful if you gave a small but complete data example. Commented Mar 4, 2019 at 14:01
  • @Pyano I hope the data example helps. Each line contains 500+ data points. Commented Mar 4, 2019 at 14:36

1 Answer

Your data example looks like Excel, so I tried it with an Excel sheet and used read_excel from pandas (there is a similar command for CSV, read_csv):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df0 = pd.read_excel('testdata.xlsx',header=None)
df0.head()

gives

    0   1   2   3   4   5   6   7   8   9   10  11  12
0   sim1    time    1   2   3   4   5   6   7   8   9   10  11
1   NaN     feat1   1   0   -1  0   1   0   -1  0   1   0   -1
2   NaN     feat2   2   0   -2  0   2   0   -2  0   2   0   -2
3   NaN     feat3   3   0   -3  0   3   0   -3  0   3   0   -3
4   sim2    time    1   2   3   4   5   6   7   8   9   10  11
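
Since the question is about a CSV file, here is a minimal sketch of the same loading step with read_csv. The inline sample text and its comma delimiter are assumptions made for illustration; the real file may use another separator (passed via sep=) and has 500+ values per line.

```python
import io
import pandas as pd

# Hypothetical sample in the layout described in the question:
# column 0 = simulation label (only on the first row of each block),
# column 1 = row "header" (time / feat1..feat3), the rest = values.
csv_text = """sim1,time,1,2,3,4
,feat1,1,0,-1,0
,feat2,2,0,-2,0
,feat3,3,0,-3,0
sim2,time,1,2,3,4
,feat1,4,0,-4,0
,feat2,5,0,-5,0
,feat3,6,0,-6,0
"""

# header=None keeps the first line as data, as in the read_excel call.
# For a file on disk, pass its path instead of the StringIO object.
df0 = pd.read_csv(io.StringIO(csv_text), header=None)
print(df0.head())
```

Empty cells in the first column come back as NaN, so the resulting frame has the same shape as the read_excel output shown above.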

You can extract the data for one model either as a pandas DataFrame or as NumPy arrays:

def get_data_numpy(df,j):
    i = j * (nFeats+1)
    t =  np.array(df.iloc[i,2:])
    y0 = np.array(df.iloc[i+1,2:])
    y1 = np.array(df.iloc[i+2,2:])
    y2 = np.array(df.iloc[i+3,2:])
    return t,y0,y1,y2

def get_data_pandas(df,j):
    i = j * (nFeats+1)
    t =  np.array(df.iloc[i,2:])
    dfy = df.iloc[i+1:i+nFeats+1,2:]
    return t,dfy

nModels = 1                                         # run for 1 model
nFeats  = 3
for jModel in range(nModels):
    tn,y0,y1,y2 = get_data_numpy(df0,jModel)
    tp,dfy      = get_data_pandas(df0,jModel)

    #--- graphics ---
    plt.style.use('fast')  
    fig, ax0 = plt.subplots(figsize=(20,4))
    plt.plot(tp,dfy.T, lw=4, alpha=0.4);           # plot pandas dfy with 1 command

    plt.plot(tn,-y0,lw=6,ls='--')                   # plot each numpy time series
    plt.plot(tn,-y1,lw=6,ls=':') 
    plt.plot(tn,-y2,lw=6,ls='-')
    plt.show() 

fig.savefig('plot_model_1.png', transparent=True)   # the keyword is 'transparent', not 'transparency'

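gives a single figure with the three feature series of the first model, drawn once from the pandas frame and once (negated) from the NumPy arrays.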

The data display (df0.head()) and the plot show only the first model. Set nModels to a number larger than 1 to run through all models.
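
As an alternative to fixing nModels and nFeats by hand, the simulations can be split by their label in the first column. This is only a sketch, not part of the approach above: the sample data, the comma delimiter, and the names sims, block and wide are assumptions for illustration, and it relies on every block having a row labelled "time".

```python
import io
import pandas as pd

# Hypothetical sample in the layout from the question (2 simulations).
csv_text = """sim1,time,1,2,3,4
,feat1,1,0,-1,0
,feat2,2,0,-2,0
,feat3,3,0,-3,0
sim2,time,1,2,3,4
,feat1,4,0,-4,0
,feat2,5,0,-5,0
,feat3,6,0,-6,0
"""

df0 = pd.read_csv(io.StringIO(csv_text), header=None)

# Propagate each simulation label down to its feature rows.
df0[0] = df0[0].ffill()

sims = {}
for name, block in df0.groupby(0, sort=False):
    # Drop the label column, use the row "header" as index, then transpose
    # so that time/feat1/feat2/feat3 become columns and samples become rows.
    wide = block.drop(columns=0).set_index(1).T
    sims[name] = wide.set_index("time").astype(float)

print(sims["sim2"])
```

Each entry of sims is then a DataFrame indexed by time with one column per feature, so sims["sim1"].plot() draws all features of one simulation, and sims["sim1"].to_numpy() returns the values as a NumPy array.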

2 Comments

Thank you @pyano for helping me on this learning journey. The answer is well written and comprehensive :).
Thanks, that's fine. If it solves your problem, you can mark it as the accepted solution by clicking the check mark (just below the up-/down-vote buttons), which then turns green.
