Load multiple dataframes from one CSV using pandas or NumPy

Question

I have one CSV file containing multiple simulations, each delimited by a Simulation (Index) entry. Each simulation has a time line and 3 feature lines. The first column contains only the Simulation (Index) entries and nothing else, while the second column holds the "header" of each line of that simulation (Time, then Features 1 to n), followed by numerical values only.

I want to store this in data frames or some sort of NumPy arrays in order to plot the graphs for each simulation and to get a better grip on the data.

As someone who is fairly new to these sorts of challenges, I resorted to pandas for a quick solution, but I am also open to any Python implementation (NumPy or other libraries).

Example of Data Format:

Thank you

It would be helpful if you gave a small but complete data example.
– pyano, Commented Mar 4, 2019 at 14:01
@Pyano I hope the data example is of help. Each line contains 500+ data points.
– Sys.Overdrive, Commented Mar 4, 2019 at 14:36

pyano · Accepted Answer · 2019-03-05 07:58:41Z

Your data example looks like Excel, so I tried it with an Excel sheet and used read_excel from pandas (there is a similar function for CSV, read_csv):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df0 = pd.read_excel('testdata.xlsx',header=None)
df0.head()

gives

    0   1   2   3   4   5   6   7   8   9   10  11  12
0   sim1    time    1   2   3   4   5   6   7   8   9   10  11
1   NaN     feat1   1   0   -1  0   1   0   -1  0   1   0   -1
2   NaN     feat2   2   0   -2  0   2   0   -2  0   2   0   -2
3   NaN     feat3   3   0   -3  0   3   0   -3  0   3   0   -3
4   sim2    time    1   2   3   4   5   6   7   8   9   10  11
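
For a CSV file, the same table can be loaded with read_csv. This is only a minimal sketch, assuming a comma-separated file with the layout shown above. The csv_text sample and the name df_csv are made up for illustration, and with a real file the file name would replace the io.StringIO(...) argument:

```python
import io
import pandas as pd

# Hypothetical CSV text mirroring the layout above: the first column holds the
# simulation label (empty on feature rows), the second column the row name.
csv_text = """sim1,time,1,2,3,4
,feat1,1,0,-1,0
,feat2,2,0,-2,0
,feat3,3,0,-3,0
sim2,time,1,2,3,4
,feat1,1,0,-1,0
,feat2,2,0,-2,0
,feat3,3,0,-3,0
"""

# header=None keeps the first line as data instead of using it as column names.
df_csv = pd.read_csv(io.StringIO(csv_text), header=None)
print(df_csv.head())
```

If the file uses another delimiter (a semicolon, for example), it can be passed via the sep argument of read_csv.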

You can extract the data for 1 model as pandas dataframe or as numpy arrays:

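# Each model occupies nFeats+1 rows: 1 time row followed by nFeats feature rows.
# nFeats is a global variable that must be set before the first call.
def get_data_numpy(df,j):
    i = j * (nFeats+1)                  # row index of the time row of model j
    t =  np.array(df.iloc[i,2:])        # columns 0 and 1 hold labels, data starts at column 2
    y0 = np.array(df.iloc[i+1,2:])      # hard-coded for 3 features
    y1 = np.array(df.iloc[i+2,2:])
    y2 = np.array(df.iloc[i+3,2:])
    return t,y0,y1,y2

def get_data_pandas(df,j):
    i = j * (nFeats+1)                  # row index of the time row of model j
    t =  np.array(df.iloc[i,2:])
    dfy = df.iloc[i+1:i+nFeats+1,2:]    # all feature rows of model j as one dataframe
    return t,dfy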

nModels = 1                                         # run for 1 model
nFeats  = 3
for jModel in range(nModels):
    tn,y0,y1,y2 = get_data_numpy(df0,jModel)
    tp,dfy      = get_data_pandas(df0,jModel)

    #--- graphics ---
    plt.style.use('fast')  
    fig, ax0 = plt.subplots(figsize=(20,4))
    plt.plot(tp,dfy.T, lw=4, alpha=0.4);           # plot pandas dfy with 1 command

    plt.plot(tn,-y0,lw=6,ls='--')                   # plot each numpy time series
    plt.plot(tn,-y1,lw=6,ls=':') 
    plt.plot(tn,-y2,lw=6,ls='-')
    plt.show() 

fig.savefig('plot_model_1.png', transparent=True)

gives

The data display (df0.head()) and the plot show only the first model. Set nModels to a number larger than 1 to run through all models.
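
If one dataframe per simulation is preferred over indexing by row position, the blocks can also be split on the label column. The following is a sketch of an alternative approach, not the method used above. It assumes the first column carries a label only on the first row of each simulation and that every block has a row named time; csv_text, raw and simulations are illustrative names:

```python
import io
import pandas as pd

# Hypothetical sample data with two simulations and three features each.
csv_text = """sim1,time,1,2,3,4
,feat1,1,0,-1,0
,feat2,2,0,-2,0
,feat3,3,0,-3,0
sim2,time,1,2,3,4
,feat1,2,0,-2,0
,feat2,4,0,-4,0
,feat3,6,0,-6,0
"""

raw = pd.read_csv(io.StringIO(csv_text), header=None)

# Carry each simulation label down to the feature rows that belong to it.
raw[0] = raw[0].ffill()

simulations = {}
for name, block in raw.groupby(0, sort=False):
    # Drop the label column, use the row names as index, then transpose so that
    # "time", "feat1", ... become columns and each sample becomes a row.
    frame = block.drop(columns=0).set_index(1).T
    frame = frame.astype(float).set_index("time")
    frame.columns.name = None
    simulations[name] = frame

print(simulations["sim1"])
```

Each entry of simulations is then an ordinary dataframe indexed by time, so a call such as simulations["sim1"].plot() draws all features of that simulation at once.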

2 Comments

Sys.Overdrive Over a year ago

Thank you @pyano for helping me on this learning journey. The answer is well written and comprehensive :).

pyano Over a year ago

Thanks, that's fine. If it solves your problem, you might mark it as the accepted solution by clicking the check mark (just below the up-/down-voting buttons), which then turns green.
