I have time series data in a pandas DataFrame. The index is the time at the start of each measurement, and each column holds a list of values recorded at a fixed sampling rate (the sampling interval is the difference between consecutive index values divided by the number of elements in the list).

Here is what it looks like:

Time         A                   B                   .......  Z
0    [1, 2, 3, 4]      [1, 2, 3, 4]
2    [5, 6, 7, 8]      [5, 6, 7, 8]
4    [9, 10, 11, 12]   [9, 10, 11, 12]
6    [13, 14, 15, 16]  [13, 14, 15, 16]
...

I want to expand each row in all the columns to multiple rows such that:

Time       A           B  .... Z
0          1           1
0.5        2           2
1          3           3
1.5        4           4
2          5           5 
2.5        6           6
.......

So far I am thinking along these lines (the code doesn't work):

def expand_row(dstruc):
    for i in range (len(dstruc)):
        for j in range (1,len(dstruc[i])):
            dstruc.loc[i+j/len(dstruc[i])] = dstruc[i][j]

    dstruc.loc[i] = dstruc[i][0]
    return dstruc

expanded = testdf.apply(expand_row)

I also tried using split(',') and stack() together, but I could not get the resulting index right.

3 Answers

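import numpy as np
import pandas as pd

# Each cell holds a tuple of 4 samples; list() is needed on Python 3, where zip is lazy.
df = pd.DataFrame({key: list(zip(*[iter(range(1, 17))]*4)) for key in list('ABC')},
                  index=range(0,8,2))

# One (start time, values) pair per sample. DataFrame.from_items was removed in
# pandas 1.0, so the labels and the row tuples go to the constructor separately.
pairs = [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
labels, rows = zip(*pairs)
result = pd.DataFrame(list(rows), index=pd.Index(labels, name='Time'), columns=df.columns)

grouped = result.groupby(level=0)
# Offset of each sample within its row: position divided by samples per row.
increment = (grouped.cumcount()/grouped.size())
result.index = result.index + increment
result.index.name = 'Time'  # the addition can drop the index name on recent pandas
print(result)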

yields

In [183]: result
Out[183]: 
       A   B   C
Time            
0.00   1   1   1
0.25   2   2   2
0.50   3   3   3
0.75   4   4   4
2.00   5   5   5
2.25   6   6   6
2.50   7   7   7
2.75   8   8   8
4.00   9   9   9
4.25  10  10  10
4.50  11  11  11
4.75  12  12  12
6.00  13  13  13
6.25  14  14  14
6.50  15  15  15
6.75  16  16  16

Explanation:

One way to loop over the contents of the lists is to use a list comprehension:

In [172]: df = pd.DataFrame({key: list(zip(*[iter(range(1, 17))]*4)) for key in list('ABC')}, index=range(0,8,2))

In [173]: [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
Out[173]: 
[(0, (1, 1, 1)),
 (0, (2, 2, 2)),
 ...
 (6, (15, 15, 15)),
 (6, (16, 16, 16))]

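Once you have the values in the above form, you can build the desired DataFrame from them. Older pandas offered pd.DataFrame.from_items(pairs, orient='index', columns=df.columns) for this; it was removed in pandas 1.0, so pass the index labels and the row tuples to the constructor separately:

pairs = [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
labels, rows = zip(*pairs)
result = pd.DataFrame(list(rows), index=pd.Index(labels, name='Time'), columns=df.columns)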

yields

In [175]: result
Out[175]: 
       A   B   C
Time            
0      1   1   1
0      2   2   2
...
6     15  15  15
6     16  16  16

To compute the increments to be added to the index, you can group by the index and find the ratio of the cumcount to the size of each group:

In [176]: grouped = result.groupby(level=0)
In [177]: increment = (grouped.cumcount()/grouped.size())
In [179]: result.index = result.index + increment
In [199]: result.index
Out[199]: 
Int64Index([ 0.0, 0.25,  0.5, 0.75,  2.0, 2.25,  2.5, 2.75,  4.0, 4.25,  4.5,
            4.75,  6.0, 6.25,  6.5, 6.75],
           dtype='float64', name=u'Time')
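
The increments above are fractions of one time unit, so the samples land at 0.25 steps, whereas the question asks for 0.5 steps when rows are 2 units apart. A minimal sketch of scaling the fraction by the row spacing follows. The small `result` frame, the `row_spacing` constant, and the `scaled` name are assumptions made for illustration and are not part of the code above.

```python
import pandas as pd

# Hypothetical stand-in for `result` before its index is adjusted:
# two rows starting at t=0 and t=2, four samples each.
result = pd.DataFrame({'A': range(1, 9), 'B': range(1, 9)},
                      index=pd.Index([0, 0, 0, 0, 2, 2, 2, 2], name='Time'))

row_spacing = 2.0  # assumed constant gap between consecutive start times

grouped = result.groupby(level=0)
# Position of each sample within its row divided by the samples per row.
fraction = (grouped.cumcount() / grouped.size()).to_numpy()

# Scale the fraction by the row spacing so the samples fill the whole interval.
scaled = result.copy()
scaled.index = pd.Index(result.index.to_numpy() + row_spacing * fraction, name='Time')
print(scaled)
```

If the gap between start times is not constant, the spacing would have to be taken per row (for example from the differences of the original index) rather than from a single constant.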

Probably not ideal, but this can be done with groupby and apply, using a function that returns the expanded DataFrame for each row (here the time difference between rows is assumed to be fixed at 2.0, and Time is assumed to be a regular column):

def expand(x):
    # x is the one-row group for a single start time; unpack its list cells.
    data = {c: x[c].iloc[0] for c in x if c != 'Time'}
    n = len(data['A'])
    step = 2.0 / n  # spacing between samples, given rows 2.0 apart
    data['Time'] = [x['Time'].iloc[0] + i*step for i in range(n)]
    return pd.DataFrame(data)

print(df.groupby('Time').apply(expand).set_index('Time', drop=True))

Output:

       A   B
Time        
0.0    1   1
0.5    2   2
1.0    3   3
1.5    4   4
2.0    5   5
2.5    6   6
3.0    7   7
3.5    8   8
4.0    9   9
4.5   10  10
5.0   11  11
5.5   12  12
6.0   13  13
6.5   14  14
7.0   15  15
7.5   16  16
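
On pandas 1.3 or later, the same expansion can probably be written without a per-row function by using `DataFrame.explode`. The sketch below is one possible version under the same fixed-spacing assumption; the sample `df`, `row_spacing`, and the intermediate names are made up for illustration.

```python
import pandas as pd

# Hypothetical input in the shape described in the question: the index holds
# each row's start time and every cell holds a list of equally spaced samples.
df = pd.DataFrame({'A': [[1, 2, 3, 4], [5, 6, 7, 8]],
                   'B': [[1, 2, 3, 4], [5, 6, 7, 8]]},
                  index=pd.Index([0, 2], name='Time'))

row_spacing = 2.0  # assumed constant, as in the answer above

# explode() repeats the index label once per list element. Exploding several
# columns at once requires equal list lengths within each row.
expanded = df.explode(list(df.columns))

# Offset each sample by its position within the row.
position = expanded.groupby(level=0).cumcount().to_numpy()
samples_per_row = df['A'].map(len).reindex(expanded.index).to_numpy()
offsets = row_spacing * position / samples_per_row

expanded.index = pd.Index(expanded.index.to_numpy() + offsets, name='Time')
expanded = expanded.astype(int)  # explode() leaves object dtype behind
print(expanded)
```

Unlike the groupby version, this keeps Time as the index throughout, so no reset_index or set_index step is needed.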


If the DataFrame to expand is named df_to_expand and its cells are strings that look like lists (for example '[1, 2, 3]'), you can parse each cell with eval and spread the elements across new columns:

df_expanded_list = []
for coln in df_to_expand.columns:
    _df = df_to_expand[coln].apply(lambda x: pd.Series(eval(x), index=[coln + '_' + str(i) for i in range(len(eval(x)))]))
    df_expanded_list.append(_df)

df_expanded = pd.concat(df_expanded_list, axis=1)

References: convert a string which is a list into a proper list python
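
Since eval executes whatever the string contains, `ast.literal_eval` from the standard library is a safer choice when the cells only hold list literals. The following is a sketch of the same loop with that substitution; the sample `df_to_expand` is invented for illustration.

```python
import ast
import pandas as pd

# Hypothetical input: cells are strings such as "[1, 2]", e.g. after a CSV round trip.
df_to_expand = pd.DataFrame({'A': ['[1, 2]', '[3, 4]'],
                             'B': ['[5, 6]', '[7, 8]']})

df_expanded_list = []
for coln in df_to_expand.columns:
    # literal_eval parses Python literals only and raises on anything else.
    parsed = df_to_expand[coln].apply(ast.literal_eval)
    _df = pd.DataFrame(parsed.tolist(), index=df_to_expand.index)
    _df.columns = [f'{coln}_{i}' for i in range(_df.shape[1])]
    df_expanded_list.append(_df)

df_expanded = pd.concat(df_expanded_list, axis=1)
print(df_expanded)
```

Note that this widens the frame (one column per list element) rather than adding rows, so it does not by itself produce the time-indexed layout the question asks for.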
