Pandas DataFrame: Expand rows with lists to multiple rows with desired indexing for all columns

Question

I have time series data in a pandas DataFrame. The index is the time at the start of each measurement, and each column holds a list of values recorded at a fixed sampling rate (sampling interval = difference between consecutive index values / number of elements in the list).

Here is what it looks like:

Time         A                   B                   .......  Z
0    [1, 2, 3, 4]      [1, 2, 3, 4]
2    [5, 6, 7, 8]      [5, 6, 7, 8]
4    [9, 10, 11, 12]   [9, 10, 11, 12]
6    [13, 14, 15, 16]  [13, 14, 15, 16]
...

I want to expand each row in all the columns to multiple rows such that:

Time       A           B  .... Z
0          1           1
0.5        2           2
1          3           3
1.5        4           4
2          5           5 
2.5        6           6
.......

So far I am thinking along these lines (code doesn't work):

def expand_row(dstruc):
    for i in range (len(dstruc)):
        for j in range (1,len(dstruc[i])):
            dstruc.loc[i+j/len(dstruc[i])] = dstruc[i][j]

    dstruc.loc[i] = dstruc[i][0]
    return dstruc

expanded = testdf.apply(expand_row)

I also tried using split(',') and stack() together, but I could not get the indexing right.

unutbu · Accepted Answer · 2015-11-19 02:38:27Z

import numpy as np
import pandas as pd
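# list(...) is needed on Python 3, where zip returns a one-shot iterator
df = pd.DataFrame({key: list(zip(*[iter(range(1, 17))]*4)) for key in list('ABC')},
                  index=range(0,8,2))

# (index label, one value per column) pairs, one pair per output row
items = [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
# DataFrame.from_items was removed in pandas 1.0, so pass rows and index labels to the constructor
result = pd.DataFrame([values for _, values in items], index=[index for index, _ in items], columns=df.columns)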
result.index.name = 'Time'

grouped = result.groupby(level=0)
increment = (grouped.cumcount()/grouped.size())
result.index = result.index + increment
print(result)

yields

In [183]: result
Out[183]: 
       A   B   C
Time            
0.00   1   1   1
0.25   2   2   2
0.50   3   3   3
0.75   4   4   4
2.00   5   5   5
2.25   6   6   6
2.50   7   7   7
2.75   8   8   8
4.00   9   9   9
4.25  10  10  10
4.50  11  11  11
4.75  12  12  12
6.00  13  13  13
6.25  14  14  14
6.50  15  15  15
6.75  16  16  16

Explanation:

One way to loop over the contents of the lists is to use a list comprehension:

In [172]: df = pd.DataFrame({key: list(zip(*[iter(range(1, 17))]*4)) for key in list('ABC')}, index=range(0,8,2))

In [173]: [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
Out[173]: 
[(0, (1, 1, 1)),
 (0, (2, 2, 2)),
 ...
 (6, (15, 15, 15)),
 (6, (16, 16, 16))]
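
To see what zip(*row) does inside that comprehension, here is a minimal sketch on plain tuples. The variable `row` is a hypothetical stand-in for one row of the frame (one tuple per column A, B, C), not an actual pandas object:

```python
# One row of the frame, written as plain tuples: one tuple per column (A, B, C).
row = [(1, 2, 3, 4), (1, 2, 3, 4), (1, 2, 3, 4)]

# zip(*row) groups the k-th element of every column together,
# so each resulting tuple is one output row of the expanded frame.
transposed = list(zip(*row))
print(transposed)  # [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4)]
```

Pairing each of these tuples with the row's index label gives the (index, values) pairs shown above.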

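Once you have the values in the above form, you can build the desired DataFrame by passing the rows and their index labels to the constructor (pd.DataFrame.from_items, which used to do this in one call, was removed in pandas 1.0):

items = [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
result = pd.DataFrame([values for _, values in items], index=[index for index, _ in items], columns=df.columns)
result.index.name = 'Time'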

yields

In [175]: result
Out[175]: 
       A   B   C
Time            
0      1   1   1
0      2   2   2
...
6     15  15  15
6     16  16  16

To compute the increments to be added to the index, you can group by the index and find the ratio of the cumcount to the size of each group:

In [176]: grouped = result.groupby(level=0)
In [177]: increment = (grouped.cumcount()/grouped.size())
In [179]: result.index = result.index + increment
In [199]: result.index
Out[199]: 
Float64Index([ 0.0, 0.25,  0.5, 0.75,  2.0, 2.25,  2.5, 2.75,  4.0, 4.25,  4.5,
              4.75,  6.0, 6.25,  6.5, 6.75],
             dtype='float64', name=u'Time')
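
On newer pandas the same reshaping can probably be written more directly. The following is only a sketch under stated assumptions: pandas 1.3 or later (where explode accepts a list of columns), every cell holding exactly `n` samples, and a constant spacing `period` between index values. The names `n`, `period`, `offsets`, and `expanded` are illustrative and do not come from the answer above. Unlike the cumcount/size approach, this spreads the samples over the whole interval between index values (step 0.5), which is what the question asks for.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the one in the question: each cell holds a list of 4 samples.
df = pd.DataFrame(
    {key: [list(range(i, i + 4)) for i in range(1, 17, 4)] for key in "AB"},
    index=pd.Index([0, 2, 4, 6], name="Time"),
)

n = 4         # samples per cell (assumed identical for every cell)
period = 2.0  # spacing between consecutive index values (assumed constant)

# Exploding all columns together repeats each index label n times.
expanded = df.explode(list(df.columns)).astype(int)

# Offset of each sample within its row: 0, period/n, 2*period/n, ...
offsets = np.tile(np.arange(n) * period / n, len(df))
expanded.index = pd.Index(expanded.index.to_numpy(dtype=float) + offsets, name="Time")
print(expanded)
```

If the lists differ in length between columns of the same row, explode raises an error, so this only fits data sampled as regularly as the question describes.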

YS-L · Accepted Answer · 2015-11-19 02:38:08Z

Probably not ideal, but this can be done with groupby and apply, using a function that returns the expanded DataFrame for each row (here the time difference is assumed to be fixed at 2.0, and Time is assumed to be a regular column rather than the index):

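def expand(x):
    # x is the one-row group for a single Time value; pull the list out of each data column
    data = {c: x[c].iloc[0] for c in x if c != 'Time'}
    n = len(data['A'])
    step = 2.0 / n  # fixed time difference between rows, split evenly over the n samples
    data['Time'] = [x['Time'].iloc[0] + i*step for i in range(n)]
    return pd.DataFrame(data)

print(df.groupby('Time').apply(expand).set_index('Time', drop=True))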

Output:

       A   B
Time        
0.0    1   1
0.5    2   2
1.0    3   3
1.5    4   4
2.0    5   5
2.5    6   6
3.0    7   7
3.5    8   8
4.0    9   9
4.5   10  10
5.0   11  11
5.5   12  12
6.0   13  13
6.5   14  14
7.0   15  15
7.5   16  16
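
If the time difference should not be hard-coded, the step can be derived from the time values themselves. This is a rough sketch with NumPy only; `times`, `n`, and `period` are illustrative names, and it assumes the last row spans the same interval as the row before it, since there is no following timestamp to measure against.

```python
import numpy as np

times = np.array([0.0, 2.0, 4.0, 6.0])  # start time of each row
n = 4                                    # samples per row (assumed equal for all rows)

# Interval covered by each row; the last row reuses the previous interval (assumption).
period = np.diff(times)
period = np.append(period, period[-1])

# Broadcast: each row's start time plus 0, 1, ..., n-1 steps of period/n.
expanded_times = (times[:, None] + np.arange(n) * (period[:, None] / n)).ravel()
print(expanded_times)
```

The resulting array has one timestamp per expanded row and can be assigned as the new index.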

Gabriel · Accepted Answer · 2019-11-18 02:51:02Z

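Say the DataFrame to be expanded is named df_to_expand and its cells hold lists stored as strings; you could then do the following using eval. Note that this spreads each list across new columns (A_0, A_1, ...) rather than new rows.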

df_expanded_list = []
for coln in df_to_expand.columns:
    _df = df_to_expand[coln].apply(lambda x: pd.Series(eval(x), index=[coln + '_' + str(i) for i in range(len(eval(x)))]))
    df_expanded_list.append(_df)

df_expanded = pd.concat(df_expanded_list, axis=1)

References: convert a string which is a list into a proper list python
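
Since eval executes arbitrary code, a variant with ast.literal_eval from the standard library may be preferable when the strings come from a file or another untrusted source. The sketch below is an illustration, not the answer's own code: the sample frame is made up, and it assumes every list in a column has the same length.

```python
import ast
import pandas as pd

# Hypothetical input: cells are strings that look like Python lists.
df_to_expand = pd.DataFrame({"A": ["[1, 2]", "[3, 4]"], "B": ["[5, 6]", "[7, 8]"]})

parts = []
for coln in df_to_expand.columns:
    # literal_eval parses Python literals only, so it cannot run arbitrary code.
    parsed = df_to_expand[coln].map(ast.literal_eval)
    width = len(parsed.iloc[0])  # assumes all lists in this column share one length
    wide = pd.DataFrame(
        parsed.tolist(),
        index=df_to_expand.index,
        columns=[f"{coln}_{i}" for i in range(width)],
    )
    parts.append(wide)

df_expanded = pd.concat(parts, axis=1)
print(df_expanded)
```

Each string is parsed once here, whereas the eval version parses it twice per cell.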
