Split pandas dataframe rows into multiple rows

Question

I have a pandas timeseries such as:

import pandas as pd

df = pd.DataFrame(
    {
        "IDs": [
            ["A", "B"],
            ["A", "B", "A", "C"],
            ["A", "B", "A", "D", "C", "D"],
        ],
        "distance": [[1], [1.1, 2.8], [1, 3, 0.5]],
    },
    index=[
        pd.to_datetime("2022-01-01 12:00:00"),
        pd.to_datetime("2022-01-01 12:00:01"),
        pd.to_datetime("2022-01-01 12:00:02"),
    ],
)

This produces:

                                    IDs     distance
2022-01-01 12:00:00              [A, B]          [1]
2022-01-01 12:00:01        [A, B, A, C]   [1.1, 2.8]
2022-01-01 12:00:02  [A, B, A, D, C, D]  [1, 3, 0.5]

I would like to split the rows so that each row contains only the distance for a single pair of IDs, such as:

                        IDs  distance
2022-01-01 12:00:00  [A, B]       1.0
2022-01-01 12:00:01  [A, B]       1.1
2022-01-01 12:00:01  [A, C]       2.8
2022-01-01 12:00:02  [A, B]       1.0
2022-01-01 12:00:02  [A, D]       3.0
2022-01-01 12:00:02  [C, D]       0.5

What would be the best way to do this?

For information, each entry in the IDs column is always a list of 2n elements, where n is the number of elements in the corresponding distance entry.

mozway · Accepted Answer · 2022-03-17 14:18:04Z

2

IIUC, group the IDs by chunks of 2 using a list comprehension, then explode the two IDs/distance columns:

df['IDs'] = [[l[i:i+2] for i in range(0,len(l),2)] for l in df['IDs']]
df = df.explode(['IDs', 'distance'])

NB. this requires len(IDs) to be 2 times len(distance) for each row!

output:

                        IDs distance
2022-01-01 12:00:00  [A, B]        1
2022-01-01 12:00:01  [A, B]      1.1
2022-01-01 12:00:01  [A, C]      2.8
2022-01-01 12:00:02  [A, B]        1
2022-01-01 12:00:02  [A, D]        3
2022-01-01 12:00:02  [C, D]      0.5
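
The length requirement noted above can be checked before exploding. The following is a minimal, self-contained sketch of that idea, not part of the original answer: the names `bad` and `out` are illustrative, and it assumes pandas 1.3 or later, where `explode` accepts a list of columns.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "IDs": [
            ["A", "B"],
            ["A", "B", "A", "C"],
            ["A", "B", "A", "D", "C", "D"],
        ],
        "distance": [[1], [1.1, 2.8], [1, 3, 0.5]],
    },
    index=pd.to_datetime(
        ["2022-01-01 12:00:00", "2022-01-01 12:00:01", "2022-01-01 12:00:02"]
    ),
)

# Flag rows where len(IDs) != 2 * len(distance); those would break the pairing.
bad = df["IDs"].map(len) != 2 * df["distance"].map(len)
if bad.any():
    raise ValueError(f"mismatched rows: {list(df.index[bad])}")

# Chunk each IDs list into consecutive pairs, then explode both columns together.
out = df.assign(
    IDs=[[ids[i:i + 2] for i in range(0, len(ids), 2)] for ids in df["IDs"]]
).explode(["IDs", "distance"])

print(out)
```

Using `assign` here leaves the original `df` untouched, which makes it easier to re-run the check while experimenting.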

user17242583 · Accepted Answer · 2022-03-17 14:30:31Z

2

Similar to mozway's answer, but using NumPy's reshape to split the values in each IDs list into pairs:

import numpy as np

df = df.assign(IDs=[np.array(l).reshape(-1, 2) for l in df['IDs'].tolist()]).explode(['IDs', 'distance'])

Output:

>>> df
                        IDs distance
2022-01-01 12:00:00  [A, B]        1
2022-01-01 12:00:01  [A, B]      1.1
2022-01-01 12:00:01  [A, C]      2.8
2022-01-01 12:00:02  [A, B]        1
2022-01-01 12:00:02  [A, D]        3
2022-01-01 12:00:02  [C, D]      0.5
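
Note that the distance column prints as 1 rather than 1.0 because `explode` leaves the exploded columns with object dtype. If float values are wanted, as in the question's desired output, a cast can follow the explode. This is a minimal sketch under that assumption; the `.tolist()` call that turns each reshaped array back into plain lists is an optional addition, and `out` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "IDs": [
            ["A", "B"],
            ["A", "B", "A", "C"],
            ["A", "B", "A", "D", "C", "D"],
        ],
        "distance": [[1], [1.1, 2.8], [1, 3, 0.5]],
    },
    index=pd.to_datetime(
        ["2022-01-01 12:00:00", "2022-01-01 12:00:01", "2022-01-01 12:00:02"]
    ),
)

out = (
    df.assign(
        # reshape(-1, 2) pairs up consecutive IDs; tolist() keeps plain Python lists
        IDs=[np.array(ids).reshape(-1, 2).tolist() for ids in df["IDs"]]
    )
    .explode(["IDs", "distance"])
    # the exploded distance column has object dtype, so cast it explicitly
    .astype({"distance": float})
)

print(out.dtypes)
```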

timgeb · Accepted Answer · 2022-03-17 14:36:01Z

2

Similar to mozway's solution but letting NumPy do the reshaping:

df.assign(IDs=df['IDs'].apply(lambda l: np.array(l).reshape(-1, 2))).explode(['IDs', 'distance'])

1 Comment

user17242583 Over a year ago

This is almost a duplicate of mine... ;)
