I have a pandas timeseries such as:

import pandas as pd

df = pd.DataFrame(
    {
        "IDs": [
            ["A", "B"],
            ["A", "B", "A", "C"],
            ["A", "B", "A", "D", "C", "D"],
        ],
        "distance": [[1], [1.1, 2.8], [1, 3, 0.5]],
    },
    index=[
        pd.to_datetime("2022-01-01 12:00:00"),
        pd.to_datetime("2022-01-01 12:00:01"),
        pd.to_datetime("2022-01-01 12:00:02"),
    ],
)

This produces:

                                    IDs     distance
2022-01-01 12:00:00              [A, B]          [1]
2022-01-01 12:00:01        [A, B, A, C]   [1.1, 2.8]
2022-01-01 12:00:02  [A, B, A, D, C, D]  [1, 3, 0.5]

I would like to split the rows so that each row contains only the distance for a single pair, such as:


                        IDs  distance
2022-01-01 12:00:00  [A, B]       1.0
2022-01-01 12:00:01  [A, B]       1.1
2022-01-01 12:00:01  [A, C]       2.8
2022-01-01 12:00:02  [A, B]       1.0
2022-01-01 12:00:02  [A, D]       3.0
2022-01-01 12:00:02  [C, D]       0.5

What would be the best way to do this?

For information, each entry in IDs is always a list of 2n elements, where n is the number of elements in the corresponding distance entry.

3 Answers

IIUC, group the IDs by chunks of 2 using a list comprehension, then explode the two IDs/distance columns:

df['IDs'] = [[l[i:i+2] for i in range(0,len(l),2)] for l in df['IDs']]
df = df.explode(['IDs', 'distance'])

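NB. This requires len(IDs) to be exactly twice len(distance) in every row; otherwise explode raises a ValueError because the two columns have mismatched element counts. Exploding several columns at once also needs pandas 1.3 or later.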
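
To surface a violation of that length requirement with a clearer message, the check can be made explicit before exploding. This is only a minimal sketch on the question's sample data; the names `bad` and `out` are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "IDs": [["A", "B"], ["A", "B", "A", "C"], ["A", "B", "A", "D", "C", "D"]],
        "distance": [[1], [1.1, 2.8], [1, 3, 0.5]],
    },
    index=pd.to_datetime(
        ["2022-01-01 12:00:00", "2022-01-01 12:00:01", "2022-01-01 12:00:02"]
    ),
)

# Flag rows where len(IDs) != 2 * len(distance); these cannot be paired up.
bad = df["IDs"].str.len() != 2 * df["distance"].str.len()
if bad.any():
    raise ValueError(f"rows with mismatched lengths: {list(df.index[bad])}")

# Chunk the IDs into pairs, then explode both columns together.
df["IDs"] = [[l[i:i + 2] for i in range(0, len(l), 2)] for l in df["IDs"]]
out = df.explode(["IDs", "distance"])
```

`Series.str.len()` also works on list-valued cells, which is why it can be used here to compare the two columns row by row.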

output:

                        IDs distance
2022-01-01 12:00:00  [A, B]        1
2022-01-01 12:00:01  [A, B]      1.1
2022-01-01 12:00:01  [A, C]      2.8
2022-01-01 12:00:02  [A, B]        1
2022-01-01 12:00:02  [A, D]        3
2022-01-01 12:00:02  [C, D]      0.5
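
Note that the distance column keeps object dtype after explode, which is why it prints as 1 and 3 rather than the 1.0 and 3.0 asked for in the question. If numeric values are needed, a cast afterwards should do it. A small sketch, assuming the IDs have already been chunked into pairs (the two-row frame here is a shortened, illustrative version of the question's data):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "IDs": [[["A", "B"]], [["A", "B"], ["A", "C"]]],
        "distance": [[1], [1.1, 2.8]],
    },
    index=pd.to_datetime(["2022-01-01 12:00:00", "2022-01-01 12:00:01"]),
)

out = df.explode(["IDs", "distance"])
print(out["distance"].dtype)  # object

# Cast only the distance column; IDs still holds lists and must stay object.
out = out.astype({"distance": float})
print(out["distance"].dtype)  # float64
```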

Similar to mozway's answer, but using NumPy's reshape to split the values in the IDs lists into pairs:

import numpy as np
df = df.assign(IDs=[np.array(l).reshape(-1, 2) for l in df['IDs'].tolist()]).explode(['IDs', 'distance'])

Output:

>>> df
                        IDs distance
2022-01-01 12:00:00  [A, B]        1
2022-01-01 12:00:01  [A, B]      1.1
2022-01-01 12:00:01  [A, C]      2.8
2022-01-01 12:00:02  [A, B]        1
2022-01-01 12:00:02  [A, D]        3
2022-01-01 12:00:02  [C, D]      0.5


Similar to mozway's solution but letting NumPy do the reshaping:

df.assign(IDs=df['IDs'].apply(lambda l: np.array(l).reshape(-1, 2))).explode(['IDs', 'distance'])
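
One thing to be aware of with the reshape approach: each exploded IDs cell ends up holding a NumPy array rather than a Python list, even though both print as [A, B]. If plain lists are preferred, adding `.tolist()` after the reshape is one option. A minimal sketch on a one-row, illustrative frame (the names `pairs` and `out` are not from the original answers):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"IDs": [["A", "B", "A", "C"]], "distance": [[1.1, 2.8]]})

# reshape(-1, 2) builds an (n, 2) array; .tolist() converts it back to
# nested Python lists so each exploded cell is a list, not an ndarray.
pairs = df["IDs"].apply(lambda l: np.array(l).reshape(-1, 2).tolist())
out = df.assign(IDs=pairs).explode(["IDs", "distance"])
```

With an odd number of IDs in a row, `reshape(-1, 2)` raises a ValueError, so the same 2n assumption from the question applies here too.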

1 Comment

This is almost a duplicate of mine... ;)
