I need to apply a custom transformation to a dataframe like this:

import pandas as pd

df = pd.DataFrame({
    'value': ['a'],
    'measure':[['b', 'c']]
})

transformed_df = pd.DataFrame({
    'measure': ['b', 'c'],
    'value': ['a', 'a']
})

What's an efficient way of getting from df to transformed_df?

  • What is the logic of the transformation? Commented Jul 16, 2020 at 19:04
  • It defines a relationship between value and measure; I need the inverse of what we have in the data frame. Commented Jul 16, 2020 at 19:11

2 Answers

Try pd.DataFrame.explode (available since pandas 0.25):

df.explode('measure').reset_index(drop=True)

Output:

  value measure
0     a       b
1     a       c
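
As a minimal sketch of how explode behaves when the frame has more than one row, the example below adds a second, hypothetical row (`'x'` with `['y']`) that is not part of the question, and reorders the columns to match the layout of transformed_df:

```python
import pandas as pd

# Hypothetical two-row frame; the second row is an assumption for illustration.
df = pd.DataFrame({
    'value': ['a', 'x'],
    'measure': [['b', 'c'], ['y']],
})

# explode repeats each row once per list element; reset_index(drop=True)
# replaces the repeated original index labels with 0..n-1.
result = df.explode('measure').reset_index(drop=True)

# Put 'measure' first, as in transformed_df from the question.
result = result[['measure', 'value']]
print(result)
```

Each value stays paired only with the measures from its own row, which is the behaviour the question asks for.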

One approach to the problem would be to think of it as constructing a MultiIndex:

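import pandas as pd

value = ['a']
measure = ['b', 'c']
# Cartesian product of the two lists, one index level per list
idx = pd.MultiIndex.from_product([value, measure], names=['value', 'measure'])

# Index-only frame; reset_index turns the two levels into columns
df = pd.DataFrame(index=idx).reset_index()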

where df is:

  value measure
0     a       b
1     a       c
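
Note that from_product pairs every value with every measure, so it matches the explode result only when all values share the same measures, as in the single-row example. A small sketch of the difference, using a hypothetical second row that is not in the question:

```python
import pandas as pd

# Hypothetical inputs: two values, each with its own measure list.
df = pd.DataFrame({
    'value': ['a', 'x'],
    'measure': [['b', 'c'], ['y']],
})

# Row-wise pairing: each value keeps only its own measures.
exploded = df.explode('measure').reset_index(drop=True)

# Cartesian pairing: every value is combined with every measure.
all_values = df['value'].tolist()
all_measures = [m for row in df['measure'] for m in row]
idx = pd.MultiIndex.from_product(
    [all_values, all_measures], names=['value', 'measure']
)
product = pd.DataFrame(index=idx).reset_index()

print(len(exploded))  # 3 rows: (a, b), (a, c), (x, y)
print(len(product))   # 6 rows: 2 values x 3 measures
```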

Having never seen the explode method before, I was curious to do some timing tests:

def test_multi(value, measure):
    # Build the Cartesian product of the two lists as a MultiIndex
    idx = pd.MultiIndex.from_product([value, measure], names=['value', 'measure'])
    return pd.DataFrame(index=idx).reset_index()

def test_explode(df):
    # One output row per list element in 'measure'
    return df.explode('measure').reset_index(drop=True)


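value = ['a']*10000
measure = ['b','c']*10000

# Note: from_product builds the full Cartesian product here,
# 10,000 x 20,000 = 200,000,000 rows, far more than the 20,000 rows
# that the explode test below produces.
%timeit test_multi(value, measure)
#13 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)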

value =  ['a']*10000
measure = [['b','c']]*10000


df = pd.DataFrame({
    'value': value,
    'measure':measure
})

%timeit test_explode(df)
#16.9 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
