Hi, I have a DataFrame like this:

    A             B 
0:  some value    [[L1, L2]]

I want to change it into:

    A             B 
0:  some value    L1
1:  some value    L2

How can I do that?

6 Answers


Pandas >= 0.25

import pandas as pd

df1 = pd.DataFrame({'A': ['a', 'b'],
                    'B': [[['1', '2']], [['3', '4', '5']]]})
print(df1)

    A   B
0   a   [[1, 2]]
1   b   [[3, 4, 5]]

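# B holds nested lists ([[...]]), so explode twice: the first call unwraps
# the outer list, the second gives one row per element
df1 = df1.explode('B').explode('B')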

    A   B
0   a   1
0   a   2
1   b   3
1   b   4
1   b   5

I don't know how good this approach is, but it works when each cell holds a list of items.


Comments:

Perfect! I recalled vaguely that there's a function to perform this in a single step but couldn't quite remember the name and wasn't able to find it again in the documentation. Almost gave in to go with the function chaining solution until I found this :)
Better than all the other provided solutions
You might want to check this issue before using it (maybe wait for the 0.26 release): github.com/pandas-dev/pandas/issues/30748
Rule of thumb: if I can explain it in a few words, it should take a few steps. This answer seems better than the accepted one.
And you can add .reset_index(drop=True) to the end of the line to replace the repeated index values with a fresh index, so df1.explode('B').reset_index(drop=True) would be the answer.
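Putting the answer and the last comment together, here is a minimal self-contained sketch (assuming pandas >= 0.25 and the same nested [[...]] layout as above). explode runs twice because each cell wraps its list in another list, and reset_index(drop=True) replaces the repeated 0, 0, 1, ... labels with a fresh index.

```python
import pandas as pd

# Each cell of B holds a single inner list, as in the question: [[...]]
df1 = pd.DataFrame({'A': ['a', 'b'],
                    'B': [[['1', '2']], [['3', '4', '5']]]})

result = (df1.explode('B')             # [[1, 2]] becomes [1, 2]
             .explode('B')             # [1, 2] becomes one row per element
             .reset_index(drop=True))  # 0..n-1 instead of repeated labels
print(result)
```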

You can do it this way:

In [84]: df
Out[84]:
               A               B
0     some value      [[L1, L2]]
1  another value  [[L3, L4, L5]]

In [85]: (df['B'].apply(lambda x: pd.Series(x[0]))
   ....:         .stack()
   ....:         .reset_index(level=1, drop=True)
   ....:         .to_frame('B')
   ....:         .join(df[['A']], how='left')
   ....: )
Out[85]:
    B              A
0  L1     some value
0  L2     some value
1  L3  another value
1  L4  another value
1  L5  another value
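The session above uses a df that is never constructed. A self-contained sketch of the same chain follows; the explicit dropna() is my addition, on the assumption that some pandas versions keep the NaN padding (produced for shorter lists) after stack() instead of dropping it, so dropping it explicitly should give the same rows either way.

```python
import pandas as pd

df = pd.DataFrame({'A': ['some value', 'another value'],
                   'B': [[['L1', 'L2']], [['L3', 'L4', 'L5']]]})

result = (df['B'].apply(lambda x: pd.Series(x[0]))  # wide: one column per element of the inner list
            .stack()                                 # long: one row per (row, column) pair
            .dropna()                                # drop NaN padding from shorter lists, if any
            .reset_index(level=1, drop=True)         # keep only the original row label
            .to_frame('B')
            .join(df[['A']], how='left'))            # bring column A back by index
print(result)
```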

UPDATE: a more generic solution

Comments:

lambda x: pd.Series(x[0]) should be changed to lambda x: pd.Series(x) in case of flat list values in column B
@soupault, that's correct, thank you! This code works for the particular question that was asked. Partly because of that, I have posted a link to a more generic solution...
@MaxU how can I do that for two columns if they have the same number of values in the list?
@nurma_a, check this solution
Hi @MaxU, how can we do this the opposite way? I mean wider format to long format.

Faster solution with chain.from_iterable and numpy.repeat:

from itertools import chain
import numpy as np
import pandas as pd

df = pd.DataFrame({'A':['a','b'],
                   'B':[[['A1', 'A2']],[['A1', 'A2', 'A3']]]})

print (df)
   A               B
0  a      [[A1, A2]]
1  b  [[A1, A2, A3]]


df1 = pd.DataFrame({ "A": np.repeat(df.A.values, 
                                    [len(x) for x in (chain.from_iterable(df.B))]),
                     "B": list(chain.from_iterable(chain.from_iterable(df.B)))})

print (df1)
   A   B
0  a  A1
1  a  A2
2  b  A1
3  b  A2
4  b  A3
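To make the logic of this answer explicit, here is a sketch wrapped in a small helper; explode_nested is a hypothetical name, not a pandas function. Flattening B once yields one inner list per row, their lengths tell np.repeat how often to repeat each value of A, and flattening a second time gives the new B column. The length check encodes the answer's implicit assumption that every cell holds exactly one inner list.

```python
from itertools import chain

import numpy as np
import pandas as pd


def explode_nested(df, id_col='A', list_col='B'):
    """Hypothetical helper: one output row per element of each cell's inner list."""
    # Each cell looks like [[x, y, ...]]; chaining over the column yields the inner lists
    inner_lists = list(chain.from_iterable(df[list_col]))
    if len(inner_lists) != len(df):
        raise ValueError('expected exactly one inner list per cell')
    lengths = [len(inner) for inner in inner_lists]  # rows produced by each original row
    return pd.DataFrame({
        id_col: np.repeat(df[id_col].values, lengths),     # repeat each id once per element
        list_col: list(chain.from_iterable(inner_lists)),  # flatten the inner lists in order
    })


df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [[['A1', 'A2']], [['A1', 'A2', 'A3']]]})
df1 = explode_nested(df)
print(df1)
```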

Timings:

import random
import string

A = np.unique(np.random.randint(0, 1000, 1000))
B = [[list(string.ascii_letters[:random.randint(3, 10)])] for _ in range(len(A))]
df = pd.DataFrame({"A":A, "B":B})
print (df)
       A                                 B
0      0        [[a, b, c, d, e, f, g, h]]
1      1                       [[a, b, c]]
2      3     [[a, b, c, d, e, f, g, h, i]]
3      5                 [[a, b, c, d, e]]
4      6     [[a, b, c, d, e, f, g, h, i]]
5      7           [[a, b, c, d, e, f, g]]
6      8              [[a, b, c, d, e, f]]
7     10              [[a, b, c, d, e, f]]
8     11           [[a, b, c, d, e, f, g]]
9     12     [[a, b, c, d, e, f, g, h, i]]
10    13        [[a, b, c, d, e, f, g, h]]
...
...

In [67]: %timeit pd.DataFrame({ "A": np.repeat(df.A.values, [len(x) for x in (chain.from_iterable(df.B))]),"B": list(chain.from_iterable(chain.from_iterable(df.B)))})
1000 loops, best of 3: 818 µs per loop

In [68]: %timeit ((df['B'].apply(lambda x: pd.Series(x[0])).stack().reset_index(level=1, drop=True).to_frame('B').join(df[['A']], how='left')))
10 loops, best of 3: 103 ms per loop
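The %timeit lines need IPython, and the setup code needs random and string imported. Below is a plain-Python sketch of the same comparison using the standard timeit module; the fixed seeds, the helper names with_chain/with_apply, the dropna() call and the equality check are my additions. It prints timings rather than asserting them, since the numbers depend on the machine and the pandas version.

```python
import random
import string
import timeit
from itertools import chain

import numpy as np
import pandas as pd

random.seed(0)
np.random.seed(0)

# Same kind of data as the timing setup above
A = np.unique(np.random.randint(0, 1000, 1000))
B = [[list(string.ascii_letters[:random.randint(3, 10)])] for _ in range(len(A))]
df = pd.DataFrame({"A": A, "B": B})


def with_chain(df):
    return pd.DataFrame({"A": np.repeat(df.A.values, [len(x) for x in chain.from_iterable(df.B)]),
                         "B": list(chain.from_iterable(chain.from_iterable(df.B)))})


def with_apply(df):
    return (df['B'].apply(lambda x: pd.Series(x[0]))
              .stack()
              .dropna()
              .reset_index(level=1, drop=True)
              .to_frame('B')
              .join(df[['A']], how='left'))


fast = with_chain(df)
slow = with_apply(df)
# Same (A, B) pairs in the same order; index and column order differ
same = list(zip(fast.A, fast.B)) == list(zip(slow.A, slow.B))
print('same rows:', same)

for fn in (with_chain, with_apply):
    seconds = timeit.timeit(lambda: fn(df), number=10) / 10
    print(f'{fn.__name__}: {seconds * 1000:.2f} ms per call')
```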

Comments:

This solution is about 125 times faster than the apply solution.
from itertools import chain

I can't find an elegant way to handle this, but the following code works...

import pandas as pd
import numpy as np
df = pd.DataFrame([{"a":1,"b":[[1,2]]},{"a":4, "b":[[3,4,5]]}])
z = []
for k,row in df.iterrows():
    for j in list(np.array(row.b).flat):
        z.append({'a':row.a, 'b':j})
result = pd.DataFrame(z)

Comments:

This was the easiest one for me to understand. Thank you.
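For comparison, the same loop can be sketched with itertuples, which avoids building a Series for every row and is generally faster than iterrows; the column names a and b are the ones from the snippet above. As in the original, np.array(row.b).flat assumes each cell converts to a regular (non-ragged) array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": [[1, 2]]}, {"a": 4, "b": [[3, 4, 5]]}])

# One output record per element; np.array(...).flat walks the nested list in order
rows = [{'a': row.a, 'b': value}
        for row in df.itertuples(index=False)
        for value in np.array(row.b).flat]
result = pd.DataFrame(rows)
print(result)
```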

I think this is the fastest and simplest way:

import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [[['A1', 'A2']], [['A1', 'A2', 'A3']]]})


df.set_index('A')['B'].apply(lambda x: pd.Series(x[0]))

Comments:

While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value.
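On its own, the expression above returns a wide frame (index A, one column per list element, NaN where a list is shorter) rather than the long layout the question asks for. A sketch of the missing steps, using the same df: stack() moves the columns into rows, dropna() removes any padding, and two reset_index calls turn the result back into columns A and B.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [[['A1', 'A2']], [['A1', 'A2', 'A3']]]})

# The answer's expression: index a, b; columns 0, 1, 2; NaN where a list is shorter
wide = df.set_index('A')['B'].apply(lambda x: pd.Series(x[0]))

long = (wide.stack()                          # one row per (A, column position) pair
            .dropna()                         # remove the NaN padding, if stack() kept it
            .reset_index(level=1, drop=True)  # drop the column-position level, keep A
            .rename('B')
            .reset_index())                   # columns A and B with a fresh 0..n-1 index
print(long)
```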

Here's another option (it assumes column B holds flat lists such as [L1, L2]):

unpacked = (pd.melt(df.B.apply(pd.Series).reset_index(), id_vars='index')
              .merge(df, left_on='index', right_index=True))
unpacked = (unpacked.loc[unpacked.value.notnull(), :]
                    .drop(columns=['index', 'variable', 'B'])
                    .rename(columns={'value': 'B'}))

  1. Apply pd.Series to column B --> spreads each list's elements across separate columns
  2. Melt this, so that each entry is a separate row (preserving the original index)
  3. Merge this back onto the original DataFrame
  4. Tidy up: drop the NaN padding, drop unnecessary columns and rename the values column
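A runnable sketch of these steps on the question's nested data. Two things here are my additions rather than part of the answer: df.B.str[0] unwraps the outer [[...]] list first (otherwise apply(pd.Series) produces a single column holding the inner list), and sort_values restores the original row order, since melt emits rows grouped by list position.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'],
                   'B': [[['A1', 'A2']], [['A1', 'A2', 'A3']]]})

# Addition: unwrap [[...]] so apply(pd.Series) spreads the elements, not the inner list
flat = df.B.str[0]

unpacked = (pd.melt(flat.apply(pd.Series).reset_index(), id_vars='index')
              .merge(df, left_on='index', right_index=True))
unpacked = (unpacked.loc[unpacked.value.notnull(), :]
                    .sort_values(['index', 'variable'])  # addition: original row order
                    .drop(columns=['index', 'variable', 'B'])
                    .rename(columns={'value': 'B'}))
print(unpacked)
```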
