
I would like to apply a function to a pandas DataFrame that splits some of the rows into two. So for example, I may have this as input:

import pandas as pd
df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}], index=['i1', 'i2'])
    one  two
i1    3    a
i2    5  b,c

And I want something like this as output:

      one  two
i1      3    a
i2_0    5    b
i2_1    5    c

My hope was that I could just use apply() on the DataFrame, calling a function that itself returns a DataFrame with one or more rows, and that those results would then be merged back together. However, this does not seem to work at all. Here is a test case where I am just trying to duplicate each row:

dfa = df.apply(lambda s: pd.DataFrame([s.to_dict(), s.to_dict()]), axis=1)
    one  two
i1  one  two
i2  one  two

So if I return a DataFrame, the column names of that DataFrame seem to become the contents of the rows. This is obviously not what I want.
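The per-row idea itself can be made to work if the function is called explicitly for each row and the pieces are stitched together with pd.concat, instead of relying on apply() to merge the returned frames. A minimal sketch, assuming the data fits comfortably in memory; split_row is a hypothetical helper, and the rule of adding a _k suffix only to rows that actually split is an assumption chosen to reproduce the desired output:

```python
import pandas as pd

df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}], index=['i1', 'i2'])


def split_row(label, row):
    """Hypothetical helper: return a DataFrame with one row per comma-separated value."""
    parts = row['two'].split(',')
    if len(parts) == 1:
        # Rows that do not split keep their original label.
        index = [label]
    else:
        # Assumed naming rule: i2 -> i2_0, i2_1, ...
        index = [f"{label}_{k}" for k in range(len(parts))]
    # The scalar 'one' value is broadcast to every piece.
    return pd.DataFrame({'one': row['one'], 'two': parts}, index=index)


# Build one small frame per row, then concatenate them in the original order.
result = pd.concat([split_row(label, row) for label, row in df.iterrows()])
print(result)
```

Calling the function in a list comprehension keeps full control over the returned shape, which is exactly what apply() does not guarantee when the function returns a DataFrame.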

There is another question here that was solved using .groupby(); however, I don't think that applies to my case, since I don't actually want to group by anything.

What is the correct way to do this?

Answer


Your data is not in a tidy shape (a comma-separated string where you should have separate columns). We first fix this:

df2 = pd.concat([df['one'], pd.DataFrame(df.two.str.split(',').tolist(), index=df.index)], axis=1)

Which gives us something neater:

In[126]: df2
Out[126]: 
    one  0     1
i1    3  a  None
i2    5  b     c
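In more recent pandas versions, str.split accepts expand=True, which returns the split pieces directly as a DataFrame aligned with the original index, so the .tolist() round-trip is not needed. A sketch, assuming a pandas version that supports the expand parameter (shorter rows are padded with a missing value):

```python
import pandas as pd

df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}], index=['i1', 'i2'])

# expand=True returns one column per piece (0, 1, ...), indexed like df,
# with missing values where a row has fewer pieces.
pieces = df['two'].str.split(',', expand=True)
df2 = pd.concat([df['one'], pieces], axis=1)
print(df2)
```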

Now we can simply do:

In[125]: df2.set_index('one').unstack().dropna()
Out[125]: 
   one
0  3      a
   5      b
1  5      c
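The result of unstack() here is a Series whose MultiIndex is (piece number, value of one), not a DataFrame. If a two-column frame is wanted back, one way is to drop the piece-number level and reset the index. A sketch (the names long and out are illustrative); note that rows come out ordered by piece number first, which only happens to match the original row order in this small example:

```python
import pandas as pd

df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}], index=['i1', 'i2'])
df2 = pd.concat([df['one'], pd.DataFrame(df.two.str.split(',').tolist(), index=df.index)], axis=1)

# With a flat index, unstack() yields a Series indexed by (piece number, 'one');
# dropna() removes the None padding from rows that had fewer pieces.
long = df2.set_index('one').unstack().dropna()

# Drop the piece-number level, then move 'one' back into an ordinary column.
out = long.rename('two').reset_index(level=0, drop=True).reset_index()
print(out)
```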

Adjusting the index (if desired) is trivial and left to the reader as an exercise.
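Because the unstack() result is keyed by the values in one rather than by the original labels i1 and i2, those labels are no longer part of it. A sketch that swaps in a different technique, DataFrame.explode (available from pandas 0.25), keeps the original labels throughout and then builds the i2_0, i2_1 style index from the question; the suffixing rule is an assumption:

```python
import pandas as pd

df = pd.DataFrame([{'one': 3, 'two': 'a'}, {'one': 5, 'two': 'b,c'}], index=['i1', 'i2'])

# Turn each string into a list, then give every list element its own row.
# explode repeats the original index label (and the 'one' value) per element.
long = df.assign(two=df['two'].str.split(',')).explode('two')

# Position of each piece within its original label: 0, 1, ...
pos = long.groupby(level=0).cumcount()
# Highest position per label; 0 means the row was not split.
last = pos.groupby(level=0).transform('max')

# Assumed naming rule: add '_k' only to labels that were actually split.
long.index = [label if n == 0 else f"{label}_{k}"
              for label, k, n in zip(long.index, pos, last)]
print(long)
```

Keeping the original labels avoids relying on the values in one being unique, which the unstack route implicitly does when mapping rows back.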
