Dataframe manipulation in python Pandas DataFrame

Question

What is the most efficient way to convert a pandas dataframe in which each individual row looks like this:

    p1  p2  prog
0   A   B   C

into 3 lines like this?

    n1   n2   edge_type
0   A    A/B  marriage
1   B    A/B  marriage
2   A/B  C    child

Or, equivalently, converting df to dF as follows:

import pandas as pd
df = pd.DataFrame({'prog': ['C'], 'p1': ['A'], 'p2': ['B']})
dF = pd.DataFrame({'edge_type': ['marriage', 'marriage', 'child'], 'n1': ['A', 'B', 'A/B'], 'n2': ['A/B', 'A/B', 'C']})

In R this is straightforward: define a worker function and use mapply. I am still scratching my head over how to do the same in Python.

Floydian · Accepted Answer · 2018-03-05 22:14:43Z

2

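import pandas as pd

df = pd.DataFrame({'prog': ['C'], 'p1': ['A'], 'p2': ['B']})

data = []
for row in df.itertuples(index=False):
    # Access fields by name so the result does not depend on column order
    # (positional access breaks when the columns are not ordered p1, p2, prog).
    combo = '/'.join([row.p1, row.p2])
    data.append(('marriage', row.p1, combo))
    data.append(('marriage', row.p2, combo))
    data.append(('child', combo, row.prog))
dF = pd.DataFrame.from_records(data, columns=('edge_type', 'n1', 'n2'))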

I tried the apply function but ended up with a very hackish solution. I am sure there are better solutions.
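One possible loop-free variant is sketched below. It assumes that p1, p2 and prog all hold strings; the names combo and edges are introduced here purely for illustration and do not come from the original answer. The idea is to build one frame per edge and stack them with pd.concat, then reorder so that the three edges of each input row stay together.

```python
import pandas as pd

df = pd.DataFrame({'prog': ['C'], 'p1': ['A'], 'p2': ['B']})

# Build the combined "p1/p2" node once for every row (assumes string columns).
combo = df['p1'] + '/' + df['p2']

# One frame per edge kind; the scalar edge_type is broadcast to every row.
edges = pd.concat(
    [
        pd.DataFrame({'n1': df['p1'], 'n2': combo, 'edge_type': 'marriage'}),
        pd.DataFrame({'n1': df['p2'], 'n2': combo, 'edge_type': 'marriage'}),
        pd.DataFrame({'n1': combo, 'n2': df['prog'], 'edge_type': 'child'}),
    ],
    keys=[0, 1, 2],  # outer index level records which edge each block is
)

# Put the original row label first, then sort so each input row's
# three edges appear consecutively, and finally drop the helper index.
dF = edges.swaplevel().sort_index().reset_index(drop=True)
print(dF)
```

Whether this is actually faster than the loop depends on the size of the frame, so it is worth timing both on real data.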

edited Mar 5, 2018 at 22:14

answered Mar 2, 2018 at 22:54


3 Comments

Rotail Over a year ago

I tried this on the example above. It does not return the proper mix.

Floydian Over a year ago

@Rotail What error/output are you getting? Works just fine for me.

Rotail Over a year ago

No error. It just doesn't return what it is supposed to return in the example above.

Alex · Accepted Answer · 2018-03-05 19:24:07Z

1

Using apply:

def func(s):
    combo = '/'.join([s['p1'], s['p2']])
    l = [[s['p1'], combo, 'marriage'], [s['p2'], combo, 'marriage'], [combo, s['prog'], 'child']]
    return pd.DataFrame(l, columns=['n1', 'n2', 'edge_type']).unstack()

Then with your example:

df.apply(func, axis=1).stack().reset_index(drop=True)

returns

    n1   n2 edge_type
0    A  A/B  marriage
1    B  A/B  marriage
2  A/B    C     child

edited Mar 5, 2018 at 19:24

answered Mar 3, 2018 at 0:46


4 Comments

Rotail Over a year ago

Do you have any idea why it gives

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-0752ed438839> in <module>()
----> 1 combo = '/'.join([G_df['p1'], G_df['p2']])

TypeError: sequence item 0: expected str instance, Series found

when I'm trying to build combo outside the function?

Rotail Over a year ago

Although the function works just fine for this example, I get the same type of error as above when I apply it to other dataframes. FYI: TypeError: ('sequence item 0: expected str instance, numpy.float64 found', 'occurred at index 0')

Alex Over a year ago

@Rotail It's hard to debug in the comments... Try asking another question with a Minimal, Complete, Verifiable Example and I'll take a look!

Rotail Over a year ago

Sure, will do. But before that, I just noticed that your solution needs a small modification for this question. It returns A and B in the first and second rows. However, it is supposed to return A and A/B in the first row, and B and A/B in the second row. Thanks.
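On the two TypeErrors quoted in the comments above: '/'.join() accepts only an iterable of str, so it fails both when handed whole columns (Series) and when a cell holds a number such as numpy.float64. A minimal sketch of one way around both, assuming the non-string values should simply be rendered as text; the frame below is made up to reproduce the situation and is not Rotail's actual data:

```python
import pandas as pd

# Hypothetical frame with a numeric p1 column to mimic the float64 error.
G_df = pd.DataFrame({'p1': [1.0, 2.0], 'p2': ['B', 'D'], 'prog': ['C', 'E']})

# Outside the function, combine the columns element-wise instead of
# calling '/'.join() on them; casting to str first handles numeric cells.
combo = G_df['p1'].astype(str) + '/' + G_df['p2'].astype(str)
print(combo.tolist())
```

Inside func, the equivalent change would be '/'.join(map(str, [s['p1'], s['p2']])). Note that missing values would then show up as the text 'nan', which may or may not be what is wanted.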
