

What is the most efficient way to convert a pandas dataframe in which each row looks like this:

    p1  p2  prog
0   A   B   C

into three rows like this?

    n1   n2   edge_type
0   A    A/B  marriage
1   B    A/B  marriage
2   A/B  C    child

Or, equivalently, how do I convert df into dF as follows:

import pandas as pd

df = pd.DataFrame({'prog': ['C'], 'p1': ['A'], 'p2': ['B']})
dF = pd.DataFrame({'edge_type': ['marriage', 'marriage', 'child'], 'n1': ['A', 'B', 'A/B'], 'n2': ['A/B', 'A/B', 'C']})

This is straightforward in R by defining a worker function and using mapply, but I am still scratching my head over how to do it in Python.
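For comparison, a rough Python counterpart to R's mapply is the built-in map(), which calls a function once per position, taking one element from each iterable it is given. The sketch below assumes a hypothetical worker named edges that returns the three edge tuples for one row; the per-row lists are then flattened with itertools.chain before building the result frame:

```python
import itertools

import pandas as pd

df = pd.DataFrame({'prog': ['C'], 'p1': ['A'], 'p2': ['B']})


def edges(p1, p2, prog):
    """Hypothetical worker: turn one (p1, p2, prog) row into three edge tuples."""
    combo = f'{p1}/{p2}'
    return [
        (p1, combo, 'marriage'),
        (p2, combo, 'marriage'),
        (combo, prog, 'child'),
    ]


# map() over several columns behaves like mapply: one worker call per row.
per_row = map(edges, df['p1'], df['p2'], df['prog'])
# Flatten the list of lists into one sequence of edge tuples.
records = list(itertools.chain.from_iterable(per_row))
dF = pd.DataFrame(records, columns=['n1', 'n2', 'edge_type'])
print(dF)
```

This still loops in Python, so it mainly buys readability rather than speed.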

2 Answers

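import pandas as pd

df = pd.DataFrame({'prog': ['C'], 'p1': ['A'], 'p2': ['B']})

data = []
for row in df.itertuples(index=False):
    # Use field names, not positions: positional indices depend on the column
    # order, which follows dict insertion order in recent pandas versions.
    combo = '/'.join([row.p1, row.p2])
    data.append(('marriage', row.p1, combo))
    data.append(('marriage', row.p2, combo))
    data.append(('child', combo, row.prog))
dF = pd.DataFrame.from_records(data, columns=('edge_type', 'n1', 'n2'))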

I tried the apply function but ended up with a very hackish solution. I am sure there are better solutions.
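One candidate for a better solution, assuming the goal is to avoid a Python-level loop over rows, is to build each of the three edge groups with whole-column operations and stack them with pd.concat. This is an unbenchmarked sketch: concat puts the rows in groups by edge kind, so a stable sort on the original index is used to bring each input row's three edges back together in order.

```python
import pandas as pd

df = pd.DataFrame({'prog': ['C'], 'p1': ['A'], 'p2': ['B']})

# Vectorized string concatenation builds the "A/B" node for every row at once.
combo = df['p1'] + '/' + df['p2']

# Each frame holds one kind of edge for all input rows; the index still
# refers to the original row, which lets us regroup afterwards.
marriage1 = pd.DataFrame({'n1': df['p1'], 'n2': combo, 'edge_type': 'marriage'})
marriage2 = pd.DataFrame({'n1': df['p2'], 'n2': combo, 'edge_type': 'marriage'})
child = pd.DataFrame({'n1': combo, 'n2': df['prog'], 'edge_type': 'child'})

dF = (
    pd.concat([marriage1, marriage2, child])
    # mergesort is stable, so each row's edges keep the order marriage, marriage, child.
    .sort_index(kind='mergesort')
    .reset_index(drop=True)
)
print(dF)
```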


3 Comments

I tried this on the example above, but it does not return the expected result.
@Rotail What error or output are you getting? It works fine for me.
No error. It just doesn't return what it is supposed to for the example above.

Using apply:

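import pandas as pd

def func(s):
    combo = '/'.join([s['p1'], s['p2']])  # couple node, e.g. 'A/B'
    # Two marriage edges and one child edge per input row.
    l = [[s['p1'], combo, 'marriage'], [s['p2'], combo, 'marriage'], [combo, s['prog'], 'child']]
    # unstack() flattens the 3x3 frame into one Series, which apply turns into a row.
    return pd.DataFrame(l, columns=['n1', 'n2', 'edge_type']).unstack()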

Then with your example:

df.apply(func, axis=1).stack().reset_index(drop=True)

returns

    n1   n2 edge_type
0    A  A/B  marriage
1    B  A/B  marriage
2  A/B    C     child
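A variant of the same apply idea that skips the unstack/stack round-trip is to have the row function return a plain list of tuples and then flatten with Series.explode. This is a sketch; func_records is a hypothetical name, and result_type='reduce' is passed so that apply returns a Series of lists instead of trying to expand them into columns:

```python
import pandas as pd

df = pd.DataFrame({'prog': ['C'], 'p1': ['A'], 'p2': ['B']})


def func_records(s):
    # Same three edges as func, but as plain tuples instead of a DataFrame.
    combo = '/'.join([s['p1'], s['p2']])
    return [(s['p1'], combo, 'marriage'),
            (s['p2'], combo, 'marriage'),
            (combo, s['prog'], 'child')]


# One list of tuples per input row; explode yields one tuple per output row.
edges = df.apply(func_records, axis=1, result_type='reduce').explode()
dF = pd.DataFrame(edges.tolist(), columns=['n1', 'n2', 'edge_type'])
print(dF)
```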

4 Comments

Do you have any idea why I get "TypeError: sequence item 0: expected str instance, Series found" when I try to build combo outside the function with combo = '/'.join([G_df['p1'], G_df['p2']])?
Although the function works fine for this example, I get a similar error when I apply it to other dataframes: TypeError: ('sequence item 0: expected str instance, numpy.float64 found', 'occurred at index 0')
@Rotail It's hard to debug in the comments... Try asking another question with a Minimal, Complete, Verifiable Example and I'll take a look!
Sure, will do. Before that, though, I noticed your solution needs a small modification for this question: it returns A and B in the first and second rows, whereas it should return A and A/B in the first row and B and A/B in the second. Thanks.
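Both errors in the comments above come from '/'.join receiving something that is not a string. In the first case the list holds two whole Series; str.join needs an iterable of strings, so outside a row function the element-wise Series.str.cat is the usual tool. The numpy.float64 case most likely means some p1/p2 values are missing (pandas stores NaN as a float), though the other dataframes are not shown, so that is an assumption. A sketch handling both, where G_df is a small stand-in frame and the placeholder 'unknown' is an arbitrary choice (dropping such rows with dropna is the other obvious option):

```python
import numpy as np
import pandas as pd

# Stand-in for G_df (the real frame is not shown); one parent is missing.
G_df = pd.DataFrame({'p1': ['A', np.nan], 'p2': ['B', 'E'], 'prog': ['C', 'F']})

# '/'.join([G_df['p1'], G_df['p2']]) raises TypeError: str.join needs an
# iterable of strings, but this list holds two whole Series.

# Missing values are float NaN, which also breaks '/'.join inside a row
# function, so fill them with a placeholder and force str dtype first.
parents = G_df[['p1', 'p2']].fillna('unknown').astype(str)

# Series.str.cat joins element-wise: one "p1/p2" string per row.
combo = parents['p1'].str.cat(parents['p2'], sep='/')
print(combo.tolist())  # ['A/B', 'unknown/E']
```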
