2

I have a table like so:

Name   | ID | Contact_method | Contact
sarah    1   house            h1
sarah    1   mobile           m1
sarah    1   email            sarah@mail
bob      2   house            h2
bob      2   mobile           m2
bob      2   email            bob@mail
jones    3   house            h3
jones    3   mobile           m3
jones    3   email            jones@mail
jones    4   house            h4
jones    4   mobile           m4
jones    4   email            jones2@mail

And I want it like so:

Name  | ID | house | mobile | email
sarah   1    h1      m1       sarah@mail
bob     2    h2      m2       bob@mail
jones   3    h3      m3       jones@mail
jones   4    h4      m4       jones2@mail

I can already do this, but only through a very expensive pd.concat operation iterated over all the unique IDs. Is there a simple way to do this? I've also tinkered with pivot() and transpose(). Note that the duplicate name is there so that I can't rely on uniqueness of column values to, say, do a join.

3 Answers 3

2

Set the index with all columns except 'Contact_method', then unstack

df.set_index(
    ['Name', 'ID', 'Contact_method']
)['Contact'].unstack().rename_axis(None, 1).reset_index()

    Name  ID        email house mobile
0    bob   2     bob@mail    h2     m2
1  jones   3   jones@mail    h3     m3
2  jones   4  jones2@mail    h4     m4
3  sarah   1   sarah@mail    h1     m1
Sign up to request clarification or add additional context in comments.

4 Comments

I have a desk to sit at in my new temporary place while we continue to look for houses. I am set up to work remotely for the time being. I'll be commuting back to Seattle twice a month. And soon I have to go back to get rid of all my stuff at old place. Still busy, but I've enjoyed having some time to answer questions. I hope your doing well! @jezrael
@jezrael yeah, I made a big push for legendary then I felt I could calm down a bit. You're almost top pandas all time. My next SO goal will be to get past DSM and Jeff on that list. I've never been very motivated by rep itself. I've given a lot away. I'll get 100k eventually.. I do want a t-shirt. You'll have to let me know if they give you anything.
I'd want to participate in stats.stackexchange.com and quant.stackexchange.com. However, I'd rather pick some other tags to get gold in. Like, I want my numpy badge, I've neglected my machine learning stuff. I want to get a gold badge in tensorflow (though I still have a lot to learn)
you want to start participating in meta? I think you'd be a good mod! You think about running for a position? @jezrael
0

One way is to build a contact dictionary (of dictionaries) based on ID, 'manualy'. Not sure if it is more efficient though.

people = dict()
for index, row in pd.iterrows():
    ID = row['ID']
    if ID not in people:
        people[ID] = {'ID': ID, 'Name': row['Name']}
    people[ID][row['Contact_method']] = row['Contact']

print pandas.DataFrame(people).transpose()

And output is:

  ID   Name        email house mobile
1  1  sarah   sarah@mail    h1     m1
2  2    bob     bob@mail    h2     m2
3  3  jones   jones@mail    h3     m3
4  4  jones  jones2@mail    h4     m4

Comments

0

I think piRSquared's solution is very nice, but if get:

ValueError: Index contains duplicate entries, cannot reshape

print (df)
     Name  ID Contact_method      Contact
0   sarah   1          house           h1
1   sarah   1         mobile           m1
2   sarah   1          email   sarah@mail
3     bob   2          house           h2
4     bob   2         mobile           m2
5     bob   2          email     bob@mail
6   jones   3          house           h3
7   jones   3         mobile           m3
8   jones   3          email   jones@mail <-for same Name,ID and Contact_method get duplicate
9   jones   3          email     joe@mail <-for same Name,ID and Contact_method get duplicate
10  jones   4          house           h4
11  jones   4         mobile           m4
12  jones   4          email  jones2@mail

use pivot_table or groubpy with aggregating join:

cols = ['Name','ID','house','mobile','email']
df1 = df.pivot_table(index=['ID','Name'],
                     columns='Contact_method', 
                     values='Contact', 
                     aggfunc=','.join)
        .rename_axis(None, 1)
        .reset_index()
        .reindex_axis(cols, axis=1)
print (df1)
    Name  ID house mobile                email
0  sarah   1    h1     m1           sarah@mail
1    bob   2    h2     m2             bob@mail
2  jones   3    h3     m3  jones@mail,joe@mail <- join duplicates
3  jones   4    h4     m4          jones2@mail

df1 = df.groupby(['Name', 'ID', 'Contact_method'])['Contact']
        .apply(','.join)
        .unstack()
        .rename_axis(None, 1)
        .reset_index()
        .reindex_axis(cols, axis=1)
print (df1)
    Name  ID house mobile                email
0  sarah   1    h1     m1           sarah@mail
1    bob   2    h2     m2             bob@mail
2  jones   3    h3     m3  jones@mail,joe@mail <- join duplicates
3  jones   4    h4     m4          jones2@mail

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.