python pandas: pivot table

Question

I have a table like so:

Name   | ID | Contact_method | Contact
sarah    1   house            h1
sarah    1   mobile           m1
sarah    1   email            sarah@mail
bob      2   house            h2
bob      2   mobile           m2
bob      2   email            bob@mail
jones    3   house            h3
jones    3   mobile           m3
jones    3   email            jones@mail
jones    4   house            h4
jones    4   mobile           m4
jones    4   email            jones2@mail

And I want it like so:

Name  | ID | house | mobile | email
sarah   1    h1      m1       sarah@mail
bob     2    h2      m2       bob@mail
jones   3    h3      m3       jones@mail
jones   4    h4      m4       jones2@mail

I can already do this, but only through a very expensive pd.concat operation iterated over all the unique IDs. Is there a simple way to do this? I've also tinkered with pivot() and transpose(). Note that the duplicate name is there so that I can't rely on uniqueness of column values to, say, do a join.

piRSquared · Accepted Answer · 2017-03-16 23:58:45Z

2

Set the index with all columns except 'Contact_method', then unstack

df.set_index(
    ['Name', 'ID', 'Contact_method']
)['Contact'].unstack().rename_axis(None, 1).reset_index()

    Name  ID        email house mobile
0    bob   2     bob@mail    h2     m2
1  jones   3   jones@mail    h3     m3
2  jones   4  jones2@mail    h4     m4
3  sarah   1   sarah@mail    h1     m1

answered Mar 16, 2017 at 23:58

piRSquared

296k68 gold badges509 silver badges654 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

piRSquared Over a year ago

I have a desk to sit at in my new temporary place while we continue to look for houses. I am set up to work remotely for the time being. I'll be commuting back to Seattle twice a month. And soon I have to go back to get rid of all my stuff at old place. Still busy, but I've enjoyed having some time to answer questions. I hope your doing well! @jezrael

piRSquared Over a year ago

@jezrael yeah, I made a big push for legendary then I felt I could calm down a bit. You're almost top pandas all time. My next SO goal will be to get past DSM and Jeff on that list. I've never been very motivated by rep itself. I've given a lot away. I'll get 100k eventually.. I do want a t-shirt. You'll have to let me know if they give you anything.

piRSquared Over a year ago

I'd want to participate in stats.stackexchange.com and quant.stackexchange.com. However, I'd rather pick some other tags to get gold in. Like, I want my numpy badge, I've neglected my machine learning stuff. I want to get a gold badge in tensorflow (though I still have a lot to learn)

piRSquared Over a year ago

you want to start participating in meta? I think you'd be a good mod! You think about running for a position? @jezrael

greedy52 · Accepted Answer · 2017-03-16 23:54:21Z

0

One way is to build a contact dictionary (of dictionaries) based on ID, 'manualy'. Not sure if it is more efficient though.

people = dict()
for index, row in pd.iterrows():
    ID = row['ID']
    if ID not in people:
        people[ID] = {'ID': ID, 'Name': row['Name']}
    people[ID][row['Contact_method']] = row['Contact']

print pandas.DataFrame(people).transpose()

And output is:

  ID   Name        email house mobile
1  1  sarah   sarah@mail    h1     m1
2  2    bob     bob@mail    h2     m2
3  3  jones   jones@mail    h3     m3
4  4  jones  jones2@mail    h4     m4

answered Mar 16, 2017 at 23:54

greedy52

1,41510 silver badges8 bronze badges

Comments

Community · Accepted Answer · 2017-05-23 12:26:04Z

I think piRSquared's solution is very nice, but if get:

ValueError: Index contains duplicate entries, cannot reshape

print (df)
     Name  ID Contact_method      Contact
0   sarah   1          house           h1
1   sarah   1         mobile           m1
2   sarah   1          email   sarah@mail
3     bob   2          house           h2
4     bob   2         mobile           m2
5     bob   2          email     bob@mail
6   jones   3          house           h3
7   jones   3         mobile           m3
8   jones   3          email   jones@mail <-for same Name,ID and Contact_method get duplicate
9   jones   3          email     joe@mail <-for same Name,ID and Contact_method get duplicate
10  jones   4          house           h4
11  jones   4         mobile           m4
12  jones   4          email  jones2@mail

use pivot_table or groubpy with aggregating join:

cols = ['Name','ID','house','mobile','email']
df1 = df.pivot_table(index=['ID','Name'],
                     columns='Contact_method', 
                     values='Contact', 
                     aggfunc=','.join)
        .rename_axis(None, 1)
        .reset_index()
        .reindex_axis(cols, axis=1)
print (df1)
    Name  ID house mobile                email
0  sarah   1    h1     m1           sarah@mail
1    bob   2    h2     m2             bob@mail
2  jones   3    h3     m3  jones@mail,joe@mail <- join duplicates
3  jones   4    h4     m4          jones2@mail

df1 = df.groupby(['Name', 'ID', 'Contact_method'])['Contact']
        .apply(','.join)
        .unstack()
        .rename_axis(None, 1)
        .reset_index()
        .reindex_axis(cols, axis=1)
print (df1)
    Name  ID house mobile                email
0  sarah   1    h1     m1           sarah@mail
1    bob   2    h2     m2             bob@mail
2  jones   3    h3     m3  jones@mail,joe@mail <- join duplicates
3  jones   4    h4     m4          jones2@mail

Collectives™ on Stack Overflow

python pandas: pivot table

3 Answers 3

4 Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

4 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related