Build a DataFrame with columns from a tuple of arrays

Question

I am struggling with the basic task of constructing a DataFrame of counts by value from a tuple produced by np.unique(arr, return_counts=True), such as:

import numpy as np
import pandas as pd

np.random.seed(123)
birds = np.random.choice(['African Swallow', 'Dead Parrot', 'Exploding Penguin'], size=int(5e4))
someTuple = np.unique(birds, return_counts=True)
someTuple
#(array(['African Swallow', 'Dead Parrot', 'Exploding Penguin'], 
#       dtype='<U17'), array([16510, 16570, 16920], dtype=int64))

First I tried

pd.DataFrame(list(someTuple))
# Returns this:
#                  0            1                  2
# 0  African Swallow  Dead Parrot  Exploding Penguin
# 1            16510        16570              16920

I also tried pd.DataFrame.from_records(someTuple), which returns the same thing.

But what I'm looking for is this:

#              birdType      birdCount
# 0     African Swallow          16510  
# 1         Dead Parrot          16570  
# 2   Exploding Penguin          16920

What's the right syntax?

Your second method would have been close; it just needs an additional .T: pd.DataFrame.from_records(someTuple).T
– Siraj S., commented Aug 23, 2016 at 19:18
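
To make the comment above concrete, here is a minimal sketch of the transpose approach. It assumes a small hand-made `birds` array as a stand-in for the 50,000-element array in the question, and the final `astype(int)` step is an addition not mentioned in the comment: after the transpose the counts are held as generic Python objects rather than integers.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's `birds` array (assumption: any 1-D array of labels).
birds = np.array(['Dead Parrot', 'African Swallow', 'Dead Parrot',
                  'Exploding Penguin', 'Dead Parrot', 'African Swallow'])
someTuple = np.unique(birds, return_counts=True)

# Each array in the tuple becomes one row, so transpose to get one row per bird.
df = pd.DataFrame.from_records(someTuple).T
df.columns = ['birdType', 'birdCount']

# The transposed frame holds the counts as generic objects; restore an integer dtype.
df['birdCount'] = df['birdCount'].astype(int)
print(df)
```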

Divakar · Accepted Answer · 2016-08-22 19:43:30Z

7

Here's one NumPy-based solution with np.column_stack:

pd.DataFrame(np.column_stack(someTuple),columns=['birdType','birdCount'])

Or with np.vstack:

pd.DataFrame(np.vstack(someTuple).T,columns=['birdType','birdCount'])
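
One caveat worth checking with either stacking call: a NumPy array has a single dtype, so stacking a string array with an integer array converts the counts to strings. The sketch below, which uses a small hand-made `birds` array as an assumed stand-in for the question's data, shows the resulting dtype and one possible way to get integer counts back.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's `birds` array (assumption).
birds = np.array(['Dead Parrot', 'African Swallow', 'Dead Parrot',
                  'Exploding Penguin', 'Dead Parrot', 'African Swallow'])
someTuple = np.unique(birds, return_counts=True)

# Stacking strings and integers yields one unicode-string array.
stacked = np.column_stack(someTuple)
print(stacked.dtype.kind)  # 'U': the counts are now text

df = pd.DataFrame(stacked, columns=['birdType', 'birdCount'])

# Convert the count column back to integers before doing arithmetic on it.
df['birdCount'] = df['birdCount'].astype(int)
print(df.dtypes)
```

If numeric counts matter downstream, building the frame column by column avoids the round trip through strings.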

Benchmarking np.transpose, np.column_stack and np.vstack for stacking 1D arrays into columns to form a 2D array:

In [54]: tup1 = (np.random.rand(1000),np.random.rand(1000))

In [55]: %timeit np.transpose(tup1)
100000 loops, best of 3: 15.9 µs per loop

In [56]: %timeit np.column_stack(tup1)
100000 loops, best of 3: 11 µs per loop

In [57]: %timeit np.vstack(tup1).T
100000 loops, best of 3: 14.1 µs per loop
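
The timings above come from IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The `candidates` dictionary and the repeat counts below are illustrative choices, and absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
tup1 = (rng.random(1000), rng.random(1000))

# The three stacking approaches compared in the answer.
candidates = {
    'np.transpose': lambda: np.transpose(tup1),
    'np.column_stack': lambda: np.column_stack(tup1),
    'np.vstack(...).T': lambda: np.vstack(tup1).T,
}

# Sanity check: run each approach once and keep the result for comparison.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    # Best of 3 runs of 1000 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=1000, repeat=3)) / 1000
    print(f'{name}: {seconds * 1e6:.1f} µs per call')
```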

edited Aug 22, 2016 at 19:43

answered Aug 22, 2016 at 19:21

Divakar


2 Comments

C8H10N4O2 Over a year ago

These are both very fast numpy solutions, just what I was looking for. An equally fast answer was pd.DataFrame(np.transpose(someTuple), columns=['birdType', 'birdCount']) which another user gave but then deleted (not sure why).

Divakar Over a year ago

@C8H10N4O2 Added some timings on those three; all look about equally fast, it seems.

piRSquared · Answer · 2016-08-22 19:28:47Z

5

Create a dictionary:

pd.DataFrame(dict(birdType=someTuple[0], birdCount=someTuple[1]))
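
A sketch of the same idea with the tuple unpacked into named variables (`values` and `counts` are names introduced here for illustration). Because each array becomes its own column, the counts keep their integer dtype. The example assumes a small hand-made `birds` array in place of the question's data.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's `birds` array (assumption).
birds = np.array(['Dead Parrot', 'African Swallow', 'Dead Parrot',
                  'Exploding Penguin', 'Dead Parrot', 'African Swallow'])

# Unpack the tuple so each array gets a descriptive name.
values, counts = np.unique(birds, return_counts=True)

# Each dictionary entry becomes one column and keeps its own dtype.
df = pd.DataFrame({'birdType': values, 'birdCount': counts})
print(df)
print(df.dtypes)
```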

answered Aug 22, 2016 at 19:28

piRSquared


2 Comments

juanpa.arrivillaga Over a year ago

Nice. I need to start using the plain dictionary constructor with keyword arguments more often. It really is very convenient.

Jan Over a year ago

Pining for the fjords!

juanpa.arrivillaga · Answer · 2016-08-22 19:20:42Z

4

Using your tuple, you can do the following:

In [4]: pd.DataFrame(list(zip(*someTuple)), columns = ['Bird', 'BirdCount'])
Out[4]: 
                Bird  BirdCount
0    African Swallow      16510
1        Dead Parrot      16570
2  Exploding Penguin      16920

answered Aug 22, 2016 at 19:20

juanpa.arrivillaga


Alexander · Answer · 2016-08-22 19:19:28Z

2

You could use Counter.

from collections import Counter

c = Counter(birds)

>>> pd.Series(c)
African Swallow      16510
Dead Parrot          16570
Exploding Penguin    16920
dtype: int64

You could also use value_counts on the series.

>>> pd.Series(birds).value_counts()
Exploding Penguin    16920
Dead Parrot          16570
African Swallow      16510
dtype: int64
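
Both options above return a Series rather than the two-column DataFrame the question asks for. One possible way to bridge that gap, sketched here with a small hand-made `birds` array as an assumed stand-in, is to name the index and reset it. Note that value_counts sorts by count in descending order, so the row order differs from the np.unique result.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's `birds` array (assumption).
birds = np.array(['Dead Parrot', 'African Swallow', 'Dead Parrot',
                  'Exploding Penguin', 'Dead Parrot', 'African Swallow'])

df = (
    pd.Series(birds)
    .value_counts()                  # counts per label, most frequent first
    .rename_axis('birdType')         # name the index so it becomes a named column
    .reset_index(name='birdCount')   # move the index into a column, name the counts
)
print(df)
```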

answered Aug 22, 2016 at 19:19

Alexander

