Python: Create a NumPy structured array from two columns in a DataFrame

Question

How do you create a structured array from two columns in a DataFrame? I tried this:

df = pd.DataFrame(data=[[1,2],[10,20]], columns=['a','b'])
df

    a   b
0   1   2
1   10  20

x = np.array([([val for val in list(df['a'])],
               [val for val in list(df['b'])])])

But this gives me this:

array([[[ 1, 10],
        [ 2, 20]]])

But I wanted this:

[(1,2),(10,20)]

Thanks!

Because a package that I am using only takes a structured array as input. Why is this important?
– Kim O, Commented Jul 11, 2018 at 8:00
Because there may be no need to create a list of tuples at all, and knowing the goal also helps in choosing how to create that list.
– Kasravnd, Commented Jul 11, 2018 at 8:03

jpp · Accepted Answer · 2018-07-11 08:57:45Z

There are a couple of ways to do this. With either one, you may lose some performance and functionality relative to regular NumPy arrays.

record array

You can use pd.DataFrame.to_records with index=False. Technically, this is a record array, but for many purposes this will be sufficient.

res1 = df.to_records(index=False)

print(res1)

rec.array([(1, 2), (10, 20)], 
          dtype=[('a', '<i8'), ('b', '<i8')])

structured array

Manually, you can construct a structured array via conversion to tuple by row, then specifying a list of tuples for the dtype parameter.

s = df.dtypes
res2 = np.array([tuple(x) for x in df.values], dtype=list(zip(s.index, s)))

print(res2)

array([(1, 2), (10, 20)], 
      dtype=[('a', '<i8'), ('b', '<i8')])
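
If you need control over the field types (for example, a boolean first field), one option is to spell out the dtype yourself instead of deriving it from df.dtypes. The following is only a minimal sketch: the column names `event` and `time` and the sample values are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({'event': [1, 0, 1], 'time': [5.0, 12.5, 3.2]})

# Explicit field types: '?' is NumPy's boolean type code, '<f8' is float64.
dtype = [('event', '?'), ('time', '<f8')]

# One plain tuple per row; NumPy casts each value to its field's type.
rows = list(df[['event', 'time']].itertuples(index=False, name=None))
res = np.array(rows, dtype=dtype)
```

With this, `res['event']` is a boolean array and `res['time']` a float array, regardless of the dtypes the DataFrame columns had.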

What's the difference?

Very little. recarray is a subclass of ndarray, the regular NumPy array type. On the other hand, the structured array in the second example is of type ndarray.

type(res1)                    # numpy.recarray
isinstance(res1, np.ndarray)  # True
type(res2)                    # numpy.ndarray

The main difference is that record arrays support field access by attribute, while structured arrays raise AttributeError:

print(res1.a)
array([ 1, 10], dtype=int64)

print(res2.a)
AttributeError: 'numpy.ndarray' object has no attribute 'a'
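
Field access by name works on both kinds of array, and you can switch between the two without copying the data. A small sketch, assuming the same two-field layout as above (the variable names `rec` and `back` are just for illustration):

```python
import numpy as np

# A structured array with the same layout as res2 above.
res2 = np.array([(1, 2), (10, 20)], dtype=[('a', '<i8'), ('b', '<i8')])

# Name-based indexing is always available on a structured array.
col_a = res2['a']

# Viewing it as a recarray adds attribute access; the data is shared, not copied.
rec = res2.view(np.recarray)
col_a_attr = rec.a

# np.asarray drops the recarray subclass again and returns a base ndarray.
back = np.asarray(rec)
```

So if a library insists on a plain ndarray, wrapping the result of to_records in np.asarray is usually enough.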

Related: NumPy “record array” or “structured array” or “recarray”

jezrael · Accepted Answer · 2018-07-11 08:27:10Z

Use a list comprehension to convert the nested lists to tuples:

print ([tuple(x) for x in df.values.tolist()])
[(1, 2), (10, 20)]

Detail:

print (df.values.tolist())
[[1, 2], [10, 20]]

EDIT: You can convert with to_records and then pass the result to np.asarray:

df = pd.DataFrame(data=[[True, 1,2],[False, 10,20]], columns=['a','b','c'])
print (df)
       a   b   c
0   True   1   2
1  False  10  20

print (np.asarray(df.to_records(index=False)))
[( True,  1,  2) (False, 10, 20)]
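
Since the question is about two columns specifically, you can select them before converting. A minimal sketch using the same sample frame (picking `b` and `c` is just an example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[True, 1, 2], [False, 10, 20]], columns=['a', 'b', 'c'])

# Keep only the two columns of interest, then convert to records
# and drop the recarray subclass with np.asarray.
arr = np.asarray(df[['b', 'c']].to_records(index=False))
```

The field names of `arr` are taken from the selected column names.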

edited Jul 11, 2018 at 8:27

answered Jul 11, 2018 at 7:51

3 Comments

Kim O Over a year ago

Neither is a NumPy structured array. Is it possible to do this?

jezrael Over a year ago

@KimO - Can you explain more?

Kim O Over a year ago

Yes. docs.scipy.org/doc/numpy/user/basics.rec.html The result should be: array([(x,y), (x2,y2)])

ags29 · Accepted Answer · 2018-07-11 08:15:01Z

Here's a one-liner:

list(df.apply(lambda x: tuple(x), axis=1))

or

df.apply(lambda x: tuple(x), axis=1).values
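
Note that the second version gives a 1-D array of object dtype whose elements are tuples, which is not the same thing as a structured array with named fields. A short sketch of the difference, and of one way to convert; the explicit int64 dtype here is an assumption based on the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[1, 2], [10, 20]], columns=['a', 'b'])

# Object array: each element is a Python tuple, and there are no named fields.
obj = df.apply(lambda x: tuple(x), axis=1).values
has_fields = obj.dtype.names is not None

# Passing the tuples to np.array with an explicit dtype gives a structured array.
structured = np.array(obj.tolist(), dtype=[('a', '<i8'), ('b', '<i8')])
```

Here `has_fields` is False for the object array, while `structured` can be indexed by field name, e.g. `structured['a']`.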

edited Jul 11, 2018 at 8:15

answered Jul 11, 2018 at 8:07

4 Comments

Kim O Over a year ago

This is not a NumPy structured array. Is that possible?

ags29 Over a year ago

Edited it. Is the second version what you are looking for?

Kim O Over a year ago

YES! Is there a way to control the types of the fields? For example, if the DataFrame has two columns and I want the first to become a "binary class event indicator", as explained here: scikit-survival.readthedocs.io/en/latest/generated/… (search for "structured array"). So the first field should be of "bool" type.

jpp Over a year ago

I strongly recommend you don't use object dtype for integers, even with structured arrays.
