How can I convert a pandas DataFrame into a NumPy structured array with column names, like the following?

array([('Heidi Mitchell', '[email protected]', 74, 52, 'female', '1121', 'cancer', '03/06/2018'),
       ('Kimberly Kent', 'wilsoncarla@mitchell-gree', 63, 51, 'male', '2003', 'cancer', '16/06/2017')],
      dtype=[('name', '<U16'), ('email', '<U25'), ('age', '<i4'), ('weight', '<i4'), ('gender', '<U10'), ('zipcode', '<U6'), ('diagnosis', '<U6'), ('dob', '<U16')])

This is my pandas DataFrame df:

col1  col2
3     5
3     1
4     5    
1     5
2     2

I tried to convert it as follows:

import numpy as np

dt = np.dtype([('col1', np.int32), ('col2', np.int32)])
arr = np.array(df.values, dtype=dt)

But it gives me the output as follows:

array([[(3, 5), (3, 1)],
      ...
      dtype=[('col1', '<i4'), ('col2', '<i4')])

For some reason, the rows of data are grouped as [(3, 5), (3, 1)] instead of forming a flat array [(3, 5), (3, 1), (4, 5), (1, 5), (2, 2)].

2 Answers

Use the pandas method DataFrame.to_records(), which converts a DataFrame to a NumPy record array. The documentation is here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_records.html

The documentation gives the following examples:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
...                   index=['a', 'b'])
>>> df
   A     B
a  1  0.50
b  2  0.75
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
          dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])

The index can be excluded from the record array:

>>> df.to_records(index=False)
rec.array([(1, 0.5 ), (2, 0.75)],
          dtype=[('A', '<i8'), ('B', '<f8')])
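
The examples above produce int64 fields, while the question asks for int32. A minimal sketch of one way to get that, assuming a pandas version where to_records() accepts the column_dtypes argument; the DataFrame is rebuilt from the question so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

# The question's DataFrame, rebuilt here so the snippet is self-contained.
df = pd.DataFrame({"col1": [3, 3, 4, 1, 2], "col2": [5, 1, 5, 5, 2]})

# column_dtypes maps a column name to the dtype its field should get.
rec = df.to_records(
    index=False,
    column_dtypes={"col1": "int32", "col2": "int32"},
)

print(rec.dtype.names)  # field names are taken from the DataFrame columns
print(rec["col1"])      # each field can be read by name
```

Columns left out of column_dtypes keep the dtype they have in the DataFrame.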

You can use df.to_records(index=False) to convert the dataframe to a structured array:

import pandas as pd
data = [ { "col1": 3, "col2": 5 }, { "col1": 3, "col2": 1 }, { "col1": 4, "col2": 5 }, { "col1": 1, "col2": 5 }, { "col1": 2, "col2": 2 } ]
df = pd.DataFrame(data)
df.to_records(index=False)

Output:

rec.array([(3, 5), (3, 1), (4, 5), (1, 5), (2, 2)],
          dtype=[('col1', '<i8'), ('col2', '<i8')])
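
If you would rather keep the explicit dtype from the question, the likely reason the original attempt misbehaves is that df.values is a plain 2-D array, so NumPy does not read each row as one record. A sketch of an alternative, not the only option, is to hand NumPy a list of row tuples; the names rows and arr are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [3, 3, 4, 1, 2], "col2": [5, 1, 5, 5, 2]})
dt = np.dtype([("col1", np.int32), ("col2", np.int32)])

# name=None makes itertuples yield plain tuples; NumPy reads each tuple
# as one record of the structured dtype.
rows = list(df.itertuples(index=False, name=None))
arr = np.array(rows, dtype=dt)

print(arr.shape)    # one-dimensional: one record per DataFrame row
print(arr["col2"])  # fields are accessed by column name
```

This returns a plain structured ndarray rather than a rec.array, at the cost of building an intermediate Python list.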
