
I have a DataFrame like the one below and want to convert it to a NumPy array. When I use df.values I do get a NumPy array, but every column is converted to float. I checked the df.values documentation, but it did not help. Can I keep the DataFrame's per-column dtypes in the NumPy array?

Thanks in advance for your help

                   High          Low  ...      Volume    Adj Close
Date                                  ...                         
2018-12-20  2509.629883  2441.179932  ...  5585780000  2467.419922
2018-12-21  2504.409912  2408.550049  ...  7609010000  2416.620117
2018-12-24  2410.340088  2351.100098  ...  2613930000  2351.100098
2018-12-26  2467.760010  2346.580078  ...  4233990000  2467.699951
2018-12-27  2489.100098  2397.939941  ...  4096610000  2488.830078
2018-12-28  2520.270020  2472.889893  ...  3702620000  2485.739990
2018-12-31  2509.239990  2482.820068  ...  3442870000  2506.850098
2019-01-02  2519.489990  2467.469971  ...  3733160000  2510.030029
  • pandas to_records? Commented Mar 10, 2019 at 3:31
  • The problem with the to_records() approach is that it creates a 1-dimensional array (shape (14,) in my case). How do I get the same dimensions as the df? Commented Mar 10, 2019 at 4:53
  • It is 1d with different fields for each of the df columns. That way you can have a different dtype for each field. Look up structured array or record array. Commented Mar 10, 2019 at 5:13
  • Thanks for the pointer, I will go over structured arrays. Commented Mar 10, 2019 at 5:19
  • Hi Wen, can we access columns in the np.array when we use to_records to make the np.array from pandas? Commented Mar 10, 2019 at 12:24

2 Answers


NumPy arrays have a uniform data type, as you can see from the documentation:

class numpy.ndarray(shape, dtype=float, buffer=None, offset=0, strides=None, order=None)

An array object represents a multidimensional, homogeneous array of fixed-size items. An associated data-type object describes the format of each element in the array (its byte-order, how many bytes it occupies in memory, whether it is an integer, a floating point number, or something else, etc.)

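When you use df.values, pandas casts all values to a single dtype that can hold every column, so that the array stays homogeneous. With float64 and int64 columns, as in your DataFrame, that common dtype is float64.

The pandas.DataFrame.values documentation also mentions this: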

Notes

The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.

e.g. If the dtypes are float16 and float32, dtype will be upcast to float32. If dtypes are int32 and uint8, dtype will be upcast to int32. By numpy.find_common_type() convention, mixing int64 and uint64 will result in a float64 dtype.
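The promotion rules quoted above can be checked directly. This is only a minimal sketch, assuming a two-column frame shaped like the one in the question. It uses np.result_type to ask NumPy for the promoted dtype, because np.find_common_type, which the quoted note mentions, is no longer available in recent NumPy releases.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data: one float column, one int column.
df = pd.DataFrame({
    "High": [2509.629883, 2504.409912],
    "Volume": [5585780000, 7609010000],
})

print(df.dtypes)          # each column keeps its own dtype inside the DataFrame

values = df.values        # a single homogeneous 2-D array
print(values.dtype)       # float64: the int column was upcast

# np.result_type returns the dtype NumPy promotes a mix of dtypes to.
float_mix = np.result_type(np.float16, np.float32)
int_mix = np.result_type(np.int32, np.uint8)
signed_unsigned = np.result_type(np.int64, np.uint64)
print(float_mix, int_mix, signed_unsigned)
```

So the float result is not a bug in df.values: a plain 2-D ndarray has exactly one dtype, and float64 is the one that can represent both columns.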


1 Comment

We can have a different data type for each attribute of a NumPy array: when I use array = df.to_records() to create the array, each attribute gets its own dtype. The problem with the to_records() approach is that it creates a 1-dimensional array (shape (14,) in my case).

You can do it with NumPy structured arrays. I will create a DataFrame with only 2 rows and 2 columns similar to yours to demonstrate how you can do it with any size of DataFrame.

import pandas as pd
import numpy as np

df = pd.DataFrame({'High': [2509.629883, 2504.409912], 
                   'Volume': [5585780000, 7609010000]}, 
                  index=np.array(['2018-12-20', '2018-12-21'], dtype='datetime64'))

Then create an empty NumPy structured array that defines the datatype each column must have, and fill it field by field. My example has only 2 rows, so the array has length 2:

array = np.empty(2, dtype={'names':('col1', 'col2', 'col3'),
                          'formats':('datetime64[D]', 'f8', 'i8')})

array['col1'] = df.index
array['col2'] = df['High']
array['col3'] = df['Volume']

The array will then look like this:

array([('2018-12-20', 2509.629883, 5585780000),
       ('2018-12-21', 2504.409912, 7609010000)],
      dtype=[('col1', '<M8[D]'), ('col2', '<f8'), ('col3', '<i8')])
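To avoid typing the field list by hand for a larger DataFrame, the same idea can be wrapped in a small function. This is a sketch only: frame_to_structured and its index_name parameter are names I made up for illustration, and it assumes every column (and the index) has a plain NumPy dtype rather than a pandas extension dtype.

```python
import numpy as np
import pandas as pd

def frame_to_structured(df, index_name="index"):
    """Hypothetical helper: copy a DataFrame into a 1-D structured array."""
    # One (name, dtype) pair for the index, then one per column.
    fields = [(index_name, df.index.dtype)]
    fields += [(str(col), df[col].dtype) for col in df.columns]

    # One record per row; each field keeps the dtype of its source column.
    out = np.empty(len(df), dtype=fields)
    out[index_name] = df.index.to_numpy()
    for col in df.columns:
        out[str(col)] = df[col].to_numpy()
    return out

df = pd.DataFrame(
    {"High": [2509.629883, 2504.409912], "Volume": [5585780000, 7609010000]},
    index=pd.to_datetime(["2018-12-20", "2018-12-21"]),
)

arr = frame_to_structured(df)
print(arr.dtype)       # one field per column, each with its own dtype
print(arr["Volume"])   # select a column by field name
```

The result is still 1-D, with one record per row. That is what allows each field to have its own dtype, so the (n,) shape is expected.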

You can also create an np.recarray with np.rec.array. A record array is almost identical to a structured array, with one extra feature: you can access fields as attributes, i.e. array.col1 instead of array['col1']. However, NumPy record arrays are apparently slower than structured arrays!
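For completeness, here is a short sketch of both routes to a record array: df.to_records(), which was suggested in the comments, and np.rec.array applied to an existing structured array. The sample frame is a cut-down stand-in for the one in the question, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"High": [2509.629883, 2504.409912], "Volume": [5585780000, 7609010000]},
    index=pd.to_datetime(["2018-12-20", "2018-12-21"]),
)

# Route 1: let pandas build the record array (the index becomes a field too).
rec = df.to_records()
print(rec.shape)          # (2,): one record per row
print(rec.dtype.names)    # field names: the index plus the column names

high = rec["High"]        # field access by name, as with structured arrays
volume = rec.Volume       # attribute access, the recarray extra

# Route 2: wrap an existing structured array with np.rec.array.
structured = np.array(
    [(2509.629883, 5585780000), (2504.409912, 7609010000)],
    dtype=[("col2", "f8"), ("col3", "i8")],
)
rec2 = np.rec.array(structured)
print(rec2.col3)          # same data as structured["col3"]
```

Columns are therefore still accessible after to_records(): index with the field name, or use the attribute form when the name is a valid Python identifier.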
