
I was able to create a DataFrame and force a single data type across all columns with

import pandas as pd
test = pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}, dtype=int)

But I want to specify a type for each column. How can I do this? I tried the following, which doesn't work: the resulting dtypes are object, and column b is not cast to integer.

test = pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}, dtype=[('a', int),('b', int)])

Jeff helped with the case above. But I found another problem when trying to create an empty DataFrame with specified column types. For a single type across all columns, I can do

test = pd.DataFrame(columns=['a','b'], dtype=int)

What if I want to specify type for each of 'a' and 'b'?

  • This is not supported (though it could potentially take a dict). You realize that passing dtype is optional? Commented Mar 25, 2014 at 19:07

3 Answers


You can pass in Series objects, each constructed with its own dtype parameter:

In [15]: pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}).dtypes
Out[15]: 
a      int64
b    float64
dtype: object

In [16]: pd.DataFrame({'a': pd.Series([1,2,3], dtype='int32'), 'b': pd.Series([1.1,2.1,3.1], dtype='float32')}).dtypes
Out[16]: 
a      int32
b    float32
dtype: object

3 Comments

A similar follow-up question: what if I want to create an empty DataFrame with specified types?
It's not recommended to create an empty frame at all (nor can you create one with specified types that way). Create the data you need, e.g. as Series or whatever, and just concat them together, or use the above method if you really need separate dtypes.
I guess you could use the pass-Series trick, like dmap = {'a': 'int32', 'b': 'float64', 'c': 'int64'}; df = pd.DataFrame({k: pd.Series(dtype=v) for k,v in dmap.items()}), but as Jeff says, starting from an empty frame is usually an antipattern.
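Putting these comments together, here is a minimal runnable sketch of the dict-of-Series trick for an empty frame with per-column dtypes (the column names and dtypes are just illustrative):

```python
import pandas as pd

# map each column name to the dtype it should carry
dmap = {'a': 'int32', 'b': 'float64', 'c': 'int64'}

# an empty Series per column brings its dtype into the frame
df = pd.DataFrame({col: pd.Series(dtype=dt) for col, dt in dmap.items()})

print(df.dtypes)  # a: int32, b: float64, c: int64
print(len(df))    # 0 rows, but the schema is fixed
```

This gives you an empty frame whose schema is fixed up front, though as Jeff notes, starting from an empty frame is usually an antipattern.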

Yes, good question. You can specify one common dtype when you create the DataFrame, or add empty numpy arrays with different dtypes. In my experience, though, pandas tends to infer the dtype for the whole DataFrame from the data you add, so I find it better to specify the dtypes for the various columns after you have added your data:

# cast each column once the data is already in the frame
convert_dict = {'a': int, 'b': float}
df = df.astype(convert_dict)
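For example, with a frame whose columns were inferred as float64 and object (the sample data here is my own), the cast looks like this:

```python
import pandas as pd

# dtypes are inferred on construction: a -> float64, b -> object
df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': ['1.1', '2.1', '3.1']})

# cast each column after the data is loaded
df = df.astype({'a': int, 'b': float})
print(df.dtypes)  # a is now an integer dtype, b is float64
```

Note that `astype(int)` maps to the platform's default integer (int64 on most systems, int32 on Windows).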



You can pass in a dictionary of numpy arrays with specified dtypes; this works for creating both filled and empty frames. (This answer is a slight adaptation of my answer here.)

Here's an empty DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a' : np.array([], dtype=int),
                        'b' : np.array([], dtype=float)
                       }
                 )

Here's a filled one:

df = pd.DataFrame(data={'a' : np.array([1,2,3], dtype=int),
                        'b' : np.array([4,5,6], dtype=float)
                       }
                 )

And you can use basically any type for dtype, such as object, str, datetime.datetime, or CrazyClassYouDefined. That said, if pandas doesn't specifically support a type (such as str), it will fall back to storing that column as object. Don't worry though, everything should still work.
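As a quick sanity check (my own addition, not part of the original answer), both constructions produce the same per-column schema:

```python
import numpy as np
import pandas as pd

empty = pd.DataFrame({'a': np.array([], dtype=int),
                      'b': np.array([], dtype=float)})
filled = pd.DataFrame({'a': np.array([1, 2, 3], dtype=int),
                       'b': np.array([4, 5, 6], dtype=float)})

# the empty frame carries the same dtypes as the filled one
print(empty.dtypes.equals(filled.dtypes))  # True
```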

