I have a large CSV file with over 200 columns. Some of the columns are strings, some varchar, some integers and some floats.
When I just read the CSV file into a pandas DataFrame, it detects which columns are numerical. However, it gives me the DtypeWarning telling me to specify a dtype on import or set low_memory=False.
import numpy as np
import pandas as pd

df = pd.read_csv('myfile.csv')
# columns that were not read as numeric
df_not_num = df.select_dtypes(exclude=[np.number, np.int16, np.bool_, np.float32])

print(len(list(df)))
>>> 200
print(len(list(df_not_num)))
>>> 10
Then I tried specifying a dtype: dtype='unicode'.
But this causes all of my columns to be read as object.
It is too much manual work to specify a dtype per column name when reading the CSV into a DataFrame.
df = pd.read_csv('myfile.csv', dtype='unicode')
df_not_num = df.select_dtypes(exclude=[np.number, np.int16, np.bool_, np.float32])

print(len(list(df)))
>>> 200
print(len(list(df_not_num)))
>>> 200
So the only way to avoid the low-memory warning is to specify a dtype. But how do I specify that I have mixed dtypes across columns without having to manually specify the dtype of each of the 200 columns?
Per the read_csv documentation, the dtype parameter only accepts a dict that maps particular column names to types, e.g. {'a': np.float64, 'b': np.int32}, or a single dtype that is applied to every column, or nothing at all (letting pandas infer). Also, there is no "varchar" type in Python; those columns will simply come through as object (string) columns.
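A minimal sketch of the two workable options, reusing the 'myfile.csv' name from the question (the column names 'a' and 'b' are placeholders): pass a dict for only the handful of columns pandas infers incorrectly, or set low_memory=False so pandas reads the whole file before deciding each column's dtype, which is what the warning itself suggests. Note that with low_memory=False a genuinely mixed column still ends up as object; it just avoids the chunked inference that triggers the warning.

import numpy as np
import pandas as pd

# Option 1: override only the problematic columns ('a' and 'b' are placeholder
# names); every other column is still inferred automatically.
df = pd.read_csv('myfile.csv', dtype={'a': np.float64, 'b': np.int32})

# Option 2: read the file in a single pass so pandas sees all of a column's
# values before inferring its dtype; this avoids the DtypeWarning.
df = pd.read_csv('myfile.csv', low_memory=False)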