pandas read_csv set `dtype` by column index (not name)

Question

file.txt has a header and four columns, but the header names change all the time.

Something like:

,'non_standard_header_1','non_standard_header_2','non_standard_header_3'
,kdfjlkjdf, sdfdfd,,
,kdfjlkjwwdf, sdfddffd,,
,kdfjlkjwwdf,, sdfddffd,

I want to import file.txt into pandas, and I want the columns to be imported as object. The intuitive approach (to me):

dtype = [object, object, object] as in:

    daily_file              = pandas.read_csv('file.txt',
                                              usecols      = [1, 2, 3],
                                              dtype        = [object, object, object])

does not work. Running the above, I get:

data type not understood

How can I set column dtypes on import without referencing the (existing) column names?

Experimentally, I found a way to handle this. Given a CSV with a single unlabeled column (the first one, created by DataFrame.to_csv without specifying a label for the index), pandas assigned the name "Unnamed: 0" to that column; I was able to use that same string as a dict key for dtype and correctly control the datatype for the column. Not sure how general this is, so I'm leaving that and a proper "answer" to someone else.
– Joe Germuska, Commented Oct 22, 2019 at 20:41
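
A minimal sketch of what the comment above describes, assuming a small in-memory CSV (the sample data and the `csv_text` name are made up for illustration, not taken from the question):

```python
import io
import pandas as pd

# Hypothetical CSV whose first header field is empty, similar to what
# DataFrame.to_csv writes when the index has no label.
csv_text = ",a,b\n001,1,2\n002,3,4\n"

# pandas names the unlabeled column "Unnamed: 0", and that generated
# name can be used as a key in the dtype dict.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Unnamed: 0": object})
print(df.dtypes)
```

Keeping the column as object preserves the leading zeros in `001`, which would be lost if pandas inferred an integer dtype.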

cs95 · Accepted Answer · 2018-06-23 16:49:16Z

3

pd.read_csv(..., dtype=object) will globally apply the object dtype across all columns read in, if that's what you're looking for.

Otherwise, you'll need to pass a dict of the form {'col' : dtype} if you want to map dtypes to column names.
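
As a rough illustration of the first option, here is a sketch that selects columns by position with `usecols` and applies `dtype=object` to everything that is read. The sample data and the `csv_text` name are assumptions made for the example; a real call would pass the path to file.txt instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for file.txt; the header names are irrelevant
# because nothing below refers to them.
csv_text = "id,whatever_1,whatever_2,whatever_3\n0,001,1.5,x\n1,002,2.5,y\n"

# usecols picks columns by position; dtype=object applies to all of them.
df = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 3], dtype=object)
print(df.dtypes)
```

Since every selected column gets the same dtype here, no mapping from column to dtype is needed at all.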


3 Comments

user189035 Over a year ago

Almost: how do you map dtypes to column numbers?

cs95 Over a year ago

@user189035 Try changing the keys to the index. But if that doesn't work, then I'm afraid pandas has no support for it (yet).

ALollz Over a year ago

Indices seem to work for dtype (usecols has that functionality to reference columns either by name or by index). Not sure what would happen if a column name overlapped with an index, though.
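
If relying on integer keys feels fragile, one possible workaround is to read only the header row first and translate positions into whatever names the file currently has. This is a sketch under assumptions: the sample data, `dtype_by_position` and `dtype_by_name` are hypothetical names introduced for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for file.txt with unpredictable header names.
csv_text = "id,h1,h2,h3\n0,001,1.5,x\n1,002,2.5,y\n"

# Desired dtypes, keyed by column position rather than by name.
dtype_by_position = {1: object, 2: "float64", 3: object}

# nrows=0 parses just the header, giving the current column names.
header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Translate positions into the names found in this particular file.
dtype_by_name = {header[pos]: dt for pos, dt in dtype_by_position.items()}

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=list(dtype_by_name),
    dtype=dtype_by_name,
)
print(df.dtypes)
```

With a real file this opens the path twice, but the first read touches only the header line, and the name-keyed dict is the documented form for `dtype`.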

DKSD · Accepted Answer · 2024-06-26 20:52:43Z

0

You should specify a dictionary whose keys are column numbers, starting at 0:

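    import pandas as pd
    # With header=None, pandas labels the columns 0, 1, 2, ..., so integer keys match.
    types_dict = {0: "int32", 1: "float32", 2: "str", 3: "str"}
    df = pd.read_csv("data.csv", dtype=types_dict, header=None)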
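
Note that `header=None` treats the first line as data, so on a file that does have a header row (as in the question) that row would be parsed along with the values. A sketch of one way around this, assuming the header is exactly one line and can be discarded, is to add `skiprows=1`. The sample data below is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical file contents with a one-line header that we want to ignore.
csv_text = "id,h1,h2,h3\n0,1.5,001,x\n1,2.5,002,y\n"

types_dict = {0: "int32", 1: "float32", 2: "str", 3: "str"}

# skiprows=1 drops the header line; header=None then labels the columns
# 0..3, so the integer keys in types_dict are the actual column labels.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    skiprows=1,
    dtype=types_dict,
)
print(df.dtypes)
```

The trade-off is that the original header names are lost; the columns are addressed as `df[0]`, `df[1]`, and so on.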

