2

file.txt has a header row and four columns, but the header names change all the time.

Something like:

,'non_standard_header_1','non_standard_header_2','non_standard_header_3'
,kdfjlkjdf, sdfdfd,,
,kdfjlkjwwdf, sdfddffd,,
,kdfjlkjwwdf,, sdfddffd,

I want to import file.txt in pandas, and I want the columns to be imported as object. The intuitive approach (to me):

dtype = [object, object, object] as in:

    daily_file              = pandas.read_csv('file.txt',
                                              usecols      = [1, 2, 3],
                                              dtype        = [object, object, object])

does not work. Running the above, I get:

data type not understood

How can I set column dtypes on import without referencing the (existing) column names?

1
  • Experimentally, I found a way to handle this. Given a CSV with a single unlabeled column (the first one, created by pandas.to_csv without specifying a label for the index), pandas assigned the name "Unnamed: 0" to that column. I was able to use that same string as a dict key for dtype and correctly control the datatype for the column. I'm not sure how general this is, so I'm leaving that and a proper "answer" to someone else. Commented Oct 22, 2019 at 20:41

2 Answers

3

pd.read_csv(..., dtype=object) will globally apply the object dtype across all columns read in, if that's what you're looking for.

Otherwise, you'll need to pass a dict of the form {'col' : dtype} if you want to map dtypes to column names.
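
Both options can be sketched as follows. This is a minimal illustration, not a definitive recipe: the in-memory `csv_text` and its header names are made up to stand in for file.txt, and the second option assumes it is acceptable to read the header row in a separate pass (`nrows=0`) before reading the data.

```python
import io

import pandas as pd

# Hypothetical stand-in for file.txt: an unnamed first column, then three
# columns whose header names are not known in advance.
csv_text = (
    ",header_a,header_b,header_c\n"
    ",00123,4.50,x\n"
    ",00456,7.25,y\n"
)

# Option 1: a single dtype applied to every column that is read.
df_all = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 3], dtype=object)

# Option 2: read only the header row, pick the names by position, then build
# the {name: dtype} dict from whatever names the file happens to have.
header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns
wanted = [header[i] for i in (1, 2, 3)]
df_named = pd.read_csv(
    io.StringIO(csv_text),
    usecols=wanted,
    dtype={name: object for name in wanted},
)

print(df_all.dtypes)
print(df_named["header_a"].tolist())
```

Option 2 never hard-codes a column name, so it keeps working when the header changes, and it allows a different dtype per position if the dict is built accordingly.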

3 Comments

Almost: how do you map dtypes to column numbers?
@user189035 Try changing the keys to the index. But if that doesn't work, then I'm afraid pandas has no support for it (yet).
Indices seem to work for dtype (usecols has that functionality too, referencing columns either by name or by index). Not sure what would happen if a column name overlapped with an index, though.
0

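Pass a dictionary whose keys are column numbers, starting at 0. With header=None, pandas labels the columns 0, 1, 2, ..., so the integer keys match those labels. Note that header=None is meant for a file without a header row: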

    import pandas as pd

    types_dict = {0: "int32", 1: "float32", 2: "str", 3: "str"}
    df = pd.read_csv("data.csv", dtype=types_dict, header=None)
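
For a file that does have a header row, as in the question, one way to keep the positional keys is to discard that row with `skiprows=1`. The sketch below assumes the header names are not needed afterwards. The sample data and the name `types_by_position` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical file contents; the header names are assumed to be unpredictable.
csv_text = (
    "id,whatever_1,whatever_2\n"
    "1,2.5,007\n"
    "2,4.0,042\n"
)

# skiprows=1 discards the header line; header=None then labels the
# remaining columns 0, 1, 2, which is what the dict keys refer to.
types_by_position = {0: "int32", 1: "float32", 2: object}
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    skiprows=1,
    dtype=types_by_position,
)

print(df.dtypes)
```

The trade-off is that the resulting DataFrame has integer column labels rather than the names from the file.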
