87

I am just getting started with pandas and am reading in a CSV file using the read_csv() method. My difficulty is preventing pandas from converting my telephone numbers to large integers instead of keeping them as strings. I defined a converter that just left the numbers alone, but they were still converted to numbers. When I changed my converter to prepend a 'z' to the phone numbers, they stayed strings. Is there some way to keep them as strings without modifying the values of the fields?

2 Comments

Please show us your code. Commented May 15, 2012 at 1:48
@Gardner: have you considered accepting an answer? Commented Dec 14, 2015 at 2:58

4 Answers

104

Since pandas 0.11.0 you can use the dtype argument to explicitly specify the data type for each column:

d = pandas.read_csv('foo.csv', dtype={'BAR': 'S10'})
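
For anyone who wants to try this without a file on disk, here is a minimal sketch. The sample data and the column name 'phone' are made up for illustration, and it uses str as the dtype rather than 'S10':

```python
import io
import pandas as pd

# Hypothetical sample data standing in for foo.csv
csv_text = "name,phone\nAlice,0123456789\nBob,5551234567\n"

# Ask pandas to keep the 'phone' column as text instead of inferring integers
df = pd.read_csv(io.StringIO(csv_text), dtype={"phone": str})

print(df["phone"].tolist())
```

With the column read as text, the leading zero in the first number should survive.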

5 Comments

Note that this is not available (yet, hopefully) for some other input functions, like pandas.read_fwf()
I revisited the topic and support for dtype has already been added to pandas.read_fwf :)
This method doesn't work for large datasets. Is there any other way to read a CSV and only particular columns?
This doesn't work when the input is a BytesIO object; I get the error EmptyDataError: No columns to parse from file. Is there any way to solve this?
To convert to a string, the documentation recommends using 'str'.
21

It looks like you can't prevent pandas from trying to convert numeric/boolean values in the CSV file. Take a look at the pandas source code for the IO parsers, in particular the functions _convert_to_ndarrays and _convert_types. https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py

You can always assign the type you want after you have read the file:

df.phone = df.phone.astype(str)
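
A small sketch of this approach, using made-up sample data and a hypothetical 'phone' column. Note that the conversion happens after pandas has already parsed the values as integers, so anything lost during parsing (such as a leading zero) does not come back:

```python
import io
import pandas as pd

# Hypothetical sample data; the first number starts with a zero
csv_text = "name,phone\nAlice,0123456789\nBob,5551234567\n"

# Default read: pandas infers 'phone' as an integer column
df = pd.read_csv(io.StringIO(csv_text))

# Convert to strings after the fact
df["phone"] = df["phone"].astype(str)

print(df["phone"].tolist())
```

If leading zeros matter, specifying the dtype at read time is probably the safer route.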

4 Comments

Thanks @lbolla, this helped with one of my bug fixes, where a float value was read as a string because another column was a string, which later caused issues in aggregation functions. I had to do df['col'] = df['col'].astype('float64')
Say I have a column of IDs (all ints) that I'd like to use as strings, but under some condition pandas reads them as floats (1 -> 1.0, 2 -> 2.0). Without converting them back to int first, they become '1.0', '2.0', which is not desirable. That's why I just want pandas to read the column as strings.
This is not the answer. Your solution doesn't solve problems such as memory errors on big files.
This won't solve issues where leading zeros get lost.
0

I had luck by reading the entire file in as strings, then manually specifying data types later. In my situation, I had a column of IDs that could contain strings like "08", which would be different from an ID of "8".

The first thing I tried was df = pd.read_csv(dtype={"ID": str}) but for some reason, this was still converting "08" to "8" (at least it was still a string, but it must have been interpreted as an integer first, which removed the leading 0).

The thing that worked for me was this: df = pd.read_csv(dtype=str). Then I could go through and manually assign other columns their data types as needed, like @lbolla mentioned.
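
Roughly, the workflow looks like the sketch below. The sample data and the column names 'ID' and 'amount' are invented for the example:

```python
import io
import pandas as pd

# Hypothetical data: "08" and "8" are meant to be different IDs
csv_text = "ID,amount\n08,1.5\n8,2.25\n"

# Read every column as text so nothing is reinterpreted as a number
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Then convert only the columns that really are numeric
df["amount"] = df["amount"].astype(float)

print(df["ID"].tolist())
print(df["amount"].sum())
```

The ID column stays as text while 'amount' can be used in arithmetic and aggregations.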

For some reason, applying the data type across the entire document skipped the type inference step I suppose. Annoying this isn't the default behavior when specifying a specific column data type :(

2 Comments

"applying the data type across the entire document skipped the type inference step I suppose" - Yeah, what else would it do? I'm not sure if I'm confused about what you're saying or if you're confused about how read_csv works.
You might be interested in DataFrame.infer_objects()
0
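  1. Use low_memory=False while reading the file so pandas parses it in one pass instead of in chunks, which avoids mixed-type inference within a column. Note that this alone does not disable dtype inference, so numeric-looking columns can still be converted.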

    df = pd.read_csv('somefile.csv', low_memory=False)

  2. Define dtypes while reading the file to force a column to be read as an object.

    df = pandas.read_csv('somefile.csv', dtype={'phone': object})
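
As far as I can tell, low_memory=False only changes how the file is chunked during parsing, so it seemed worth checking both options side by side. This is a quick sketch with made-up data and a hypothetical 'phone' column:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for somefile.csv
csv_text = "name,phone\nAlice,0123456789\n"

# Option 1: low_memory=False, no dtype given
df_low = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: force the 'phone' column to be read as object
df_obj = pd.read_csv(io.StringIO(csv_text), dtype={"phone": object})

print(df_low["phone"].dtype)     # check what pandas inferred
print(df_obj["phone"].tolist())  # values kept as text
```

In this check the first option still ends up with an integer column, so for phone numbers the explicit dtype in option 2 looks like the one that actually keeps the strings.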

Official Pandas Docs

