I am just getting started with Pandas and I am reading in a csv file using the read_csv() method. The difficulty I am having is preventing pandas from converting my telephone numbers to large numbers, instead of keeping them as strings. I defined a converter which just left the numbers alone, but then they still converted to numbers. When I changed my converter to prepend a 'z' to the phone numbers, then they stayed strings. Is there some way to keep them strings without modifying the values of the fields?
- Please show us your code. – Mike Pennington, May 15, 2012 at 1:48
- @Gardner: have you considered accepting an answer? – tumultous_rooster, Dec 14, 2015 at 2:58
4 Answers
Since pandas 0.11.0 you can use the dtype argument to explicitly specify the data type for each column:
d = pandas.read_csv('foo.csv', dtype={'BAR': 'S10'})
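To make the effect of the dtype argument concrete, here is a minimal sketch that compares default type inference with an explicit per-column dtype. The sample data and column names are made up for illustration, and it passes str rather than 'S10' on the assumption that you want ordinary Python strings; behaviour on very old pandas releases may differ.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are invented for this example.
csv_text = "name,phone\nAlice,0015550100\nBob,0015550199\n"

# Default inference parses the phone column as integers,
# which silently drops the leading zeros.
inferred = pd.read_csv(io.StringIO(csv_text))

# Declaring the column as str keeps the raw text of each field.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"phone": str})

print(inferred["phone"].tolist())
print(as_text["phone"].tolist())
```

Only the column named in the dict is affected; every other column still goes through normal inference.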
It looks like you can't prevent pandas from trying to convert numeric/boolean values in the CSV file. Take a look at the pandas source code for the IO parsers, in particular the functions _convert_to_ndarrays and _convert_types.
https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py
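You can always assign the type you want after you have read the file (note that this converts the already-parsed numbers back to text, so anything lost during parsing, such as leading zeros, is not restored):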
df.phone = df.phone.astype(str)
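One caveat with converting after the fact is worth seeing in action. In the sketch below (sample data made up for illustration), the column is parsed as an integer first, so astype(str) yields the digits without their leading zeros. If every value is known to have a fixed width, padding with str.zfill is one possible repair. The width of 10 is an assumption of this example, not something pandas can infer.

```python
import io

import pandas as pd

# Hypothetical one-column file with a zero-padded phone number.
csv_text = "phone\n0015550100\n"

# Parsed with default inference, the value becomes the integer 15550100.
df = pd.read_csv(io.StringIO(csv_text))

# Converting afterwards gives a string, but the leading zeros are gone.
df["phone"] = df["phone"].astype(str)
after_astype = df["phone"].tolist()

# Assumed fixed width of 10 digits: pad on the left with zeros.
padded = df["phone"].str.zfill(10).tolist()

print(after_astype)
print(padded)
```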
I had luck by reading the entire file in as string, then manually specifying datatypes later. In my situation, I had a column which had IDs that could contain strings like "08" which would be different from an ID of "8".
The first thing I tried was df = pd.read_csv('somefile.csv', dtype={"ID": str}), but for some reason this was still converting "08" to "8" (at least it was still a string, but it must have been interpreted as an integer first, which removed the leading 0).
The thing that worked for me was this:
df = pd.read_csv('somefile.csv', dtype=str)
And then I could go through and manually assign other columns their datatypes as needed like @lbolla mentioned.
I suppose applying the data type across the entire file skipped the type inference step. It's annoying that this isn't the default behavior when a data type is specified for a single column :(
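The read-everything-as-strings workflow described above can be sketched as follows. The column names and sample rows are hypothetical, and pd.to_numeric and pd.to_datetime are just one way to assign types afterwards.

```python
import io

import pandas as pd

# Hypothetical data: "08" and "8" are distinct IDs and must stay distinct.
csv_text = "ID,amount,when\n08,12.50,2012-05-15\n8,3.00,2015-12-14\n"

# Read every column as text so nothing is inferred or reformatted.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Convert only the columns that really are numeric or date-like.
df["amount"] = pd.to_numeric(df["amount"])
df["when"] = pd.to_datetime(df["when"])

print(df["ID"].tolist())
print(df.dtypes)
```

The ID column is never touched after loading, so its values remain exactly as they appear in the file.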
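Use low_memory=False while reading the file so that pandas infers each column's dtype from the whole file rather than chunk by chunk. This avoids mixed-type columns, but on its own it does not stop numeric-looking values from being parsed as numbers.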
df = pd.read_csv('somefile.csv', low_memory=False)
Define dtypes while reading the file to force a column to be read as an object.
df = pandas.read_csv('somefile.csv', dtype={'phone': object})