I am creating a DataFrame from a CSV file like this:

topcells=pd.DataFrame.from_csv("url/output_topcell.txt", header=0, sep=', ', parse_dates=True, encoding=None, tupleize_cols=False)

The column I am interested in (cell) contains long numbers (e.g. 6468716846847), which I need to be cast as strings.

After creating the DataFrame, the column's dtype seems to be numpy.float64 by default (the column includes some NaN values).

When I use:

topcells.cell=topcells.cell.astype(str)

or:

topcells['cell']=topcells['cell'].apply(lambda x: str(x))

The string I get is not "6468716846847" but something like "6.468716846847e+12".

How can I avoid this scientific notation and get the full number as a string?

1 Answer

You should use the read_csv function from the top-level namespace; it has more options for reading, including a dtype parameter.

For example, with tst.csv:

c1,c2,c3,c4,c5
a,b,6468716846847,12,13
d,e,6468716846848,13,14

you get:

In [11]: pd.read_csv('tst.csv', dtype={'c3': 'str'})
Out[11]: 
  c1 c2             c3  c4  c5
0  a  b  6468716846847  12  13
1  d  e  6468716846848  13  14

[2 rows x 5 columns]
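Since the column in the question also contains missing values, here is a minimal sketch of the same dtype approach with an empty field in c3. The inline CSV text and the variable names are made up for illustration; io.StringIO just stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data: the second row has an empty c3 field.
csv_text = (
    "c1,c2,c3,c4,c5\n"
    "a,b,6468716846847,12,13\n"
    "d,e,,13,14\n"
)

# Asking for str on c3 keeps the digits exactly as written in the file,
# so the value never passes through float64 and scientific notation.
df = pd.read_csv(io.StringIO(csv_text), dtype={"c3": str})

first_cell = df["c3"].iloc[0]               # the digits as a Python str
second_is_missing = pd.isna(df["c3"].iloc[1])  # the empty field stays missing
```

The empty field comes back as a missing value rather than as the text "nan", so it can still be filtered with isna() or dropna().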
1 Comment

Assuming there are no NaNs in that column, you could also read it in as int64.
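If the column does contain NaNs, a plain int64 read will fail, but a reasonably recent pandas offers a nullable "Int64" extension dtype that may work instead. The sketch below is a variation on the comment's idea, not something from the answer itself; the sample data and variable names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data with one missing value in c3.
csv_text = (
    "c1,c3\n"
    "a,6468716846847\n"
    "d,\n"
    "g,6468716846848\n"
)

# "Int64" (capital I) is the nullable integer dtype; it stores the missing
# entry as <NA> instead of forcing the whole column to float64.
df = pd.read_csv(io.StringIO(csv_text), dtype={"c3": "Int64"})

# Converting integers to text yields the plain digits, with no exponent.
cell_text = df["c3"].astype("string")
```

This keeps the column numeric for as long as that is useful and only converts to text at the end; missing entries stay missing after the conversion.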