I’m working with an Excel file and need to read its contents into a DataFrame. When I use pandas (with the default engine), I can specify that all columns should be read as strings, which works perfectly:

import pandas as pd

df = pd.read_excel(fp, dtype=str, nrows=10)
print(df[col])

This gives me a column with values like:

4200000000

However, when I use Calamine to read the same file, the values in the same column end up with a .0 suffix:

from calamine import CalamineWorkbook

wb = CalamineWorkbook.from_path(fp)
row_list = wb.get_sheet_by_name(wb.sheet_names[0]).to_python()
print(row_list)

This results in: 4200000000.0

How can I stop Calamine from auto-guessing the data types? In pandas, I would use dtype=str, but my version of pandas does not support Calamine as an engine, and I cannot update it.

  • I have the same problem. I looked through the Rust codebase but did not find an answer there. Did you find a solution in the meantime? (Commented Oct 31, 2024 at 10:15)
  • No, for now I just believe this tool is unsafe to use because of dtype inference. (Commented Nov 4, 2024 at 10:01)

1 Answer

Excel stores all numbers internally as floats, so reading 10 as anything but 10.0 is itself a data type conversion, which calamine does not perform. This is the cost of its speed.
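If string output is what you need, one possible workaround is to do that conversion yourself after reading. The following is a minimal sketch, not a calamine feature: it assumes the rows come back as a list of lists (as in the `to_python()` call from the question), and the helper names `cell_to_str` and `rows_to_str` are made up for illustration. Whole-number floats are rendered without the trailing `.0`; everything else goes through `str()`.

```python
from typing import Any, List


def cell_to_str(value: Any) -> str:
    """Render one cell as text, dropping '.0' from whole-number floats."""
    if value is None:
        # Defensive: treat a missing cell as an empty string.
        return ""
    if isinstance(value, float) and value.is_integer():
        # 4200000000.0 -> "4200000000"
        return str(int(value))
    # Non-integer floats, strings, dates, etc. keep their str() form.
    return str(value)


def rows_to_str(rows: List[List[Any]]) -> List[List[str]]:
    """Apply cell_to_str to every cell of a list-of-lists sheet."""
    return [[cell_to_str(cell) for cell in row] for row in rows]


# Stand-in for the list of rows read from the workbook (hypothetical data).
row_list = [["id", "amount"], [4200000000.0, 1.5], ["abc", ""]]
print(rows_to_str(row_list))
```

The converted rows can then be passed to `pd.DataFrame(rows[1:], columns=rows[0])` if the first row holds the headers. Note that this only fixes the formatting: a value that Excel itself stored as a number (for example an ID with leading zeros typed into a numeric cell) cannot be recovered this way.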


1 Comment

What about strings? I noticed that calamine changed strings into numbers after I read the file.
