5

I know that you can specify data types when reading excels using pd.read_excel (as outlined here). Can you do the same using pd.ExcelFile?

I have the following code:

 if ".xls" in 
     xl = pd.ExcelFile(path + "\\" + name, )
     for sheet in xl.sheet_names:
         xl_parsed = xl.parse(sheet)

When parsing the sheet, some of the values in the columns are displayed in scientific notation. I don't know the column names before loading so I need to import everything as string. Ideally I would like to be able to do something like xl_parsed = xl.parse(sheet, dtype = str). Any suggestions?

4
  • Please make sure that you refer to the documentation before asking. The accepted answer in the question you linked shows that read_excel takes a converters parameter. So does pd.ExcelFile.parse. Did you try it? Commented Mar 19, 2018 at 22:04
  • Although I should have read the end of your question more closely as that deals with something not covered by the docs. There was an issue about it here which got closed but doesn't seem to have been resolved in any proper way. There is one approach in there that you might be able to adapt; open the Excel in xlrd, inspect the headers to get the names, and create a dictionary comprehension to then pass as converters. Kinda ugly. Commented Mar 19, 2018 at 22:20
  • @roganjosh thanks for the input, that's what i did! I'll post it as an answer Commented Mar 27, 2018 at 10:28
  • The solution you suggested is supported by the parse method although it is not mentioned explicitly on the documentation, so you can just use xl_parsed = xl.parse(sheet, dtype=str) Commented Nov 29, 2021 at 11:43

2 Answers 2

3

If you would prefer a cleaner solution, I used the following:

excel = pd.ExcelFile(path)
for sheet in excel.sheet_names:
    columns = excel.parse(sheet).columns
    converters = {column: str for column in columns}

    data = excel.parse(sheet, converters=converters)
Sign up to request clarification or add additional context in comments.

Comments

1

I went with roganjosh's suggestion - open the excel first, get column names and then pass as converter.

if ".xls" in name:
    xl = pd.ExcelFile(path)
    sheetCounter = 1
        for sheet in xl.sheet_names:
        ### Force to read as string ###
        column_list = []
        df_column = pd.read_excel(path, sheetCounter - 1).columns
        for i in df_column:
            column_list.append(i)
        converter = {col: str for col in column_list}
        ##################
        xl_parsed = xl.parse(sheet, converters=converter)
        sheetCounter = sheetCounter + 1

1 Comment

Thanks for the answer. column_list seems pointless, you can just use df_column directly. You can also use for i, sheet in enumerate(xl.sheet_names): instead of sheetCounter. I assume the for sheet in ... should be indented -4 spaces.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.