
I have 30+ .xlsx files in the same directory, and I would like to use Python to convert all of them to CSV with UTF-8 encoding, regardless of whatever encoding is present in the file. I am using Python's magic library to list the files and their types (code below).

For the conversion, I tried the code mentioned by SO user Julian here (I used the code posted here), but it throws an error: "InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm". The code that throws the error is below.

The second issue: with my limited Python knowledge, I believe the code will only work for one Excel file. How should I make it work for multiple files?

Thanks in advance for your help!

# python-magic reports the file type (e.g. "Microsoft Excel 2007+"),
# not a text encoding
import glob

import magic

print("File".ljust(45), "Type")
for filename in glob.glob('path/*.xlsx'):  # replace 'path' with your directory
    with open(filename, 'rb') as rawdata:
        result = magic.from_buffer(rawdata.read(2048))
    print(filename.ljust(45), result)
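An .xlsx file is a ZIP archive of XML parts rather than a text file, so a stdlib-only alternative to python-magic is to check that each file really is such an archive. Below is a minimal sketch; the helper name `looks_like_xlsx` is hypothetical, and it assumes the conventional part name `xl/workbook.xml` that Excel writes.

```python
import glob
import zipfile


def looks_like_xlsx(path):
    """Return True if path is a ZIP archive containing xl/workbook.xml.

    Assumption: a workbook saved by Excel stores its main part at the
    conventional location xl/workbook.xml inside the ZIP package.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "xl/workbook.xml" in archive.namelist()


if __name__ == "__main__":
    # Report every *.xlsx file in the current directory
    for filename in sorted(glob.glob("*.xlsx")):
        status = "OK" if looks_like_xlsx(filename) else "NOT a valid .xlsx package"
        print(filename.ljust(45), status)
```

Files flagged here (for example an old .xls or an HTML export renamed to .xlsx) are the ones openpyxl will refuse to load.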

Code that throws the error (from the SO user's GitHub link mentioned above):

    import csv
    import sys

    from openpyxl import load_workbook

    def get_all_sheets(excel_file):
        workbook = load_workbook(excel_file, read_only=True, data_only=True)
        # `sheetnames` replaces the deprecated get_sheet_names()
        sheets = list(workbook.sheetnames)
        workbook.close()  # read-only workbooks keep the file open until closed
        return sheets

    def csv_from_excel(excel_file, sheets):
        workbook = load_workbook(excel_file, data_only=True)
        for worksheet_name in sheets:
            print("Export " + worksheet_name + " ...")

            try:
                # workbook[name] replaces the deprecated get_sheet_by_name()
                worksheet = workbook[worksheet_name]
            except KeyError:
                print("Could not find " + worksheet_name)
                sys.exit(1)

            # Python 3: open in text mode with newline='' and an explicit encoding
            with open(worksheet_name + '.csv', 'w', newline='',
                      encoding='utf-8') as your_csv_file:
                wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
                for row in worksheet.iter_rows(values_only=True):
                    wr.writerow(row)
            print(" ... done")

    if not 2 <= len(sys.argv) <= 3:
        print("Call with " + sys.argv[0] + " <xlsx file> [comma separated list of sheets to export]")
        sys.exit(1)
    else:
        if len(sys.argv) == 3:
            sheets = sys.argv[2].split(',')
        else:
            sheets = get_all_sheets(sys.argv[1])
        assert sheets, "no sheets to export"
        csv_from_excel(sys.argv[1], sheets)
  • The header of your question contradicts the error. Commented May 27, 2022 at 4:44

2 Answers


Have you tried the pandas library? You can collect all the file names in a list using os.listdir, then loop through the list, open each Excel file with read_excel, and write each sheet to a CSV file. It will look something like this:

"""Read Excel workbooks and write each sheet out as a UTF-8 encoded CSV.

Set ExcelPath to the folder that holds your Excel workbooks and CsvPath
to the folder where the CSV output should go.
"""
import os

import pandas as pd

ExcelPath = "C:/ExcelPath"  # folder with your Excel workbooks
CsvPath = "C:/CsvPath"  # folder for your CSV output

# keep only Excel workbooks, skipping any other files in the folder
fileList = [f for f in os.listdir(ExcelPath)
            if f.lower().endswith(('.xlsx', '.xlsm'))]

for file in fileList:
    xls = pd.ExcelFile(os.path.join(ExcelPath, file))
    # loop over every sheet and create one CSV file per sheet
    for sheet in xls.sheet_names:
        # CSV file name is <workbook name>.<sheet name>.csv
        fileNameCSV = os.path.splitext(file)[0] + '.' + str(sheet)
        df = pd.read_excel(xls, sheet_name=sheet)
        # index=False: don't write the DataFrame's row index as an extra column
        df.to_csv(os.path.join(CsvPath, fileNameCSV + '.csv'),
                  encoding="utf-8", index=False)
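One pitfall when looping over a whole folder: while a workbook is open in Excel, Excel creates a small owner (lock) file named like `~$Book1.xlsx` next to it. Such files carry the .xlsx extension but are not real workbooks, so they fail to load. A minimal sketch of a file-listing helper that skips them; the name `excel_files` and the extension list are assumptions for illustration.

```python
import os

# Assumption: only these workbook types are exported
EXCEL_EXTENSIONS = (".xlsx", ".xlsm")


def excel_files(excel_dir):
    """Return full paths of the Excel workbooks in excel_dir, sorted by name.

    Skips non-Excel files and the "~$..." owner files that Excel creates
    next to a workbook while it is open.
    """
    return [
        os.path.join(excel_dir, name)
        for name in sorted(os.listdir(excel_dir))
        if name.lower().endswith(EXCEL_EXTENSIONS) and not name.startswith("~$")
    ]
```

It would slot into the loop as `for path in excel_files(ExcelPath): xls = pd.ExcelFile(path)`.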

Not the most elegant solution, but it should meet your needs.


11 Comments

Hi Nabeel, thank you and welcome to SO. I tried the code above and was curious about sheet_name=someSheetName. What will the sheet name be, given that we are reading the files from a directory?
It will be the sheet name (or tab name) from the Excel file. I assumed it would be the same in all your Excel files; by default, Excel names the first sheet Sheet1. If each workbook has different sheets, you can store the names of its sheets and then, for each Excel file, loop through each sheet.
Could you please share the code for the case where each Excel file has different sheet names? As I also mentioned in my comments to Sergey Zaykov above, I have two Excel files with multiple sheets. How would the code above work in that scenario? I would like to use pandas if it is doable for multiple sheets.
I'm new to SO, so editing the code part to make sure it is correct is time-consuming.
Phew! Glad you were able to solve it. Good luck and keep coding.

First, the message of the first error is explicit: InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first.

Does Excel open this file successfully? If it does, please share the workbook (or a small part of it).
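It may also help to look at the path that actually reaches load_workbook. In the openpyxl versions I know of, load_workbook checks the file name's extension before opening anything, and the error message includes that extension; in the quoted message there is nothing between "support" and "file format", which hints that the path had no extension at all. One way this can happen is running the question's script inside a Jupyter notebook, where sys.argv holds the kernel's own arguments (such as `-f`) instead of a workbook path. The sketch below mirrors that kind of check using the format list from the error message; `check_extension` is a hypothetical helper, not part of openpyxl.

```python
import os

# Formats listed in the openpyxl error message quoted in the question
OPENPYXL_FORMATS = (".xlsx", ".xlsm", ".xltx", ".xltm")


def check_extension(path):
    """Return None if the extension is one openpyxl accepts, else a reason."""
    ext = os.path.splitext(path)[1].lower()
    if ext in OPENPYXL_FORMATS:
        return None
    if ext == "":
        return f"{path!r} has no file extension"
    return f"{path!r} has unsupported extension {ext!r}"


if __name__ == "__main__":
    import sys

    # Print what the question's script would pass to load_workbook
    for arg in sys.argv[1:2]:
        print(arg, "->", check_extension(arg) or "extension OK")
```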

The answer to the second question:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# vi:ts=4:et

"""Export every sheet of every *.xlsx file in the current directory to CSV."""
import csv
from pathlib import Path

from openpyxl import load_workbook

# find all *.xlsx files in the current directory
# and iterate over them
for file in Path('.').glob('*.xlsx'):
    # read the Excel file; data_only=True exports the cached results
    # of formulas instead of the formula strings
    wb = load_workbook(file, data_only=True)
    # small test (optional)
    print(file, wb.active.title)
    # export all sheets to CSV
    for sheetname in wb.sheetnames:
        # Write to a UTF-8 encoded file with a BOM signature;
        # newline='' prevents blank lines between rows on Windows
        with open(f'{file.stem}-{sheetname}.csv', 'w',
                  encoding="utf-8-sig", newline='') as csvfile:
            # Write to CSV
            spamwriter = csv.writer(csvfile)
            # Iterate over rows in sheet
            for row in wb[sheetname].iter_rows(values_only=True):
                # Write a row of cell values
                spamwriter.writerow(row)
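About the output encoding: cell values come back from openpyxl as Python str, so the encoding is only decided when the CSV is written. The `utf-8-sig` codec prepends a byte order mark, which Excel typically uses to recognise a CSV as UTF-8 when the file is opened directly. A small sketch (`write_rows` is a hypothetical helper) that also passes `newline=""`, as the csv module documentation recommends for files handed to csv.writer:

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows as a UTF-8 CSV with a BOM, one CSV line per row."""
    # newline="" lets the csv module control line endings itself;
    # otherwise Windows text mode turns "\r\n" into "\r\r\n" (blank rows)
    with open(path, "w", encoding="utf-8-sig", newline="") as csvfile:
        csv.writer(csvfile).writerows(rows)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "demo.csv")
    write_rows(target, [["name", "city"], ["Zoë", "Zürich"]])
    with open(target, "rb") as f:
        print(f.read())  # raw bytes: BOM first, then UTF-8 encoded rows
```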

You can also specify the CSV dialect explicitly with the `dialect` parameter of csv.writer.
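For example, a dialect can be a subclass of csv.excel that overrides only the delimiter; the semicolon variant below (the class name `SemicolonExcel` is hypothetical) is a common choice in locales where the comma is the decimal separator. Individual formatting parameters such as `delimiter` or `quoting` can also be passed straight to csv.writer alongside `dialect`.

```python
import csv
import io


class SemicolonExcel(csv.excel):
    """Excel defaults (quote char, "\\r\\n" line ending), but ';' as delimiter."""
    delimiter = ";"


buffer = io.StringIO()
writer = csv.writer(buffer, dialect=SemicolonExcel)
writer.writerow(["id", "label"])
writer.writerow([1, "a;b"])  # field containing the delimiter gets quoted
print(repr(buffer.getvalue()))
```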

11 Comments

I'm receiving an error: "ValueError: Table with name Data already exists". I checked with python-magic, and the file type is Microsoft Excel 2007+.
I don't understand. Where did you get this error?
Sorry, at the line wb = load_workbook(file).
After opening the file in Excel, saving it, and closing it, it started working. (link)
Oh, thanks for the explanation. Well, do all ten sheets need to be saved into one file?