0

I am using Openpyxl library to read xlsx file and extract few contents and add more strings to my txt output file.The excel file I am currently using contain sheets with name Summary and Employee. My below code is working fine for my current excel file. Now the issue is I would to use the same code for reading another excel file containing more sheets whose sheetnames I am not sure of. So in my code line ws = wb['Employee']. The sheetname will change all the time. However, One thing I am sure about is I don't want to read any data from sheet1. All the data extraction will occur from sheet2 onwards in all the xlsx files. I am not sure how to proceed from here so any help will be appreciated.

Thanks in advance for your time and efforts!

Code:

from openpyxl import load_workbook

data_file='\\test.xlsx'

# Load the entire workbook.
wb = load_workbook(data_file)
ws = wb['Employee'] #Manually adding sheet name here


mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6

type_strs = {
    'String': 'VARCHAR(256)',
    'Numeric': 'NUMBER',
    'Date': 'NUMBER(4,0)',
    'Int': 'NUMBER'
}

for index, value in enumerate(mylines["Column_Type"]):
    mylines["Column_Type"][index] = type_strs.get(value, value)

    
for i in range(6, ws.max_row+1):  
        name = ws.cell(row=i, column=1).value
        name1=ws.cell(row=i, column=2).value
        mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
        mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
        for index, value in enumerate(mylines["Column_Type"]):
            mylines["Column_Type"][index] = type_strs.get(value, value)
        
        
        
theString = " "
for i in range(len(mylines['Column_name'])):
    theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
    if i < len(mylines['Column_name'])-1:
        theString += ", "


outputFile = open('/output.txt', 'w')  # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) + "\n")

outputFile.close() #Closing file

Updated Code based on SO User comment:

from openpyxl import load_workbook

data_file='\\test.xlsx'

# Load the entire workbook.
wb = load_workbook(data_file)
#ws = wb['Employee'] #Manually adding sheet name here


mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6

type_strs = {
    'String': 'VARCHAR(256)',
    'Numeric': 'NUMBER',
    'Date': 'NUMBER(4,0)',
    'Int': 'NUMBER'
}

for index, value in enumerate(mylines["Column_Type"]):
    mylines["Column_Type"][index] = type_strs.get(value, value)

skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:   
        for i in range(6, ws.max_row+1):  
            name = ws.cell(row=i, column=1).value
            name1=ws.cell(row=i, column=2).value
            mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
            mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
        for index, value in enumerate(mylines["Column_Type"]):
            mylines["Column_Type"][index] = type_strs.get(value, value)
        
        
        
theString = " "
for i in range(len(mylines['Column_name'])):
    theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
    if i < len(mylines['Column_name'])-1:
        theString += ", "


outputFile = open('/output.txt', 'w')  # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) + "\n")

outputFile.close() #Closing file

Excel data

<Sheet 1 Name -> Summary Sheet: Empty
<Sheet 2 Name -> Employee Sheet 
                File Name:  Employee
                Sheet Name: Employee
                File Type:  csv
    
                Field Name  Type
                   Name     String
            Salary  Numeric
            Date    Date
            Phone       Int

<Sheet 3 Name->   Employee1 Sheet
            File Name:  Employee
            Sheet Name: Employee1
            File Type:  csv
    
            Field Name  Type
            Employee Name   Date
            Employee Salary Int
            Employment Date Int
            Office Phone    Int
    

1 Answer 1

1

To iterate through all worksheets in a workbook and read data in them (except the first worksheet, remove the ws = wb['Employee']

Use a for loop (insert before for i in range(5,... as this

skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:
        for i in range(6, ws.max_row+1):
            name = ws.cell(row=i, column=1).value
            ....

This will read each sheet and append data to mylines, except the first sheet

Second Update As you mentioned in below comment, to add a new line with the new SQL query, please make these additional changes

  1. Add another entry to dictionary to indicate new line as below (careful to ensure the lines execute after all lines in a particular sheet are read) Edit the String formation so that once a NewLine is seen, that string is written to the output file. Do note that the NewFile boolean value will overwrite any file that is there. Multiple lines will be appended post that.
skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:
        for i in range(6, ws.max_row+1):  
            name = ws.cell(row=i, column=1).value
            print(i, name)
            name1=ws.cell(row=i, column=2).value
            print(name1)
            mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
            mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
            for index, value in enumerate(mylines["Column_Type"]):
                mylines["Column_Type"][index] = type_strs.get(value, value)
        mylines["Column_name"].append('NextLine')
        mylines["Column_Type"].append('NextLine')
theString = " "
NewFile = True
sheetList = wb.sheetnames
tabIndex = 1

for i in range(len(mylines['Column_name'])):
    if(mylines['Column_name'][i] != 'NextLine'):
        theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
        theString += ", "
    else:
        theString = theString[:-2]
        if NewFile:
            NewFile = False
            outputFile = open('output.txt', 'w')  # Text file Output
            print("New file  ", theString)
        else:
            outputFile = open('output.txt', 'a')
            print("Not new file  ", theString)
        outputFile.write("CREATE TABLE TRANSIENT TABLE STG_" + sheetList[tabIndex] +"({});".format(theString) + "\n")
        outputFile.close()
        tabIndex += 1
        theString = " "
Sign up to request clarification or add additional context in comments.

9 Comments

Thank you! Now the issue is I am able to read both sheets. However instead of outputting 2 seprate lines from 2 sheets my above updated code appended everything in one line. Any reason as to why this might have happen ?
Can you pls add the excel file with some sample data which will give the error you mentioned
Updated code will add the SQL Statements one at a time. If output.txt already available, that file will be overwritten
Your expectations appear to keep changing :-) Use wb.sheetnames which will give you all the sheetnames in the excel file as a list. When you are writing to the output.txt file, add the sheet name (list[1], list[2], etc.) to the CREATE TABLE.... string. This should be fairly easy, all the pieces are already there. Give it a shot and let me know if you are still facing issues
Code updated above. sheetList created and used. The names in tab will be taken as table name with prefix of STG_
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.