1

So I've an excel sheet that has multiple tabs and each individual tab has multiple tables in it. So i want to read the file in such a way that it reads each table from each tab of the sheet, for instance,

Tab1 has five tables in it.
Tab2 has Ten tables in it.
.....
.....

I want to read each one of these table in pandas dataframe and then save it to sql database. I know how how to read multiple tabs from the excel sheet.

Can anyone help me out here or point me to a direction where i can find a lead?

The tables in the tab are pre-defined and have name. Thats how it looks like in each tab Tab from excel sheet

6
  • are the tables defined? like real excel tables? with names? Commented Dec 22, 2020 at 0:09
  • Yeah each table is defined and has name. I have added an image of how the tables look like. @sammywemmy Commented Dec 22, 2020 at 0:14
  • This are not actual excel tables. These are modified and user created. Excel has an built-in function that creates tables with filters and some other features. This is user defined and is not the same as the excel built in. For this case you may have to manually control the reading of the tables using openpyxl. If you can share one of the tabs, I could have a look at it, and maybe create a function. Are the table the same form throughout? Commented Dec 22, 2020 at 0:15
  • Yeah, so this is how my tables are defined in each tab. Any ideas about how it can be read? @sammywemmy Commented Dec 22, 2020 at 0:17
  • 1
    Yes of course i can share one of the tabs. Its available in the link. @sammywemmy Commented Dec 22, 2020 at 0:30

1 Answer 1

1

You probably have to tweak it to match your data; imagine if you have some tables below and some above. This, hopefully, should point you in the right direction. Also, note the number of for loops I used; I believe you can do better and optimize it further.

from openpyxl import load_workbook
from collections import defaultdict
from itertools import product, groupby
from operator import itemgetter

wb = load_workbook(filename="test.xlsx")

sheet = wb["Sheet1"]

green_rows = defaultdict(list)
rest_data = []

for row in sheet:
    for cell in row:
        look for the green rows; they contain the headers
        if cell.fill.fgColor.rgb == "FFA2D722":
            # take advantage of the fact that header 
            # is the first entry in that row
            if cell.value:
                val = cell.value
            green_rows[(val, cell.row)].append(cell.column)
        else:
            if cell.value not in (None, ""): # so the 0s are not lost
                rest_data.append((cell.row, cell.column, cell.value))

# get the max and minimum column positions
# note the addition of 1 to the max, 
# this is necessary when iterating to sort the data
# in the next section
green_rows = [
    (name, row, range(min(value), max(value) + 1))
    for (name, row), value in green_rows.items()
]


box = []

# here the green rows and the rest of the data
# are combined, then filtered for the respective 
# sections
combo = product(green_rows, rest_data)
for (header, header_row, header_column_range), (
    cell_row,
    cell_column,
    cell_value,
) in combo:
    # this is where the filtration occurs
    if (header_row < cell_row) and (cell_column in header_column_range):
        box.append((header, cell_row, cell_column, cell_value))

final = defaultdict(list)
content = groupby(box, itemgetter(1, 0))

# another iteration to get the final result
for key, value in content:
    final[key[-1]].append([val[-1] for val in value])

You can create your dataframe for each of the headers:

pd.DataFrame(final["Address Association"])


0   1   2   3   4   5
0   Column Name in DB   Name    Description SortOrder   BusinessMeaningName Obsolete
1   Field Type  nvarchar(100)   nvarchar(255)   int nvarchar(50)    bit
2   Mandatory   Yes Yes Yes No  Yes
3   Foreign Key -   -   -   -   -
4   Optional Feature    -   -   -   -   -
5   Field Name in U4SM  Name    Description Sort Order  Business Meaning Name   Obsolete
6   Address.Primary Primary Use this address by default.    1   Address.Primary 0
7   Address.Billing Billing address for billing.    2   Address.Billing 0
8   Address.Emergency   Emergency   use this for emergency. 3   Address.Emergency   0
9   Address.Emergency SMS   Emergency SMS   use this for emergency SMS. 4   Address.Emergency SMS   0
10  Address.Deceased    Deceased    address for deceased.   5   Address.Deceased    0
11  Address.Home    Home    address for home.   8   Address.Home    0
12  Address.Mailing Mailing address for mailing.    9   Address.Mailing 0
13  Address.Mobile  Mobile  use this for mobile.    10  Address.Mobile  0
14  Address.School  School  address for school. 13  Address.School  0
15  Address.SMS SMS use this for SMS text.  15  Address.SMS 0
16  Address.Work    Work    address for work    16  Address.Work    0
17  Address.Permanent   Permanent   Permanent Address   17  Address.Permanent   0
18  Address.HallsOfResidence    Halls of Residence  Halls of Residence  18  Address.HallsOfResidence    0
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.