how to extract different tables in excel sheet using python

Question

In one excel file, sheet 1 , there are 4 tables at different locations in the sheet .How to read those 4 tables . for example I have even added one picture snap from google for reference. without using indexes is there any other way to extract tables.

1. why don't you want to use indices? 2. can you describe the constraints of the table position, size, etc. 3. can you provide an example? — mozway
– mozway, Commented Sep 20, 2021 at 14:12
I don't want to use excel indexes , as there are many more sheets with different structure of excel sheet and formats . obviously we can use data frame indexes , sorry if my question got you confused between excel sheet index and data frame index. — codette
– codette, Commented Sep 20, 2021 at 14:18
What information you have to confirm that there are n number of tables, apart from virtually seeing it? is it confirm that you have atleast one row blank between two tables? — shivankgtm
– shivankgtm, Commented Sep 25, 2021 at 13:02

sammywemmy · Accepted Answer · 2023-08-22 13:17:30Z

I assume your tables are formatted as "Excel Tables". You can create an excel table by mark a range and then click:

Then there are a good guide from Samuel Oranyeli how to import the Excel Tables with Python. I have used his code and show with examples.

I have used the following data in excel, where each color represents a table.

Remarks about code:

The following part can be used to check which tables exist in the worksheet that we are working with:

# check what tables that exist in the worksheet
print({key : value for key, value in ws.tables.items()})

In our example this code will give:

{'Table2': 'A1:C18', 'Table3': 'D1:F18', 'Table4': 'G1:I18', 'Table5': 'J1:K18'}

Here you set the dataframe names. Be cautious if the number of dataframes missmatches the number of tables you will get an error.

# Extract all the tables to individually dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()

# Print each dataframe
print(Table2.head(3)) # Print first 3 rows from df

print(Table2.head(3)) gives:

   Index   first_name   last_name     address
   0          Aleshia   Tomkiewicz    14 Taylor St
   1             Evan   Zigomalas     5 Binney St
   2           France   Andrade       8 Moor Place

Full code:

#import libraries
from openpyxl import load_workbook
import pandas as pd

# read file
wb = load_workbook("G:/Till/Tables.xlsx") # Set the filepath + filename

# select the sheet where tables are located
ws = wb["Tables"]

# check what tables that exist in the worksheet
print({key : value for key, value in ws.tables.items()})

mapping = {}

# loop through all the tables and add to a dictionary
for entry, data_boundary in ws.tables.items():
    # parse the data within the ref boundary
    data = ws[data_boundary]
    
    ### extract the data ###
    # the inner list comprehension gets the values for each cell in the table
    content = [[cell.value for cell in ent] 
               for ent in data]
    
    header = content[0]
    
    #the contents ... excluding the header
    rest = content[1:]
    
    #create dataframe with the column names
    #and pair table name with dataframe
    df = pd.DataFrame(rest, columns = header)
    mapping[entry] = df

#  print(mapping)

# Extract all the tables to individually dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()

# Print each dataframe
print(Table2)
print(Table3)
print(Table4)
print(Table5)

Example data, example file:

first_name	last_name	address	city	county	postal
Aleshia	Tomkiewicz	14 Taylor St	St. Stephens Ward	Kent	CT2 7PP
Evan	Zigomalas	5 Binney St	Abbey Ward	Buckinghamshire	HP11 2AX
France	Andrade	8 Moor Place	East Southbourne and Tuckton W	Bournemouth	BH6 3BE
Ulysses	Mcwalters	505 Exeter Rd	Hawerby cum Beesby	Lincolnshire	DN36 5RP
Tyisha	Veness	5396 Forth Street	Greets Green and Lyng Ward	West Midlands	B70 9DT
Eric	Rampy	9472 Lind St	Desborough	Northamptonshire	NN14 2GH
Marg	Grasmick	7457 Cowl St #70	Bargate Ward	Southampton	SO14 3TY
Laquita	Hisaw	20 Gloucester Pl #96	Chirton Ward	Tyne & Wear	NE29 7AD
Lura	Manzella	929 Augustine St	Staple Hill Ward	South Gloucestershire	BS16 4LL
Yuette	Klapec	45 Bradfield St #166	Parwich	Derbyshire	DE6 1QN
Fernanda	Writer	620 Northampton St	Wilmington	Kent	DA2 7PP
Charlesetta	Erm	5 Hygeia St	Loundsley Green Ward	Derbyshire	S40 4LY
Corrinne	Jaret	2150 Morley St	Dee Ward	Dumfries and Galloway	DG8 7DE
Niesha	Bruch	24 Bolton St	Broxburn, Uphall and Winchburg	West Lothian	EH52 5TL
Rueben	Gastellum	4 Forrest St	Weston-Super-Mare	North Somerset	BS23 3HG
Michell	Throssell	89 Noon St	Carbrooke	Norfolk	IP25 6JQ
Edgar	Kanne	99 Guthrie St	New Milton	Hampshire	BH25 5DF

Ama · Accepted Answer · 2021-09-25 16:37:47Z

0

You need two things:

Access OpenXML data via python: https://github.com/python-openxml/python-xlsx
Find the tables in the file, via what is called a DefinedName: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.definedname?view=openxml-2.8.1

answered Sep 25, 2021 at 16:37

Ama

1,59515 silver badges28 bronze badges

Comments

Mohammad Farseen · Accepted Answer · 2021-09-25 13:02:28Z

-2

You may convert your excel sheet to csv file and then use csv module to grab rows.

    import pandas as pd
    read_file  = pd.read_excel("Test.xlsx")
    read_file.to_csv ("Test.csv",index = None,header=True) 
    enter code here
    df = pd.DataFrame(pd.read_csv("Test.csv")) 
    print(df)

For better approch please provide us sample excel file

answered Sep 25, 2021 at 13:02

Mohammad Farseen

1131 gold badge2 silver badges9 bronze badges

Collectives™ on Stack Overflow

how to extract different tables in excel sheet using python

3 Answers 3

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related