In one excel file, sheet 1 , there are 4 tables at different locations in the sheet .How to read those 4 tables . for example I have even added one picture snap from google for reference. without using indexes is there any other way to extract tables.
-
1. why don't you want to use indices? 2. can you describe the constraints of the table position, size, etc. 3. can you provide an example?mozway– mozway2021-09-20 14:12:54 +00:00Commented Sep 20, 2021 at 14:12
-
I don't want to use excel indexes , as there are many more sheets with different structure of excel sheet and formats . obviously we can use data frame indexes , sorry if my question got you confused between excel sheet index and data frame index.codette– codette2021-09-20 14:18:02 +00:00Commented Sep 20, 2021 at 14:18
-
Are these real Excel TableCorralien– Corralien2021-09-25 05:55:25 +00:00Commented Sep 25, 2021 at 5:55
-
What information you have to confirm that there are n number of tables, apart from virtually seeing it? is it confirm that you have atleast one row blank between two tables?shivankgtm– shivankgtm2021-09-25 13:02:34 +00:00Commented Sep 25, 2021 at 13:02
3 Answers
I assume your tables are formatted as "Excel Tables". You can create an excel table by mark a range and then click:
Then there are a good guide from Samuel Oranyeli how to import the Excel Tables with Python. I have used his code and show with examples.
I have used the following data in excel, where each color represents a table.
Remarks about code:
The following part can be used to check which tables exist in the worksheet that we are working with:
# check what tables that exist in the worksheet
print({key : value for key, value in ws.tables.items()})
In our example this code will give:
{'Table2': 'A1:C18', 'Table3': 'D1:F18', 'Table4': 'G1:I18', 'Table5': 'J1:K18'}
Here you set the dataframe names. Be cautious if the number of dataframes missmatches the number of tables you will get an error.
# Extract all the tables to individually dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()
# Print each dataframe
print(Table2.head(3)) # Print first 3 rows from df
print(Table2.head(3)) gives:
Index first_name last_name address 0 Aleshia Tomkiewicz 14 Taylor St 1 Evan Zigomalas 5 Binney St 2 France Andrade 8 Moor Place
Full code:
#import libraries
from openpyxl import load_workbook
import pandas as pd
# read file
wb = load_workbook("G:/Till/Tables.xlsx") # Set the filepath + filename
# select the sheet where tables are located
ws = wb["Tables"]
# check what tables that exist in the worksheet
print({key : value for key, value in ws.tables.items()})
mapping = {}
# loop through all the tables and add to a dictionary
for entry, data_boundary in ws.tables.items():
# parse the data within the ref boundary
data = ws[data_boundary]
### extract the data ###
# the inner list comprehension gets the values for each cell in the table
content = [[cell.value for cell in ent]
for ent in data]
header = content[0]
#the contents ... excluding the header
rest = content[1:]
#create dataframe with the column names
#and pair table name with dataframe
df = pd.DataFrame(rest, columns = header)
mapping[entry] = df
# print(mapping)
# Extract all the tables to individually dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()
# Print each dataframe
print(Table2)
print(Table3)
print(Table4)
print(Table5)
Example data, example file:
| first_name | last_name | address | city | county | postal |
|---|---|---|---|---|---|
| Aleshia | Tomkiewicz | 14 Taylor St | St. Stephens Ward | Kent | CT2 7PP |
| Evan | Zigomalas | 5 Binney St | Abbey Ward | Buckinghamshire | HP11 2AX |
| France | Andrade | 8 Moor Place | East Southbourne and Tuckton W | Bournemouth | BH6 3BE |
| Ulysses | Mcwalters | 505 Exeter Rd | Hawerby cum Beesby | Lincolnshire | DN36 5RP |
| Tyisha | Veness | 5396 Forth Street | Greets Green and Lyng Ward | West Midlands | B70 9DT |
| Eric | Rampy | 9472 Lind St | Desborough | Northamptonshire | NN14 2GH |
| Marg | Grasmick | 7457 Cowl St #70 | Bargate Ward | Southampton | SO14 3TY |
| Laquita | Hisaw | 20 Gloucester Pl #96 | Chirton Ward | Tyne & Wear | NE29 7AD |
| Lura | Manzella | 929 Augustine St | Staple Hill Ward | South Gloucestershire | BS16 4LL |
| Yuette | Klapec | 45 Bradfield St #166 | Parwich | Derbyshire | DE6 1QN |
| Fernanda | Writer | 620 Northampton St | Wilmington | Kent | DA2 7PP |
| Charlesetta | Erm | 5 Hygeia St | Loundsley Green Ward | Derbyshire | S40 4LY |
| Corrinne | Jaret | 2150 Morley St | Dee Ward | Dumfries and Galloway | DG8 7DE |
| Niesha | Bruch | 24 Bolton St | Broxburn, Uphall and Winchburg | West Lothian | EH52 5TL |
| Rueben | Gastellum | 4 Forrest St | Weston-Super-Mare | North Somerset | BS23 3HG |
| Michell | Throssell | 89 Noon St | Carbrooke | Norfolk | IP25 6JQ |
| Edgar | Kanne | 99 Guthrie St | New Milton | Hampshire | BH25 5DF |
Comments
You need two things:
- Access OpenXML data via python: https://github.com/python-openxml/python-xlsx
- Find the tables in the file, via what is called a
DefinedName: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.definedname?view=openxml-2.8.1
Comments
You may convert your excel sheet to csv file and then use csv module to grab rows.
import pandas as pd
read_file = pd.read_excel("Test.xlsx")
read_file.to_csv ("Test.csv",index = None,header=True)
enter code here
df = pd.DataFrame(pd.read_csv("Test.csv"))
print(df)
For better approch please provide us sample excel file


