Reading an Excel file using Python
Several third-party Python libraries can interact with Excel files. They make it easy to read, write, and modify Excel data, and the right choice depends on your needs.
The most commonly used libraries are:
- pandas: fast, high-level data analysis
- openpyxl: fine-grained control of .xlsx files
- xlwings: connects directly with Excel for automation
Installation
Install the required libraries using the following command:
pip install pandas openpyxl xlwings
Let's look at some examples of working with each library. The examples below read a sample workbook named student_data.xlsx.
1. Using pandas
pandas is the most popular library for data analysis in Python. It can quickly load Excel files into a DataFrame, making it easy to explore and manipulate tabular data.
import pandas as pd

# Load the first worksheet of the workbook into a DataFrame
df = pd.read_excel('student_data.xlsx')
print(df)
Explanation:
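- import pandas as pd: imports the pandas library under the alias pd.
- pd.read_excel('student_data.xlsx'): reads the Excel file named student_data.xlsx into a DataFrame. By default only the first worksheet is read.
- print(df): prints the DataFrame. To preview only the first 5 rows, use df.head() instead.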
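read_excel also accepts parameters that narrow what is loaded. The sketch below is a minimal illustration, not the article's own data: it first writes a small made-up workbook (the file name student_sample.xlsx and the ID, Name, and Marks columns are assumptions), then reads back one sheet, two columns, and two rows using the sheet_name, usecols, and nrows parameters.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for student_data.xlsx; the real file's columns may differ.
sample = pd.DataFrame(
    {"ID": [1, 2, 3], "Name": ["Ankit", "Rahul", "Shaurya"], "Marks": [81, 74, 90]}
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "student_sample.xlsx"
    # Writing .xlsx files from pandas relies on openpyxl being installed.
    sample.to_excel(path, sheet_name="Sheet1", index=False)

    # Read one named sheet, keep two columns, and stop after two data rows.
    subset = pd.read_excel(
        path, sheet_name="Sheet1", usecols=["Name", "Marks"], nrows=2
    )

print(subset.head())  # head() shows at most the first 5 rows
print(subset.shape)   # (rows, columns) of the loaded subset
```

Limiting columns and rows at read time is usually cheaper than loading the whole sheet and slicing afterwards, which matters for large workbooks.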
2. Using openpyxl
openpyxl provides low-level access to Excel files. It is useful when you need to work with individual cells, rows, columns, formulas, or formatting.
import openpyxl

# Load the workbook and select its active worksheet
wb = openpyxl.load_workbook("student_data.xlsx")
ws = wb.active

# Visit every cell row by row and print its value
for row in range(0, ws.max_row):
    for col in ws.iter_cols(1, ws.max_column):
        print(col[row].value)
Explanation:
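- openpyxl.load_workbook("student_data.xlsx"): loads the workbook.
- .active: selects the workbook's active sheet.
- range(0, max_row) supplies a zero-based row index, and iter_cols(1, max_column) yields each column as a tuple of cells, so the nested loops visit every cell row by row.
- col[row].value: gets the value of the cell at that row in the current column.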
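When only the cell values are needed, openpyxl's iter_rows(values_only=True) yields each row as a plain tuple, which avoids indexing into columns. The following is a sketch under stated assumptions: it builds a tiny made-up workbook (the file name student_sample.xlsx and its ID and Name columns are hypothetical) so that it can run without the article's sample file.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "student_sample.xlsx")

    # Build a small hypothetical workbook; the real sample data may differ.
    wb_out = Workbook()
    ws_out = wb_out.active
    ws_out.append(["ID", "Name"])
    ws_out.append([1, "Ankit"])
    ws_out.append([2, "Rahul"])
    wb_out.save(path)

    # read_only=True streams the sheet instead of loading every cell object.
    wb = load_workbook(path, read_only=True)
    ws = wb.active

    # The first tuple is the header row; the remaining tuples are data rows.
    header, *rows = ws.iter_rows(values_only=True)
    wb.close()  # read-only workbooks keep the file open until closed

print(header)
for row in rows:
    print(row)
```

Reading in row order this way touches each cell once, whereas indexing every column for every row repeats the column iteration for each row.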
3. Using xlwings
xlwings connects directly with Microsoft Excel, so Excel must be installed on the machine. Unlike pandas or openpyxl, which parse the file themselves, it drives a running Excel application and allows automation such as formatting, formulas, and charts.
import xlwings as xw

# Open the workbook in Excel and select a sheet by name
ws = xw.Book("student_data.xlsx").sheets['Sheet1']

# Read cells B1 to B7 as a list
v1 = ws.range("B1:B7").value
print("Names:", v1)
Output:
Names: ['Ankit', 'Rahul', 'Shaurya', 'Aishwarya', 'Priyanka']
Explanation:
- ws.range("B1:B7"): selects the cells B1 through B7, that is, the first seven rows of column B (the Name column).
- .value: returns the contents of these cells as a Python list; empty cells come back as None.
- If B1 holds the header 'Name', it is the first element of the list, followed by the student names.
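Because xlwings starts a real Excel process, a script may want to keep the window hidden and make sure Excel is shut down afterwards. The helper below is a minimal sketch of that pattern; read_column is a hypothetical name, the default sheet and cell are assumptions, and it relies on xw.App being usable as a context manager (available in recent xlwings releases). It can only run where Microsoft Excel is installed.

```python
def read_column(path, sheet_name="Sheet1", first_cell="B1"):
    """Return the contiguous values starting at first_cell and going down."""
    # Imported lazily because xlwings needs a local Microsoft Excel installation.
    import xlwings as xw

    # visible=False hides the Excel window; leaving the with-block quits Excel.
    with xw.App(visible=False) as app:
        book = app.books.open(path)
        try:
            sheet = book.sheets[sheet_name]
            # expand("down") grows the range until the first empty cell below it.
            return sheet.range(first_cell).expand("down").value
        finally:
            book.close()
```

A call such as read_column("student_data.xlsx") would return the column as a list without hard-coding the last row. If the range turns out to be a single cell, xlwings returns a scalar rather than a list.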