How can I specify column names while reading an Excel file using Pandas?

Question

I read an Excel sheet into a Pandas DataFrame this way:

import pandas as pd

xl = pd.ExcelFile("Path + filename")
df = xl.parse("Sheet1")

The value in the first cell of each column is used as the column name in the DataFrame, but I want to specify my own column names. How do I do this?

Peter Mortensen · Accepted Answer · 2024-05-21 13:25:09Z

22

Note: Pandas now (v0.22) has a keyword argument for specifying column names when parsing Excel files. Use:

import pandas as pd
xl = pd.ExcelFile("Path + filename")
df = xl.parse("Sheet 1", header=None, names=['A', 'B', 'C'])

If header=None is not set, pandas treats the first row as the header and removes it from the data during parsing. If the file does have a header row that you don't want to use, you have two choices: either (1) pass only the names keyword argument, or (2) pass names together with header=None and skiprows=1.

I personally prefer the second option, since it makes clear that the input file is not in the format I want and that I am deliberately working around it.
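To compare the two options side by side, here is a minimal sketch. It builds a throwaway workbook in memory instead of reading a real file, so the labels old_x and old_y and the sheet contents are invented for illustration, and it assumes the openpyxl engine is installed for .xlsx support.

```python
import io

import pandas as pd

# Hypothetical workbook with an unwanted header row (old_x, old_y).
# Writing and reading .xlsx assumes openpyxl is installed.
buffer = io.BytesIO()
pd.DataFrame({"old_x": [1, 2], "old_y": [3, 4]}).to_excel(buffer, index=False)
buffer.seek(0)
xl = pd.ExcelFile(buffer)

# Option 1: names only. The first row is still consumed as the header,
# and its labels are replaced by the supplied names.
df_names_only = xl.parse("Sheet1", names=["A", "B"])

# Option 2: state that no row is a header, then skip the unwanted row.
df_explicit = xl.parse("Sheet1", header=None, names=["A", "B"], skiprows=1)

# Leaving out skiprows=1 keeps the old header as an ordinary data row.
df_header_as_data = xl.parse("Sheet1", header=None, names=["A", "B"])

print(df_names_only)
print(df_explicit)
print(df_header_as_data)
```

Both options should give the same two data rows under the new names; the third call shows what happens when the old header row is not skipped.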

edited May 21, 2024 at 13:25

Peter Mortensen


answered Apr 20, 2018 at 14:21

ram


1 Comment

Roland Kwee Over a year ago

Thanks, also to the other answerers, for the additional notes on skipping the header row. They are not part of the OP's question, but they are essential for beginners at reading Excel in Python, like us, who ended up at this question.

Peter Mortensen · Accepted Answer · 2024-05-21 13:27:10Z

19

As Ram said, this post comes up at the top and may be useful to some.

In pandas 0.24.2 (maybe earlier as well), read_excel itself can ignore the source headers and apply your own column names, along with a few other useful controls:

DID = pd.read_excel(file1, sheet_name=0, header=None, usecols=[0, 1, 6], names=['A', 'ID', 'B'], dtype={2:str}, skiprows=10)

# For example...
# usecols => read only specific col indexes
# dtype => specifying the data types
# skiprows => skip number of rows from the top.

edited May 21, 2024 at 13:27

Peter Mortensen


answered Apr 11, 2019 at 0:40

Loku


Peter Mortensen · Accepted Answer · 2024-05-21 13:22:50Z

14

I think setting them afterwards is the only way in this case, so if you have, for example, four columns in your DataFrame:

df.columns = ['W', 'X', 'Y', 'Z']

If you know in advance what the headers in the Excel file are, it's probably better to rename them. This would rename W to A, and so on:

df = df.rename(columns={'W': 'A', 'X': 'B'})  # add the remaining old: new pairs as needed
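The difference between the two approaches can be sketched on a small hand-built DataFrame; the values and the labels C and D are made up for illustration.

```python
import pandas as pd

# Hypothetical frame standing in for data already read from Excel.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["W", "X", "Y", "Z"])

# rename returns a new DataFrame and only changes the labels listed
# in the mapping; the original frame is left untouched.
renamed = df.rename(columns={"W": "A", "X": "B"})

# Assigning to .columns replaces every label at once, by position,
# so the list must have exactly one entry per column.
relabelled = df.copy()
relabelled.columns = ["A", "B", "C", "D"]

print(list(renamed.columns))
print(list(relabelled.columns))
```

rename is the safer choice when the column order in the file might change, because it matches on the old label rather than on position.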

edited May 21, 2024 at 13:22

Peter Mortensen


answered Jun 27, 2013 at 6:12

Rutger Kassies


1 Comment

Rakesh Adhikesavan Over a year ago

My problem is that the first row of the Excel file contains valid data, not column names, so using "df.columns = ['W','X','Y','Z']" I would lose data. I would need to add the column names above the existing data and then change the column names.

Peter Mortensen · Accepted Answer · 2024-05-21 13:24:00Z

10

Call .parse with the header=None keyword argument.

df = xl.parse("Sheet1", header=None)
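As a rough illustration of why this matters when the sheet has no header row, the sketch below writes a two-row, header-less workbook to memory (the values are invented, and openpyxl is assumed to be installed) and parses it both ways.

```python
import io

import pandas as pd

# Hypothetical sheet with data only: two rows, no header row.
buffer = io.BytesIO()
pd.DataFrame([[10, 20], [30, 40]]).to_excel(buffer, header=False, index=False)
buffer.seek(0)
xl = pd.ExcelFile(buffer)

# Default parsing promotes the first data row to column labels,
# so only one row of data remains.
df_default = xl.parse("Sheet1")

# header=None keeps every row; the columns get integer labels,
# which can then be replaced with real names.
df_no_header = xl.parse("Sheet1", header=None)
df_no_header.columns = ["A", "B"]

print(df_default.shape)
print(df_no_header)
```

With header=None no data row is consumed, so assigning to df.columns afterwards does not lose anything.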

edited May 21, 2024 at 13:24

Peter Mortensen


answered Jun 27, 2013 at 6:25

falsetru


Peter Mortensen · Accepted Answer · 2024-05-21 13:28:16Z

8

In case the Excel sheet only contains the data without headers:

df = pd.read_excel("the excel file", header=None, names=["A","B","C"])

In case the Excel sheet already contains header names, then use skiprows to skip the line:

df = pd.read_excel("the excel file", header=None, names=["A","B","C"], skiprows=1)

edited May 21, 2024 at 13:28

Peter Mortensen


answered Aug 4, 2020 at 6:11

code-freeze


Peter Mortensen · Accepted Answer · 2024-05-21 13:29:21Z

0

If you don't know the number of columns in the Excel file beforehand and want to use the same column names that Excel uses (A, B, etc.), you can use this option. Inspired by this answer:

import string

import pandas as pd

df = pd.read_excel(wb_path, header=None)

def get_excel_col_name(col: int):
    result = []
    while col:
        col, rem = divmod(col-1, 26)
        result[:0] = string.ascii_uppercase[rem]
    return ''.join(result)

df.columns = [get_excel_col_name(x) for x in range(1, len(df.columns)+1)]
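A quick way to convince yourself that the conversion handles the rollover from Z to AA is to run it on a few boundary values. The sketch below restates the helper so it runs on its own (building the string with append and reversed instead of slice assignment, which is only a stylistic variation); the sample numbers are chosen for illustration.

```python
import string


def get_excel_col_name(col: int) -> str:
    """Convert a 1-based column number to its Excel letter label."""
    letters = []
    while col:
        # Subtracting 1 first makes this a bijective base-26 conversion,
        # so 26 maps to 'Z' rather than rolling over to 'A' with a carry.
        col, rem = divmod(col - 1, 26)
        letters.append(string.ascii_uppercase[rem])
    return "".join(reversed(letters))


labels = [get_excel_col_name(n) for n in (1, 26, 27, 52, 703)]
print(labels)  # ['A', 'Z', 'AA', 'AZ', 'AAA']
```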

edited May 21, 2024 at 13:29

Peter Mortensen


answered Jan 31, 2024 at 19:34

Bharath Gade


Derek Powles · Accepted Answer · 2025-11-16 09:15:47Z

-1

A tip: call print(df.columns) on a first read before selecting columns by name with usecols. This shows the headers and, more importantly, reveals any leading spaces that may not be obvious.
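A small sketch of that check, using a hand-built DataFrame whose labels carry stray spaces (the labels are invented, and the cleanup step assumes every column label is a string):

```python
import pandas as pd

# Hypothetical frame whose headers picked up stray spaces in the sheet.
df = pd.DataFrame({" ID": [1], "Name ": ["x"]})

# Printing the labels as a list shows the quotes, which makes
# leading and trailing spaces easy to spot.
print(list(df.columns))

# One possible cleanup: strip whitespace from every label.
df.columns = df.columns.str.strip()
print(list(df.columns))
```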

answered Nov 16 at 9:15

Derek Powles
