29

I read an Excel sheet into a Pandas DataFrame this way:

import pandas as pd

xl = pd.ExcelFile("Path + filename")
df = xl.parse("Sheet1")

The value in the first cell of each column is used as the column name for the DataFrame, but I want to specify my own column names. How do I do this?

7 Answers

22

Note: pandas now (v0.22) has a keyword for specifying column names when parsing Excel files. Use:

import pandas as pd
xl = pd.ExcelFile("Path + filename")
df = xl.parse("Sheet1", header=None, names=['A', 'B', 'C'])

If header=None is not set, pandas treats the first row as the header and removes it from the data during parsing. If the file does have a header row that you don't want to use, you have two choices: (1) pass only the names keyword argument, or (2) pass names together with header=None and skiprows=1.

I personally prefer the second option, since it makes clear that the input file is not in the format I want and that I am doing something to work around it.
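To make the two choices concrete, here is a minimal sketch that builds a small workbook in memory and reads it both ways. The sheet contents, the old header names, and the variable names are made up for illustration, and it assumes the openpyxl engine is installed so that pandas can write and read .xlsx data.

```python
import io

import pandas as pd

# Build a small workbook in memory whose first row is an unwanted header.
# The header names and values are hypothetical.
buffer = io.BytesIO()
pd.DataFrame({"old1": [1, 2], "old2": [3, 4], "old3": [5, 6]}).to_excel(
    buffer, sheet_name="Sheet1", index=False
)

new_names = ["A", "B", "C"]

# Option 1: the first row is still consumed as the header, then replaced by names.
buffer.seek(0)
df_option1 = pd.read_excel(buffer, sheet_name="Sheet1", names=new_names)

# Option 2: say there is no header and skip the first row explicitly.
buffer.seek(0)
df_option2 = pd.read_excel(
    buffer, sheet_name="Sheet1", header=None, skiprows=1, names=new_names
)

# Both reads should give the same two data rows under the new names.
same_result = df_option1.equals(df_option2)
print(df_option1)
print(same_result)
```

Both calls end up with the same data; the second one just spells out that a row is being discarded.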


1 Comment

Thanks, also to the other answerers, for the additional notes on skipping the header row. They are not part of the OP's question, but they are essential for beginners like us who landed on this question while learning to read Excel files in Python.
19

As Ram said, this post ranks near the top and may be useful to some.

In pandas 0.24.2 (maybe earlier as well), read_excel itself can ignore the source headers and assign your own column names, along with a few other useful controls:

DID = pd.read_excel(file1, sheet_name=0, header=None, usecols=[0, 1, 6], names=['A', 'ID', 'B'], dtype={2:str}, skiprows=10)

# For example:
# usecols => read only the given column indexes
# dtype => specify the data types
# skiprows => number of rows to skip from the top


14

I think setting them afterwards is the only way in this case. For example, if your DataFrame has four columns:

df.columns = ['W', 'X', 'Y', 'Z']

If you know in advance what the headers in the Excel file are, it's probably better to rename them. This renames W to A, and so on:

df = df.rename(columns={'W': 'A', 'X': 'B'})  # add the remaining columns the same way

1 Comment

My problem is that the first row of the Excel file contains valid data, not the column names. So with "df.columns = ['W','X','Y','Z']" I would lose data. I need to add the column names on top of the existing data and then change the column names.
10

Call .parse with the header=None keyword argument.

df = xl.parse("Sheet1", header=None)
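As a rough sketch of what header=None does, the snippet below reads a header-less sheet from an in-memory workbook, shows that the first row survives as data, and then assigns names afterwards. The sheet contents and the names "num" and "letter" are hypothetical, and openpyxl is assumed to be installed.

```python
import io

import pandas as pd

# Write a sheet that contains only data rows (header=False, index=False).
buffer = io.BytesIO()
pd.DataFrame([[1, "x"], [2, "y"]]).to_excel(
    buffer, sheet_name="Sheet1", index=False, header=False
)
buffer.seek(0)

xl = pd.ExcelFile(buffer)
df = xl.parse("Sheet1", header=None)

# With header=None the columns get integer labels and no row is consumed.
default_labels = list(df.columns)
row_count = len(df)

# Assigning names now only relabels the columns; the data is untouched.
df.columns = ["num", "letter"]
print(df)
```

Because no row was promoted to a header, setting df.columns afterwards does not drop the first row of data.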


8

If the Excel sheet contains only data, without headers:

df = pd.read_excel("the excel file", header=None, names=["A","B","C"])

If the Excel sheet already contains a header row, use skiprows to skip it:

df = pd.read_excel("the excel file", header=None, names=["A","B","C"], skiprows=1)


0

When you don't know the number of columns in advance and want to use the same column names that Excel uses (A, B, etc.), you can use this option. Inspired by this answer:

import string

import pandas as pd

# wb_path is the path to your workbook
df = pd.read_excel(wb_path, header=None)

def get_excel_col_name(col: int):
    result = []
    while col:
        col, rem = divmod(col-1, 26)
        result[:0] = string.ascii_uppercase[rem]
    return ''.join(result)

df.columns = [get_excel_col_name(x) for x in range(1, len(df.columns)+1)]
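A quick way to convince yourself that the helper matches Excel's lettering is to try it on a few boundary values. The function is repeated here only so the snippet runs on its own; the chosen indices are arbitrary.

```python
import string


def get_excel_col_name(col: int) -> str:
    """Convert a 1-based column index to an Excel-style column name."""
    result = []
    while col:
        # Subtract 1 so that 26 maps to 'Z' rather than rolling over.
        col, rem = divmod(col - 1, 26)
        result[:0] = string.ascii_uppercase[rem]
    return "".join(result)


# Boundaries between one-, two-, and three-letter names.
samples = {n: get_excel_col_name(n) for n in (1, 26, 27, 52, 702, 703)}
print(samples)
```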


-1

A tip: call print(df.columns) before using usecols. This shows the headers and, more importantly, reveals any leading spaces, which may not be obvious.
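If stray spaces do show up, one possible cleanup is to strip them from the column labels. This is a small sketch on a hand-built DataFrame with hypothetical header names, not something taken from the answer itself.

```python
import pandas as pd

# Headers with a leading and a trailing space, as might come from a messy sheet.
df = pd.DataFrame({" ID": [1, 2], "Name ": ["a", "b"]})

# Printing the labels as a list puts quotes around them, so the spaces are visible.
print(df.columns.tolist())

# Strip surrounding whitespace from every column label.
df.columns = df.columns.str.strip()
cleaned = df.columns.tolist()
print(cleaned)
```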
