How to read certain columns from Excel using Pandas - Python

Question

I am reading from an Excel sheet and I want to read certain columns: column 0 because it is the row-index, and columns 22:37. Now here is what I do:

import pandas as pd
import numpy as np
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = 37)
df= pd.concat([df[df.columns[0]], df[df.columns[22:]]], axis=1)

But I hope there is a better way to do that! I know that parse_cols=[0, 22,..,37] would work, but for large datasets this doesn't make sense.

I also did this:

s = pd.Series(0)
s[1]=22
for i in range(2,14):
    s[i]=s[i-1]+1
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = s)

But it reads the first 15 columns which is the length of s.

You'd have to generate a list of columns and pass it to parse_cols, e.g. parse_cols=[0, 22,23,24.....,37], rather than what you're doing now.
– EdChum, Commented Nov 11, 2015 at 16:30
Not sure why that didn't work; it could be a bug. What happens when you pass a hard-coded list: df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols=[0,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37])?
– EdChum, Commented Nov 13, 2015 at 9:31

tdy · Accepted Answer · 2021-12-05 07:39:01Z

101

You can use column indices (letters) like this:

import pandas as pd
import numpy as np
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], usecols="A,C:AA")
print(df)

Corresponding documentation:

usecols : int, str, list-like, or callable default None

If None, then parse all columns.

If str, then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides.

If list of int, then indicates list of column numbers to be parsed.

If list of string, then indicates list of column names to be parsed.

New in version 0.24.0.

If callable, then evaluate each column name against it and parse the column if the callable returns True.

Returns a subset of the columns according to behavior above.

New in version 0.24.0.
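
For the columns in the question (column 0 plus 22 through 37), the letter string can be derived rather than counted by hand. This is a minimal sketch, assuming the positions are 0-based and the range is meant to include column 37. `index_to_letters` and `build_usecols` are hypothetical helper names, not pandas functions.

```python
def index_to_letters(index):
    """Convert a 0-based column position to Excel letters (0 -> 'A', 26 -> 'AA')."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    n = index + 1  # switch to 1-based so the bijective base-26 arithmetic works
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def build_usecols(first, start, stop):
    """Build a usecols string for one column plus an inclusive range (hypothetical helper)."""
    return f"{index_to_letters(first)},{index_to_letters(start)}:{index_to_letters(stop)}"


usecols = build_usecols(0, 22, 37)
print(usecols)  # A,W:AL
```

The resulting string can then be passed as pd.read_excel(file_loc, usecols=usecols).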

edited Dec 5, 2021 at 7:39

tdy

answered Nov 14, 2015 at 14:40

MartyIX

1 Comment

Ando Jurai Over a year ago

It should be noted that "names" should be read as "names in Excel", not names you choose or use as headers. The docs are not clear about this, but it is worth mentioning; it gave me some headaches.
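
To illustrate the point about names: a callable passed to usecols is evaluated against each column name, which here means the header text as it appears in the sheet. This is a minimal sketch of the callable form from the documentation quoted above. The header names and the `keep_column` / `read_wanted` helpers are made up for illustration, and behavior may differ on older pandas versions.

```python
import pandas as pd

# Hypothetical header names, spelled exactly as they appear in the sheet's header row.
WANTED_HEADERS = {"Date", "Amount"}


def keep_column(name):
    """Return True for columns whose header is in WANTED_HEADERS (exact match)."""
    return name in WANTED_HEADERS


def read_wanted(path):
    """Read only the columns selected by keep_column; 'path' is a placeholder."""
    return pd.read_excel(path, usecols=keep_column)
```

Because the match is exact, "date" or " Amount" would not be selected.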

Uday Kiran · Accepted Answer · 2021-07-28 20:01:20Z

23

"usecols" should help. Use a range of columns (as per the Excel worksheet: A, B, etc.). Below are some examples:

1. Selected Columns

df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A,C,F")

2. Range of Columns and selected column

df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:F,H")

3. Multiple Ranges

df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:F,H,J:N")

4. Range of columns

df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:N")

edited Jul 28, 2021 at 20:01

answered Apr 5, 2020 at 9:46

Uday Kiran

2 Comments

rluts Over a year ago

any ideas for limiting columns by number?

Uday Kiran Over a year ago

@rluts, replace usecols="A,C,F" with usecols=[0,2,5]. In case of a range of column numbers, use usecols=range(2,9). Replace the numbers depending on your requirement.
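
Applied to the original question, this could look like the sketch below. It assumes 0-based positions and that column 37 should be included. `read_selected` is a hypothetical wrapper name and `path` stands in for the real workbook.

```python
import pandas as pd

# Column 0 plus columns 22..37 inclusive (0-based positions, assumed from the question).
wanted_positions = [0, *range(22, 38)]


def read_selected(path):
    """Read only the wanted columns; 'path' is a placeholder for the real workbook."""
    return pd.read_excel(path, na_values=["NA"], usecols=wanted_positions)


print(wanted_positions)
```

Building the list with range avoids typing out every position by hand, which was the concern for larger datasets.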

Georgy · Accepted Answer · 2019-02-15 09:43:58Z

22

parse_cols is deprecated; use usecols instead.

that is:

df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], usecols = "A,C:AA")

edited Feb 15, 2019 at 9:43

Georgy

answered Mar 23, 2018 at 4:57

Leoli

1 Comment

Evan Over a year ago

Note also this bug/unexpected behavior, which I ran into today. github.com/pandas-dev/pandas/issues/18273 Looks like using column names does not work with Excel...

user2557522 · Accepted Answer · 2022-06-23 20:28:12Z

11

If you know the names of the columns and do not want to use A,B,D or 0,4,7, this actually works:

df = pd.read_excel(url)[['name of column','name of column','name of column','name of column','name of column']]

where each "name of column" is a column you want. The names are case- and whitespace-sensitive.
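
The same read-then-select idea also works by position, which could replace the pd.concat call from the question. This is a sketch under the assumption that positions are 0-based and column 37 is included. `select_by_position` is a hypothetical helper, and the demo frame stands in for the result of pd.read_excel.

```python
import pandas as pd


def select_by_position(df, first=0, start=22, stop=37):
    """Return column `first` plus columns start..stop (inclusive) by position."""
    positions = [first, *range(start, stop + 1)]
    return df.iloc[:, positions]


# Stand-in for a frame returned by pd.read_excel: one row, 40 columns named c0..c39.
demo = pd.DataFrame([list(range(40))], columns=[f"c{i}" for i in range(40)])
subset = select_by_position(demo)
print(list(subset.columns))
```

Unlike usecols, this parses every column into the DataFrame before discarding the rest, so it is mainly convenient when the sheet is small.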

answered Jun 23, 2022 at 20:28

user2557522

Mounesh · Accepted Answer · 2022-09-11 12:57:28Z

2

Read any column's data in Excel

import pandas as pd

name_of_file = "test.xlsx"
data = pd.read_excel(name_of_file)

# Header of the column to read, exactly as it appears in the sheet
required_column_name = "Post test Number"
print(data[required_column_name])

answered Sep 11, 2022 at 12:57

Mounesh

StephanT · Accepted Answer · 2022-12-05 16:15:53Z

0

Unfortunately, these methods still seem to read and convert the headers before returning the subselection. I have an Excel sheet with duplicate header names because the sheet contains several similar tables. I want to read those tables individually, so I would want to apply usecols. However, this still adds suffixes to the duplicate column names.

To reproduce:

create an Excel sheet with headers named Header1, Header2, Header1, Header2 under columns A, B, C, D
df = pd.read_excel(filename, usecols='C:D')

df.columns will return ['Header1.1', 'Header2.1']

Is there a way to circumvent this, aside from splitting and joining the resulting headers? Especially when it is unknown whether there are duplicate columns, renaming them is tricky, as splitting on '.' may corrupt a non-duplicate header.

Edit: additionally, the length (in indices) of a DataFrame based on a subset of columns is determined by the length of the full file. So if column A has 10 rows and column B only has 5, a DataFrame generated by usecols='B' will have 10 rows, 5 of which are filled with NaNs.
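
One possible workaround, sketched here under assumptions rather than as a confirmed fix: read the range with header=None so that no header row is parsed (and so nothing is renamed), then promote the first row to the header yourself and drop the fully empty trailing rows. `promote_first_row` and `read_table` are hypothetical names, and the in-memory frame below only mimics what a header=None read of columns C:D would be expected to return.

```python
import pandas as pd


def promote_first_row(raw):
    """Use the first row of `raw` as column labels and drop fully empty rows."""
    header = raw.iloc[0].tolist()
    body = raw.iloc[1:].dropna(how="all").reset_index(drop=True)
    body.columns = header
    # The header row forces object dtype, so re-infer dtypes for the remaining rows.
    return body.infer_objects()


def read_table(path, cols):
    """Read an Excel column range without header parsing; 'path' is a placeholder."""
    raw = pd.read_excel(path, usecols=cols, header=None)
    return promote_first_row(raw)


# Mimics the raw result for columns C:D, including a trailing empty row.
raw = pd.DataFrame([["Header1", "Header2"], [1, 2], [3, 4], [None, None]])
table = promote_first_row(raw)
print(table)
```

Note that dropna(how="all") also removes empty rows in the middle of a table, which may or may not be what you want.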

edited Dec 5, 2022 at 16:15

answered Dec 5, 2022 at 15:18

StephanT

1 Comment

KarlsMaranjs Over a year ago

This should be posted as a separate question
