Skip rows during csv import pandas

Question

I'm trying to import a .csv file using pandas.read_csv(), but I don't want to import the 2nd row of the data file (the row with index 1, counting from 0).

I can't see how not to import it because the arguments used with the command seem ambiguous:

From the pandas website:

skiprows : list-like or integer

Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file.

If I put skiprows=1 in the arguments, how does it know whether to skip the first row or skip the row with index 1?

I would guess that, since it says the argument can be "list-like or integer" and gives two options (skip specific rows, or skip a number of rows at the start), passing the list [1] will skip just row 1 (the 2nd row), whereas passing an integer (for example 10) will skip the first 10 rows.
– Ffisegydd, Commented Dec 17, 2013 at 15:00
Great, that worked. Thanks very much. I just wondered how it would differentiate between an index and a count. [] is the answer.
– thosphor, Commented Dec 17, 2013 at 15:03

wjandrea · Accepted Answer · 2024-05-09 21:23:55Z

208

You can try it yourself:

>>> import pandas as pd
>>> from io import StringIO
>>> s = """1, 2
... 3, 4
... 5, 6"""
>>> pd.read_csv(StringIO(s), skiprows=[1], header=None)
   0  1
0  1  2
1  5  6
>>> pd.read_csv(StringIO(s), skiprows=1, header=None)
   0  1
0  3  4
1  5  6

edited May 9, 2024 at 21:23

wjandrea

answered Dec 17, 2013 at 15:04

alko


Community · Accepted Answer · 2017-05-23 12:26:32Z

34

I don't have enough reputation to comment yet, but I want to add to alko's answer for future reference.

From the docs:

skiprows: A collection of numbers for rows in the file to skip. Can also be an integer to skip the first n rows

edited May 23, 2017 at 12:26

CommunityBot

answered May 19, 2014 at 13:35

Hugo

1 Comment

EBo Over a year ago

It has been a while since I needed to deal with this, but IIRC "skiprows" is poorly named because it actually skips lines in the input file. I have a number of cases where there are embedded comments in the CSV files (denoted by a leading #). In my case I want to ignore all comments (and maybe blank lines), then count the rows down into the file. In the past, skiprows counted all lines in the file (including comments), so it behaves like "skiplines", not "skiprows". This may have changed in the last few years since I commented above, but I doubt it.
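The line-versus-row distinction in this comment can be checked on a small in-memory file. This is a minimal sketch: the sample data is invented for illustration, and it assumes a pandas version in which skiprows counts physical lines, comment lines included, as the pandas IO documentation describes.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: one real row, one comment line, then the rows we want.
data = "A,B,C\n#comment\na,b,c\n1,2,3\n"

# skiprows=2 skips physical lines 0 and 1 ("A,B,C" and "#comment"),
# so "a,b,c" becomes the header. If skiprows counted only data rows,
# "a,b,c" would have been skipped as well.
df_skip = pd.read_csv(StringIO(data), comment="#", skiprows=2)
print(df_skip)
```

So the comment line counts toward skiprows even though comment="#" would otherwise drop it.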

Community · Accepted Answer · 2020-06-19 05:24:00Z

31

I ran into the same issue when using skiprows while reading a csv file. I was passing skip_rows=1, which does not work; the parameter is spelled skiprows.

A simple example gives an idea of how to use skiprows while reading a csv file.

import pandas as pd

# skiprows=1 skips the first line and starts reading from the second line
df = pd.read_csv('my_csv_file.csv', skiprows=1)

# display the data frame
df
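The difference between the misspelled and the correct keyword can be reproduced without a file on disk. The sketch below uses an invented in-memory sample (csv_text is a hypothetical name) and assumes that read_csv rejects unknown keyword arguments with a TypeError.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: one junk line followed by a normal header and data.
csv_text = "junk line\ncol1,col2\n1,a\n2,b\n"

try:
    # Misspelled keyword: read_csv has no 'skip_rows' parameter.
    pd.read_csv(StringIO(csv_text), skip_rows=1)
    misspelled_ok = True
except TypeError as exc:
    misspelled_ok = False
    print(exc)

# Correct spelling: the first line is skipped and "col1,col2" becomes the header.
df = pd.read_csv(StringIO(csv_text), skiprows=1)
print(df)
```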

edited Jun 19, 2020 at 5:24

CommunityBot

answered Mar 26, 2019 at 18:11

Viraj Wadate

Mykola Zotko · Accepted Answer · 2021-12-09 06:55:18Z

All of these answers miss one important point: the n'th line is the n'th line in the file, not the n'th row in the dataset. I have a situation where I download some antiquated stream gauge data from the USGS. The head of the dataset is commented with '#', the first line after that holds the labels, next comes a line that describes the data types, and last comes the data itself. I never know how many comment lines there are, but I know what the first couple of rows are. Example:

> # ----------------------------- WARNING ----------------------------------
> # Some of the data that you have obtained from this U.S. Geological Survey database
> # may not have received Director's approval. ...
> agency_cd    site_no datetime    tz_cd   139719_00065    139719_00065_cd
> 5s    15s 20d 6s  14n 10s
> USGS    08041780    2018-05-06 00:00    CDT 1.98    A

It would be nice if there was a way to automatically skip the n'th row as well as the n'th line.

As a note, I was able to fix my issue with:

import pandas as pd
ds = pd.read_csv(fname, comment='#', sep='\t', header=0, parse_dates=True)
ds.drop(0, inplace=True)
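An alternative is to translate row numbers into physical line numbers before calling read_csv, so that skiprows can target "the n'th row" directly. The following is only a sketch under stated assumptions: rows_to_lines is a hypothetical helper (not part of pandas), the sample text is invented, and the pre-scan assumes no quoted field spans several lines.

```python
from io import StringIO

import pandas as pd


def rows_to_lines(text, rows, comment="#"):
    """Map 0-based row numbers (blank and comment lines not counted)
    to 0-based physical line numbers in `text`. Hypothetical helper."""
    data_lines = [
        i
        for i, line in enumerate(text.splitlines())
        if line.strip() and not line.lstrip().startswith(comment)
    ]
    return [data_lines[r] for r in rows]


# Invented sample mimicking the layout above: comments, labels, format line, data.
sample = (
    "# comment line 1\n"
    "# comment line 2\n"
    "agency_cd\tsite_no\tdatetime\ttz_cd\n"
    "5s\t15s\t20d\t6s\n"
    "USGS\t08041780\t2018-05-06 00:00\tCDT\n"
)

# Row 1 (the format line) is physical line 3 in this sample.
skip = rows_to_lines(sample, [1])
df = pd.read_csv(StringIO(sample), sep="\t", comment="#", skiprows=skip, dtype=str)
print(df)
```

Reading with dtype=str keeps the leading zero in site_no. For a real file the text would have to be read once for the scan and once for parsing.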

Mykola Zotko · Accepted Answer · 2021-12-09 09:23:36Z

Indices in read_csv refer to line/row numbers in your csv file (the first line has the index 0). You have the following options to skip rows:

import pandas as pd
from io import StringIO

csv = \
"""col1,col2
1,a
2,b
3,c
4,d
"""
pd.read_csv(StringIO(csv))

# Output:
   col1 col2  # index 0
0     1    a  # index 1
1     2    b  # index 2
2     3    c  # index 3
3     4    d  # index 4

Skip two lines at the start of the file (index 0 and 1). The column names (index 0) are skipped as well, so the first remaining line is used for the column names. To supply column names yourself, use the names=['col1', 'col2'] parameter:

pd.read_csv(StringIO(csv), skiprows=2)

# Output:
   2  b
0  3  c
1  4  d
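The names parameter mentioned above can be combined with skiprows as in this minimal sketch. The variable name csv_text is chosen here for illustration; the data is the same as in the example at the top of this answer.

```python
from io import StringIO

import pandas as pd

csv_text = "col1,col2\n1,a\n2,b\n3,c\n4,d\n"

# Lines 0 and 1 are skipped; because names is given, no remaining line is
# consumed as a header, so "2,b" stays in the data.
df = pd.read_csv(StringIO(csv_text), skiprows=2, names=["col1", "col2"])
print(df)
```

With names supplied, three data rows remain instead of two.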

Skip second and fourth lines (index 1 and 3):

pd.read_csv(StringIO(csv), skiprows=[1, 3])

# Output:
   col1 col2
0     2    b
1     4    d

Skip last two lines:

pd.read_csv(StringIO(csv), engine='python', skipfooter=2)

# Output:
   col1 col2
0     1    a
1     2    b

Use a lambda function to skip every second line (index 1 and 3):

pd.read_csv(StringIO(csv), skiprows=lambda x: (x % 2) != 0)

# Output:
   col1 col2
0     2    b
1     4    d

double-beep · Accepted Answer · 2019-05-02 05:31:06Z

-3

skiprows=[1] will skip the second line, not the first one.

edited May 2, 2019 at 5:31

double-beep

answered May 2, 2019 at 1:40

shanky
