2

I have a bunch of txt files that I need to compile into a single master file. I use read_csv to extract the information inside. There are some rows to drop, and I was wondering whether it's possible to use the skiprows feature without specifying the index numbers of the rows I want to drop, and instead tell it which rows to drop based on their content/value. Here's what the data looks like, to illustrate my point.

Index     Column 1          Column 2
0         Rows to drop      Rows to drop
1         Rows to drop      Rows to drop
2         Rows to drop      Rows to drop
3         Rows to keep      Rows to keep
4         Rows to keep      Rows to keep
5         Rows to keep      Rows to keep
6         Rows to keep      Rows to keep
7         Rows to drop      Rows to drop
8         Rows to drop      Rows to drop
9         Rows to keep      Rows to keep
10        Rows to drop      Rows to drop
11        Rows to keep      Rows to keep
12        Rows to keep      Rows to keep
13        Rows to drop      Rows to drop
14        Rows to drop      Rows to drop
15        Rows to drop      Rows to drop

What is the most effective way to do this?

4 Answers

4

Is this what you want to achieve?

import pandas as pd

# Sample data: the rows to drop are flagged by their content
df = pd.DataFrame({'A': ['row 1', 'row 2', 'drop row', 'row 4', 'row 5',
                         'drop row', 'row 6', 'row 7', 'drop row', 'row 9']})

# Keep only the rows whose content is not 'drop row'
df1 = df[df['A'] != 'drop row']

print(df)
print(df1)

Original DataFrame:

          A
0     row 1
1     row 2
2  drop row
3     row 4
4     row 5
5  drop row
6     row 6
7     row 7
8  drop row
9     row 9

New DataFrame with rows dropped:

       A
0  row 1
1  row 2
3  row 4
4  row 5
6  row 6
7  row 7
9  row 9
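
If you also want the surviving rows renumbered from 0 instead of keeping their original labels, you can reset the index afterwards (an optional extra step, sketched here):

df1 = df1.reset_index(drop=True)
# drop=True discards the old index labels instead of keeping them as a column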

While you cannot skip rows based on content, you can skip rows based on index. Here are some options for you:

Skip the first n rows:

df = pd.read_csv('xyz.csv', skiprows=2)
#this will skip 2 rows from the top

Skip specific rows:

df = pd.read_csv('xyz.csv', skiprows=[0,2,5])
#this will skip rows 1, 3, and 6 from the top
#remember row 0 is the 1st line

Skip every nth row in the file:

# You can also skip by counts.
# In the example below, skip the 0th row and every 5th row from there on.

def check_row(a):
    # a is the 0-based line number that pandas passes to the callable
    return a % 5 == 0

df = pd.read_csv('xyz.txt', skiprows=check_row)

More details can be found in the pandas documentation for skiprows.
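
Putting the content-based filter together with your multi-file setup, a minimal sketch might look like the following (the *.txt glob, the tab separator, and the literal column name 'Column 1' are assumptions here; adjust them to your actual files):

import glob
import pandas as pd

frames = []
for path in glob.glob('*.txt'):  # hypothetical: your bunch of text files
    part = pd.read_csv(path, sep='\t')  # adjust sep to your actual delimiter
    # keep only the rows whose content is not the drop marker
    frames.append(part[part['Column 1'] != 'Rows to drop'])

master = pd.concat(frames, ignore_index=True)
master.to_csv('master_file.csv', index=False)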


2 Comments

That's quite similar to what I did, except I threw in some string slicing for the rows that I wanted to drop. But yes, that's what I want to achieve; I was just wondering whether skiprows could do it on its own.
You can skip specific indices like this: usersDf = pd.read_csv('users.csv', skiprows=[0,2,5]). In this case, it will skip rows 1, 3, and 6. Remember that 0 represents the 1st row, so you have to be very specific about which rows to skip.
1

No. skiprows will not allow you to drop based on the row content/value.

From the pandas documentation:

skiprows : list-like, int or callable, optional
Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
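
To make that concrete, here is a quick sketch showing that the callable only ever receives the 0-based line number, never the line's content:

import pandas as pd
from io import StringIO

data = StringIO("A\nrow 1\ndrop row\nrow 3\n")

def skip(idx):
    # idx is the 0-based line number; the line's content is not available here
    print('skiprows callable saw index:', idx)
    return idx == 2  # skips 'drop row' only because we know it sits on line 2

df = pd.read_csv(data, skiprows=skip)
print(df)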

1 Comment

I see. Even with a lambda, it still only looks at indices? Is that correct?
1

Since you cannot do that using skiprows, I'd suggest this as an efficient way:

df = pd.read_csv(filePath)

# keep only the rows whose 'column1' value is "Rows to keep"
df = df.loc[df['column1'] == "Rows to keep"]

2 Comments

Does loc return the index of that row?
@Alv It will not return the index, but the whole dataframe filtered by the condition inside. .loc is a property of the dataframe through which you can access rows, index-wise (location-wise) or based on a filter condition. Read the pandas indexing docs for details.
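
A quick illustration of that point (a small sketch with made-up data):

import pandas as pd

df = pd.DataFrame({'column1': ['Rows to drop', 'Rows to keep', 'Rows to keep']})
kept = df.loc[df['column1'] == 'Rows to keep']

print(kept)        # a filtered DataFrame, not an index
print(kept.index)  # the original row labels (1 and 2) are preserved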
-1

Not a Python solution, but this would be absurdly simple to achieve using grep/bash.

printf "Index\tColumn\s1\tColumn\s2\n" > master_file
    for j in *.txt #bunch of text files
      do
        grep -v "Rows to drop" < "$j" >> master_file
    done

Depending on how many files you have and how large they are, this may take a while, but you don't have to read the files into memory first, which presumably is the main reason you want to cherry-pick your rows in the first place.

NOTE: You can use subprocess if you want the operation embedded into a Python workflow.
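
A minimal sketch of that subprocess approach (assuming grep is on your PATH and the *.txt glob matches your files):

import glob
import subprocess

txt_files = sorted(glob.glob('*.txt'))  # hypothetical: your bunch of text files

with open('master_file', 'w') as out:
    out.write('Index\tColumn 1\tColumn 2\n')  # header row

with open('master_file', 'a') as out:
    # -h suppresses filename prefixes when grep is given multiple files;
    # grep exits 1 when nothing matches, so check=False avoids raising on that
    subprocess.run(['grep', '-v', '-h', 'Rows to drop', *txt_files],
                   stdout=out, check=False)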

1 Comment

It would be helpful for me to improve my answers in future if I knew why I was downvoted. Just to add more context to this: given that this operation is also trivial in Python, the OP's interest in skiprows suggests that reading files into memory is an issue. Using grep to solve this problem is extremely memory-efficient and solves it without reading the files into memory first, which is very much what the OP appears to want. So while not a Python solution per se, I still think it achieves the OP's aim. Happy to be corrected.
