Drop columns that contain certain strings while reading data in Python

Question

I'm reading .txt files in a directory and want to drop columns whose names contain certain strings.

for file in glob.iglob(files + '.txt', recursive=True):
    
    cols = list(pd.read_csv(file, nrows =1))
    
    df=pd.read_csv(file,header=0, skiprows=0, skipfooter=0, usecols =[i for i in cols if i.str.contains['TRIVIAL|EASY']==False])

When I do this, I get:

df=pd.read_csv(file,header=0, skiprows=0, skipfooter=0, usecols =[i for i in cols if i.str.contains['PASS']==True])

AttributeError: 'str' object has no attribute 'str'

I could not figure out which part I need to fix.

select columns based on columns names containing a specific string in pandas

drop column based on a string condition

AttributeError: 'str' object has no attribute 'str'

Drop multiple columns that end with certain string in Pandas

1. Never, ever, ever, ever use if something==False; see PEP 8. 2. The trouble with your code is that cols is already a list of strings. Each element i is a plain Python string, and a plain string has no .str accessor. Change i.str.contains['TRIVIAL|EASY'] to a plain substring test on i, such as not any(s in i for s in ('TRIVIAL', 'EASY')).
– anishtain4, Commented Feb 7, 2020 at 15:44
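
A small sketch of the point about plain strings versus the pandas `.str` accessor. The column names here are made up for illustration, and the two filters shown are only two possible ways to write the test, not the only ones.

```python
import pandas as pd

# Hypothetical column names, like those returned by list(pd.read_csv(file, nrows=1))
cols = ['TRIVIAL', 'HARD', 'EASYfoo', 'MEDIUM']

# A plain Python string has no `.str` attribute, hence the AttributeError
has_str_accessor = hasattr(cols[0], 'str')

# Option 1: plain-Python substring test on each name
keep_plain = [c for c in cols if not any(s in c for s in ('TRIVIAL', 'EASY'))]

# Option 2: `.str.contains` does exist on a pandas Index (or Series)
idx = pd.Index(cols)
keep_pandas = list(idx[~idx.str.contains('TRIVIAL|EASY')])

print(has_str_accessor, keep_plain, keep_pandas)
```

Either list can then be passed as `usecols` to `pd.read_csv`.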

ALollz · Accepted Answer · 2020-02-07 16:39:22Z

5

To avoid reading the header separately, pass a callable to usecols. pandas calls it with each column name and keeps the columns for which it returns True, so have it check that neither 'EASY' nor 'TRIVIAL' is in the name.

import pandas as pd

exclu = ['EASY', 'TRIVIAL']  # Any substring in this list excludes a column
usecols = lambda x: not any(substr in x for substr in exclu)

df = pd.read_csv('test.csv', usecols=usecols)

print(df)
   HARD  MEDIUM
0     2       4
1     6       8
2     1       1

Sample Data: `test.csv`

TRIVIAL,HARD,EASYfoo,MEDIUM
1,2,3,4
5,6,7,8
1,1,1,1

edited Feb 7, 2020 at 16:39

answered Feb 7, 2020 at 15:27

ALollz


5 Comments

Alexander Over a year ago

I want to drop them!

Alexander Over a year ago

Thanks for your answer too!

Umar.H Over a year ago

@Alexander did this answer work? If so, this should be the ideal answer, as it doesn't rely on any other library.

Alexander Over a year ago

@ALollz The only difficulty with your code is keeping track of the column names if there are many to drop. I think this syntax is a little limited for that case: lambda x: not ('EASY' in x or 'TRIVIAL' in x)

ALollz Over a year ago

@Alexander I updated it. Now you just need to provide a list of the substrings you check for and the list comprehension creates the logic. You're going to need to type them out anyway in the regex so now they should be comparable.
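
As a rough illustration of how the list-based and regex-based filters compare, the sketch below builds a pattern from the same list with `re.escape` and passes a regex-backed callable to `usecols`. The data is the `test.csv` content from the answer, held in memory so the example is self-contained; the names `csv_text`, `pattern`, `by_list` and `by_regex` are hypothetical.

```python
import io
import re

import pandas as pd

# Same contents as the sample test.csv from the answer
csv_text = "TRIVIAL,HARD,EASYfoo,MEDIUM\n1,2,3,4\n5,6,7,8\n1,1,1,1\n"
exclu = ['EASY', 'TRIVIAL']

# Escape each substring so any regex metacharacters are matched literally
pattern = re.compile('|'.join(re.escape(s) for s in exclu))

# Filter written with the list of substrings
by_list = pd.read_csv(io.StringIO(csv_text),
                      usecols=lambda x: not any(s in x for s in exclu))

# Equivalent filter written with the compiled regex
by_regex = pd.read_csv(io.StringIO(csv_text),
                       usecols=lambda x: pattern.search(x) is None)

print(by_list.equals(by_regex))
```

Both callables select the same columns here, so the choice is mostly a matter of which form is easier to maintain as the list grows.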

Umar.H · Accepted Answer · 2020-02-07 16:19:21Z

4

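There are a few issues in your code. First, list(pd.read_csv(file, nrows=1)) gives a plain list of column names, so each i is a Python string, which has no .str accessor. Second, str.contains is available only on a pandas Series or Index, not on a list, and it is called with parentheses rather than square brackets.

Using regex: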

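import re
import pandas as pd

# Read only the first data row; just the column names are needed here
cols = pd.read_csv(file, nrows=1)

# Keep the columns whose name does not match the pattern
cols_to_use = [i for i in cols.columns if not re.search('TRIVIAL|EASY', i)]

df = pd.read_csv(file, header=0, skiprows=0, skipfooter=0, usecols=cols_to_use)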
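
The two-step approach can be tried end to end on throwaway files. The sketch below is an illustration under stated assumptions: the file names and contents are invented, the files are comma-separated, and `nrows=0` is used in place of `nrows=1` so that only the header is parsed.

```python
import glob
import os
import re
import tempfile

import pandas as pd

pattern = re.compile('TRIVIAL|EASY')
frames = {}

with tempfile.TemporaryDirectory() as tmp:
    # Two made-up .txt files standing in for the real directory
    for name in ('a.txt', 'b.txt'):
        with open(os.path.join(tmp, name), 'w') as fh:
            fh.write('TRIVIAL,HARD,EASYfoo,MEDIUM\n1,2,3,4\n5,6,7,8\n')

    for path in sorted(glob.glob(os.path.join(tmp, '*.txt'))):
        # First pass: header only, to learn the column names
        header = pd.read_csv(path, nrows=0).columns
        cols_to_use = [c for c in header if not pattern.search(c)]
        # Second pass: load just the columns that survived the filter
        frames[os.path.basename(path)] = pd.read_csv(path, usecols=cols_to_use)

print({name: list(df.columns) for name, df in frames.items()})
```

Keeping the frames in a dict keyed by file name is just one option; they could equally be collected in a list and combined with `pd.concat`.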

edited Feb 7, 2020 at 16:19

answered Feb 7, 2020 at 15:28

Umar.H


6 Comments

Alexander Over a year ago

I got the error saying

cols_to_use = [i for i in cols.columns if not re.match('PASS')]  AttributeError: 'list' object has no attribute 'columns'

Umar.H Over a year ago

@Alexander remove list from your call. Sorry, I missed that; I will update the answer.

Alexander Over a year ago

No problem. Also, I think we need to use something like re.findall('TRIVIAL|EASY') to drop columns that contain those strings?

Umar.H Over a year ago

@Alexander re.search. Sorry, terrible answer on my side; I didn't test it at all.

Alexander Over a year ago

OK, I think I solved the issue. It should be cols_to_use = [i for i in cols if not re.search('PASS',i)] :)
