
I'm reading .txt files in a directory and want to drop columns whose names contain certain strings.

for file in glob.iglob(files + '.txt', recursive=True):
    
    cols = list(pd.read_csv(file, nrows =1))
    
    df=pd.read_csv(file,header=0, skiprows=0, skipfooter=0, usecols =[i for i in cols if i.str.contains['TRIVIAL|EASY']==False])

When I do this, I get:

df=pd.read_csv(file,header=0, skiprows=0, skipfooter=0, usecols =[i for i in cols if i.str.contains['PASS']==True])

AttributeError: 'str' object has no attribute 'str'

I could not figure out which part I need to fix. I have already looked at these related questions:

select columns based on columns names containing a specific string in pandas

drop column based on a string condition

AttributeError: 'str' object has no attribute 'str'

Drop multiple columns that end with certain string in Pandas

    1. Never write if something == False; see PEP 8. 2. The trouble with your code is that cols is already a list of strings, so each element i is a plain str, and a str has no .str accessor (that exists only on a pandas Series or Index). Replace i.str.contains['TRIVIAL|EASY'] == False with a plain substring test such as not any(s in i for s in ('TRIVIAL', 'EASY')). Commented Feb 7, 2020 at 15:44
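
A minimal sketch of the point made in the comment above, using only plain Python. The column names and the name `exclude` are illustrative stand-ins, not taken from the asker's files. It shows that an ordinary str has no .str accessor, and that the `in` operator is enough for a substring check.

```python
# Hypothetical column names, standing in for list(pd.read_csv(file, nrows=1))
cols = ["TRIVIAL", "HARD", "EASYfoo", "MEDIUM"]

# A plain str has no .str accessor, which is what raises the AttributeError
try:
    cols[0].str
    has_str_accessor = True
except AttributeError:
    has_str_accessor = False

# Substring test with the `in` operator instead of .str.contains
exclude = ("TRIVIAL", "EASY")
cols_to_use = [c for c in cols if not any(s in c for s in exclude)]

print(has_str_accessor)  # False
print(cols_to_use)       # ['HARD', 'MEDIUM']
```

The resulting list can then be passed to usecols in the second read_csv call.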

2 Answers

5

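To avoid reading the header separately, pass a callable to usecols. pandas calls it with each column name and keeps the columns for which it returns True. Here the callable returns True only when neither 'EASY' nor 'TRIVIAL' appears in the column name.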

import pandas as pd

exclu = ['EASY', 'TRIVIAL']  # Any substring in this list excludes a column
usecols = lambda x: not any(substr in x for substr in exclu)

df = pd.read_csv('test.csv', usecols=usecols)

print(df)
   HARD  MEDIUM
0     2       4
1     6       8
2     1       1

Sample Data: test.csv

TRIVIAL,HARD,EASYfoo,MEDIUM
1,2,3,4
5,6,7,8
1,1,1,1
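
The same idea can be carried over to the loop from the question. The following is only a sketch under some assumptions: the temporary directory, the file names a.txt and b.txt, and the helper name keep_column are all made up for illustration, and the files are assumed to be comma-separated like the sample above.

```python
import glob
import os
import tempfile

import pandas as pd


def keep_column(name, excluded=("EASY", "TRIVIAL")):
    """Return True for column names containing none of the excluded substrings."""
    return not any(substr in name for substr in excluded)


# Write two small sample files into a temporary directory (illustrative data only)
tmpdir = tempfile.mkdtemp()
for fname in ("a.txt", "b.txt"):
    with open(os.path.join(tmpdir, fname), "w") as fh:
        fh.write("TRIVIAL,HARD,EASYfoo,MEDIUM\n1,2,3,4\n5,6,7,8\n")

# Read every .txt file, letting the callable decide which columns to load
frames = {}
for path in sorted(glob.iglob(os.path.join(tmpdir, "*.txt"))):
    frames[os.path.basename(path)] = pd.read_csv(path, usecols=keep_column)

for name, frame in frames.items():
    print(name, list(frame.columns))
```

Because the callable is evaluated per file, this also copes with files whose headers differ, with each file read only once.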

5 Comments

I want to drop them!
Thanks for your answer too!
@Alexander Did this answer work? If so, it should be the ideal answer, as it doesn't rely on any other library.
@ALozz The only difficulty with your code is keeping track of the column names when there are many to drop. I think this syntax is a little limited for that case: lambda x: not ('EASY' in x or 'TRIVIAL' in x)
@Alexander I updated it. Now you just need to provide a list of the substrings to check for, and the generator expression inside any() builds the logic. You would need to type them out in the regex anyway, so the two approaches should now be comparable.
4

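There are a few issues in your code. First, list(pd.read_csv(file, nrows=1)) gives a plain list of column names, so each i is a Python str, which has no .str accessor; .str.contains is only available on a pandas Series or Index. Second, contains is a method, so it has to be called with parentheses rather than indexed with square brackets.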

Using regex:

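import re
import pandas as pd

# Read the header plus a single row, just to get the column names
cols = pd.read_csv(file, nrows=1)

# Keep the columns whose name does not contain TRIVIAL or EASY
cols_to_use = [i for i in cols.columns if not re.search('TRIVIAL|EASY', i)]

df = pd.read_csv(file, header=0, skiprows=0, skipfooter=0, usecols=cols_to_use)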
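
If you prefer to keep .str.contains from the question, it works once it is applied to the column Index rather than to each individual string. A minimal sketch, assuming comma-separated data; the in-memory csv_text below is illustrative and stands in for a real file:

```python
import io

import pandas as pd

# Illustrative data held in memory so the snippet runs on its own
csv_text = "TRIVIAL,HARD,EASYfoo,MEDIUM\n1,2,3,4\n5,6,7,8\n"

# nrows=0 parses only the header; .columns is a pandas Index, which has .str
header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Vectorised regex match over all column names, negated to keep the rest
cols_to_use = header[~header.str.contains("TRIVIAL|EASY")].tolist()

df = pd.read_csv(io.StringIO(csv_text), usecols=cols_to_use)
print(cols_to_use)
```

With a real file, replace both io.StringIO(csv_text) arguments with the file path.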

6 Comments

I got an error saying: cols_to_use = [i for i in cols.columns if not re.match('PASS')] AttributeError: 'list' object has no attribute 'columns'
@Alexander Remove list from your call. Sorry, I missed that; I will update the answer.
No problem. Also, I think we need to use something like re.findall('TRIVIAL|EASY') to drop columns that contain those strings?
@Alexander Use re.search. Sorry, a terrible answer on my side; I didn't test it at all.
OK, I think I solved the issue. It should be cols_to_use = [i for i in cols if not re.search('PASS',i)] :)