1

I have a dataframe:

Product Storage Price
Azure (2.4% Server £540
AWS Server £640
GCP Server £540

I would like to remove the column which contains the string '(2.4%' however I only want to remove the column in Pandas through regex if regex finds either a bracket or percentage in the string in that column '(%' and then pandas should drop that column entirely.

Please can you help me find a way to use regex to search for special characters within a string and drop the column if that condition is met?

I've searched on stack/google. I've used the following so far:

df = df.drop([col for col in df.columns if df[col].eq('(%').any()], axis=1)

chars = '(%'
regex = f'[{"".join(map(re.escape, chars))}]'

df = df.loc[:, ~df.apply(lambda c: c.str.contains(regex).any())]

however neither of these worked.

Any help would be greatly appreciated. :)

Thank You * Insert Smiley*

1

3 Answers 3

1

you re using eq function it check exactly if the value in the columun match % instead of eq do this

df.drop([col for col in df.columns if df[col].apply(lambda x:'(%' in str(x)).any()], axis=1,inplace=True)
Sign up to request clarification or add additional context in comments.

2 Comments

Thank You for your response, this worked out of the box and was the neatest solution. :)
'(%' in '(2.4%' evaluates to False, so this won't drop the column
1

I would do something like this

import pandas as pd
from io import StringIO

text = """
Product,Perc,Storage,Price
Azure,(2.4%,Server,£540
AWS,,Server,£640
GCP,,Server,£540
"""
data = pd.read_csv(StringIO(text))
print(data)

drop_columns = list()
for col_name in data.columns:
    has_special_characters = data[col_name].str.contains("[\(%]")
    if has_special_characters.any():
        drop_columns.append(col_name)

print(f"Dropping {drop_columns}")
data.drop(drop_columns, axis=1, inplace=True)
print(data)

Output of the script is:

  Product   Perc Storage Price
0   Azure  (2.4%  Server  £540
1     AWS    NaN  Server  £640
2     GCP    NaN  Server  £540
Dropping ['Perc']
  Product Storage Price
0   Azure  Server  £540
1     AWS  Server  £640
2     GCP  Server  £540

Process finished with exit code 0

5 Comments

Thank you for your response and for taking the time to help me find a solution! :)
No problem. Btw: the answer by to_data is not correct. He should have used a regular expression, his expression evaluates to False. I tried it and the column with the (2.4% is not dropped. For the rest it is identical as my answer, except he uses a one-liner, which generally leads to code that is harder to read.
And I dont want to be a spoiler, but the answer by Scaro0974 is also not correct in case you have None values in your data frame (as is the case if you read the data from a csv file using read_csv). The None values will raise a TypeError in the regular expression. Moreover, iterating over the elements of a column is generally much slower than the inbuilt string evaluation on the whole column, as I did. So I come to the conclusion that I have given the only correct answer :-)
Hi Eelco, the only problem with this solution is I have a predefined list with many elements some 2,000+ and when I create a dataFrame the column holding the 2.4% doesn't have a column header. So I can't predefine a name for that column, unless I create a positional argument first in Pandas to change the name to Perc.
I have given the name 'Perc' in the header of the 'csv file', I don't use it in the script, because there it gets assigned to the variable 'col_name'. In case I would have used an empty field in the csv file, the read_csv file would have automatically assigned a name to it as 'Unnamed: 1'. If you have an empty string as column name, you can still use this script. You only have to be careful that you don'nt have mulitple columns with the same name. It is better to assign names to all your columns
1

You can try this (I guess the name of the column you want to drop is ""):

import re

change_col = False
for elem in df[""]:
    if re.search(r'[(%]', elem):
        change_col = True

if change_col:
    df = df.drop("", axis=1)

2 Comments

Thank you for your response and for taking the time to help me find a solution, this worked and was the second neatest solution! :)
The empty elements in your column will raise a TypeError, you cannot assume the empty values are empty strings (normally these are represented by nan values)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.