Python: apply regex on dataframe with datetime as a column

Question

I have a DataFrame (df) as follows:

Index  Month    Time      Text_1        Text_2                  Text_3
0      02/2019  19:44:33  aadd@34:9984  (none)\STBROWN2-M-26YQ  62fa6297-f5f5-4c47-8236-4a85cad5e601
1      02/2019  19:30:22  58:EF:68:14   (none)\STBROWN2-M-26YQ  f933fb2a-4dde-a547-80ca-3b9e6cd29a6d

I have written a simple regex-based cleaning function:

import re

def clean(text):
    # lower-case the string, then turn every non-alphanumeric character into a space
    text = text.lower()
    text_clean = re.sub('[^A-Za-z0-9]', ' ', text)
    return text_clean

Then I apply it to the DataFrame:

df.apply(lambda x : clean(x))

I am getting the following error:

AttributeError: ("'Series' object has no attribute 'lower'", 'occurred at index Application')

I suspect this is because the Month and Time columns are datetime objects.

My question is: how can I apply the regex while ignoring the datetime columns?
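For context, here is a minimal sketch of why the error appears, assuming the sample values above are plain strings (the small DataFrame below is a hypothetical reconstruction, not the asker's real data). `DataFrame.apply` calls the function once per column and passes the whole column as a `Series`; a `Series` has no `.lower()` method (pandas exposes string methods through the `.str` accessor), so `clean` fails before the datetime columns even matter. Calling `clean` on each cell, for example with `Series.map`, works for string columns.

```python
import re

import pandas as pd


def clean(text):
    # Works on a single string, exactly as in the question
    text = text.lower()
    return re.sub('[^A-Za-z0-9]', ' ', text)


# Hypothetical reconstruction of part of the sample data (all values are strings)
df = pd.DataFrame({
    'Month': ['02/2019', '02/2019'],
    'Time': ['19:44:33', '19:30:22'],
    'Text_1': ['aadd@34:9984', '58:EF:68:14'],
})

# Passing a whole column (a Series) reproduces the AttributeError
try:
    clean(df['Text_1'])
except AttributeError as err:
    print(err)

# Calling clean on each cell of a string column works
print(df['Text_1'].map(clean).tolist())  # ['aadd 34 9984', '58 ef 68 14']
```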

The above didn't work: TypeError: ('expected string or bytes-like object', 'occurred at index Application')
– pythondumb, Commented Feb 28, 2019 at 6:01

Vaishali · Accepted Answer · 2019-02-28 06:02:56Z

2

Use filter to select the columns whose names contain Text, and apply clean only to those:

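def clean(text):
    # text is a whole column (Series), so use the vectorised .str methods
    text = text.str.lower()
    text_clean = text.str.replace('[^A-Za-z0-9]', ' ', regex=True)
    return text_clean

# assign returns a new DataFrame, so store the result back in df
df = df.assign(**df.filter(like='Text').apply(clean))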
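A quick way to check the filter/assign approach is to run it on a small reconstruction of the sample. This is a sketch under assumptions: the DataFrame below is hypothetical, only Text_1 and Text_2 are included, and the Text_2 value is read as `(none)\STBROWN2-M-26YQ`. It also shows that `assign` returns a new DataFrame and leaves the original untouched, so the result has to be stored somewhere.

```python
import pandas as pd


def clean(text):
    # text is a whole column (Series), so use the vectorised .str methods
    text = text.str.lower()
    return text.str.replace('[^A-Za-z0-9]', ' ', regex=True)


# Hypothetical reconstruction of part of the question's sample data
df = pd.DataFrame({
    'Month': ['02/2019', '02/2019'],
    'Time': ['19:44:33', '19:30:22'],
    'Text_1': ['aadd@34:9984', '58:EF:68:14'],
    'Text_2': ['(none)\\STBROWN2-M-26YQ', '(none)\\STBROWN2-M-26YQ'],
})

# filter(like=...) keeps the columns whose names contain 'Text'
print(list(df.filter(like='Text').columns))  # ['Text_1', 'Text_2']

# assign returns a new DataFrame; Month and Time are carried over unchanged
cleaned = df.assign(**df.filter(like='Text').apply(clean))
print(cleaned['Text_1'].tolist())  # ['aadd 34 9984', '58 ef 68 14']

# The original df still holds the raw values
print(df['Text_1'].tolist())  # ['aadd@34:9984', '58:EF:68:14']
```

Note that replacing with a space keeps word boundaries but can leave leading or doubled spaces (for example in the Text_2 values); chaining `.str.strip()` or collapsing runs of spaces would be a separate choice.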

edited Feb 28, 2019 at 6:02

answered Feb 28, 2019 at 6:01



3 Comments

pythondumb Over a year ago

This solution does not work per se. Even if you apply clean to the sample df I shared, it does not do what I need.

Vaishali Over a year ago

@pythondumb, I used your code with modifications so that it does not throw the AttributeError. If you want the code to behave differently, you need to provide the expected output.

pythondumb Over a year ago

I will try to repost this with more representative df values.

jezrael · Accepted Answer · 2019-02-28 06:07:59Z

1

In your data I think all the values are strings, but if you want to exclude the datetime columns, select the columns by dtype (for example with select_dtypes, or with a dtype mask as below):

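def clean(text):
    # regex=True is required on pandas >= 2.0, where str.replace
    # treats the pattern as a literal string by default
    return text.str.lower().str.replace('[^A-Za-z0-9]', '', regex=True)

#filter only object (string) columns
mask = df.dtypes == 'object'
#or filter the Text columns, if possible
#mask = df.columns.str.startswith('Text')

df.loc[:, mask] = df.loc[:, mask].apply(clean)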
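The answer mentions select_dtypes while the code builds a mask from df.dtypes; both pick columns by dtype. Below is a minimal sketch of the select_dtypes route, assuming Month has actually been parsed into a datetime64 column (in the question's sample it may just be strings). The data is hypothetical, and Text_1 is created with an explicit `dtype=object` so it is an object column regardless of the pandas version's default string dtype.

```python
import pandas as pd


def clean(text):
    # lower-case, then drop every character that is not a letter or digit
    return text.str.lower().str.replace('[^A-Za-z0-9]', '', regex=True)


# Hypothetical data: Month is a real datetime64 column, Text_1 holds strings
df = pd.DataFrame({
    'Month': pd.to_datetime(['2019-02-01', '2019-02-01']),
    'Text_1': pd.Series(['aadd@34:9984', '58:EF:68:14'], dtype=object),
})

# select_dtypes(include='object') skips the datetime64 column
text_cols = df.select_dtypes(include='object').columns
df[text_cols] = df[text_cols].apply(clean)

print(list(text_cols))        # ['Text_1']
print(df['Text_1'].tolist())  # ['aadd349984', '58ef6814']
```

The datetime column never reaches clean, so no string method is ever called on it.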

edited Feb 28, 2019 at 6:07

answered Feb 28, 2019 at 5:58

