1

I have a DataFrame (df) as follows:

Index  Month    Time      Text_1        Text_2                  Text_3
0      02/2019  19:44:33  aadd@34:9984  (none)\STBROWN2-M-26YQ  62fa6297-f5f5-4c47-8236-4a85cad5e601
1      02/2019  19:30:22  58:EF:68:14   (none)\STBROWN2-M-26YQ  f933fb2a-4dde-a547-80ca-3b9e6cd29a6d

I have written a simple regex-based cleaning function as follows:

import re

def clean(text):
    text = text.lower()
    text_clean = re.sub('[^A-Za-z0-9]', ' ', text)
    return text_clean

Then I apply it to the DataFrame:

df.apply(lambda x : clean(x))

I am getting the following error:

AttributeError: ("'Series' object has no attribute 'lower'", 'occurred at index Application')

It could be because the Month and Time columns are datetime objects.

My question is: how can I apply the regex while ignoring the datetime columns?

1
  • The above did not work. TypeError: ('expected string or bytes-like object', 'occurred at index Application') Commented Feb 28, 2019 at 6:01

2 Answers

2

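Use filter to select the columns whose names contain Text, and use the .str accessor so that the function works on a whole column (Series) at a time: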

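def clean(text):
    # text is a Series here, so use the vectorized .str methods
    text = text.str.lower()
    text_clean = text.str.replace('[^A-Za-z0-9]', ' ', regex=True)
    return text_clean

# assign returns a new DataFrame; df itself is left unchanged
df.assign(**df.filter(like='Text').apply(clean))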
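
To try this approach end to end, here is a minimal, self-contained sketch. The sample values are copied from the question; storing Month and Time as plain strings and the name `result` are assumptions made for illustration only.

```python
import pandas as pd

# Sample data rebuilt from the question; Month and Time kept as strings (assumption)
df = pd.DataFrame({
    "Month": ["02/2019", "02/2019"],
    "Time": ["19:44:33", "19:30:22"],
    "Text_1": ["aadd@34:9984", "58:EF:68:14"],
    "Text_2": ["(none)\\STBROWN2-M-26YQ", "(none)\\STBROWN2-M-26YQ"],
    "Text_3": ["62fa6297-f5f5-4c47-8236-4a85cad5e601",
               "f933fb2a-4dde-a547-80ca-3b9e6cd29a6d"],
})

def clean(text):
    # text is a whole column (Series), so use the vectorized .str methods
    return text.str.lower().str.replace("[^A-Za-z0-9]", " ", regex=True)

# filter(like="Text") keeps the columns whose name contains "Text";
# assign returns a new frame in which those columns are replaced
result = df.assign(**df.filter(like="Text").apply(clean))

print(result["Text_1"].tolist())  # ['aadd 34 9984', '58 ef 68 14']
```

Month and Time pass through untouched because they are never handed to clean, and the original df is not modified.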

3 Comments

This solution does not work per se. Even if you apply clean to the sample df I shared, it does not produce the result I need.
@pythondumb, I have used your code with modifications so that it does not throw an AttributeError. If you want the code to behave differently, you need to provide the expected output.
I will try to repost this with richer sample data.
1

I think all the values in your data are strings, but if you want to exclude datetime columns, select only the object-dtype columns (with select_dtypes or a dtype mask):

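def clean(text):
    # regex=True is needed in pandas >= 2.0, where the default is a literal match
    return text.str.lower().str.replace('[^A-Za-z0-9]', '', regex=True)

# select only the object (string) columns
mask = df.dtypes == 'object'
# alternatively, select the Text columns by name
# mask = df.columns.str.startswith('Text')

df.loc[:, mask] = df.loc[:, mask].apply(clean)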
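
The select_dtypes route could look roughly like the sketch below. It assumes the date and time have already been parsed into a real datetime64 column; the column name `when` and the variable `text_cols` are hypothetical, and the sample values are taken from the question.

```python
import pandas as pd

# Hypothetical frame with one true datetime64 column and one text column
df = pd.DataFrame({
    "when": pd.to_datetime(["2019-02-01 19:44:33", "2019-02-01 19:30:22"]),
    "Text_1": ["aadd@34:9984", "58:EF:68:14"],
})

def clean(text):
    # lower-case, then drop every character that is not a letter or digit
    return text.str.lower().str.replace("[^A-Za-z0-9]", "", regex=True)

# Exclude datetime columns and keep the names of everything else
text_cols = df.select_dtypes(exclude=["datetime"]).columns

# Clean only the remaining columns; the datetime column is left alone
df[text_cols] = df[text_cols].apply(clean)

print(df["Text_1"].tolist())  # ['aadd349984', '58ef6814']
```

Excluding by dtype only skips datetime columns; numeric columns would still reach clean, so a frame that contains them would need a narrower selection.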
