I am trying to clean a column in my dataset using regular expressions in Python. The column has object dtype, and when I run the code below I get this error: expected string or bytes-like object

import re
import numpy as np

def clean_str(string):
    """
    Tokenization/string cleaning for dataset
    Every dataset is lower cased except
    """
    string = re.sub(r"\n", "", string)          # drop newlines
    string = re.sub(r"\r", "", string)          # drop carriage returns
    string = re.sub(r"[0-9]", "digit", string)  # replace each digit with the word "digit"
    string = re.sub(r"\'", "", string)          # drop single quotes
    string = re.sub(r"\"", "", string)          # drop double quotes
    return string.strip().lower()

X = []
for i in range(df.shape[0]):
    X.append(clean_str(df.iloc[i][1]))  #0,1,2,3
y = np.array(df["Standardpositionsname"])
  • Please indent your code correctly. As it stands that code is unreadable. Commented Aug 5, 2019 at 15:26
  • Can you read it now? Commented Aug 5, 2019 at 15:31
  • Oh gosh no. It was better the other way. Is there a line number in the error message? Commented Aug 5, 2019 at 15:32
  • Should I share the traceback? Will that help? Commented Aug 5, 2019 at 15:35
  • Absolutely. And always. Commented Aug 5, 2019 at 15:36

1 Answer

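I think the argument in X.append(clean_str(df.iloc[i][1])) needs to be converted to a string first. re.sub accepts only str or bytes, so a non-string cell in an object column (a NaN, for example) will most likely raise this error. Try this: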

X.append(clean_str(str(df.iloc[i][1])))
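
To illustrate why the conversion helps, here is a minimal sketch that reproduces the error without pandas. The list raw_values is a hypothetical stand-in for the column and is not taken from the question; it assumes the column mixes strings with a NaN and a number, which is one common way to hit this error. This variant of clean_str does the str() conversion inside the function instead of at the call site.

```python
import re

def clean_str(value):
    # Coerce to str first: re.sub raises TypeError for non-string input.
    text = str(value)
    text = re.sub(r"[\n\r'\"]", "", text)    # drop newlines, carriage returns, quotes
    text = re.sub(r"[0-9]", "digit", text)   # replace each digit with the word "digit"
    return text.strip().lower()

# Hypothetical stand-in for an object column with mixed value types.
raw_values = ["Senior Engineer 2\n", float("nan"), 42, "It's \"ok\""]

# Passing the float NaN straight to re.sub reproduces the error from the question.
error_message = None
try:
    re.sub(r"\n", "", raw_values[1])
except TypeError as exc:
    error_message = str(exc)

# With the conversion in place, every value can be cleaned.
cleaned = [clean_str(v) for v in raw_values]
print(error_message)
print(cleaned)
```

Note that str() turns a missing value into the literal text "nan", so it may be better to drop or fill missing values before cleaning if that is not what you want.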
