I have a data frame in which the txt column contains a list. I want to clean the txt column using the function clean_text().

import re

import pandas

data = {'value':['abc.txt', 'cda.txt'], 'txt':['[''2019/01/31-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart'']',
                                               '[''2019/02/01-11:56:23.288258 1886     7F0ED4CDC704     asfasnfs: remove datepart'']']}
df = pandas.DataFrame(data=data)

def clean_text(text):
    """
    :param text:  it is the plain text
    :return: cleaned text
    """
    patterns = [r"^{53}",
                r"[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*",
                r"[-=/':,?${}\[\]-_()>.~" ";+]"]

    for p in patterns:
        text = re.sub(p, '', text)

    return text

My Solution:

df['txt'] = df['txt'].apply(lambda x: clean_text(x))

But I get the following error:

sre_constants.error: nothing to repeat at position 1
  • Possible duplicate of Regex sre_constants.error: bad character range Commented Feb 10, 2019 at 19:59
  • @sophros, this question is different. Commented Feb 10, 2019 at 20:07
  • In what way is it different? The error is the same. Commented Feb 10, 2019 at 20:12

2 Answers

^{53} is not a valid regular expression, since the quantifier {53} must be preceded by a character or pattern that it can repeat. If you mean to validate that a string is at least 53 characters long, you can use the following pattern instead:

^.{53}
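
A minimal sketch of the difference, assuming the goal is to strip the fixed-width prefix (timestamp and IDs) from each log line. The sample string is rebuilt from the question's data without the surrounding brackets, and the names `line`, `error_message` and `cleaned` are only illustrative:

```python
import re

# Sample line modelled on the question's data (brackets omitted);
# the prefix before "asfasnfs" is exactly 53 characters long.
line = (
    "2019/01/31-11:56:23.288258 1886"
    + " " * 5
    + "7F0ED4CDC704"
    + " " * 5
    + "asfasnfs: remove datepart"
)

# The original pattern fails at compile time: {53} has nothing to repeat.
try:
    re.compile(r"^{53}")
    error_message = None
except re.error as exc:
    error_message = str(exc)

# With "." in front of the quantifier, the first 53 characters are matched
# and removed by re.sub.
cleaned = re.sub(r"^.{53}", "", line)

print(error_message)
print(cleaned)
```

Note that with re.sub this pattern removes the first 53 characters rather than merely checking the length; a string shorter than 53 characters is left unchanged.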

1 Comment

Thanks for answer. I have updated question, now I get Attribute error.

The culprit is the first pattern in the list, r"^{53}". It reads: ^ matches the beginning of the string, and then {53} repeats the previous character or group 53 times. But the only thing preceding {53} is ^, which is an anchor and cannot be repeated. Add the character or group that you want to match 53 repetitions of. Or, if you want to match the sequence {53} verbatim, escape it, e.g. using re.escape.
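
If the literal text {53} is what should be matched, the escaping option could look like the sketch below. The input string and the names `pattern` and `result` are made up for illustration and do not come from the question:

```python
import re

# re.escape turns the braces into literal characters instead of a quantifier.
pattern = "^" + re.escape("{53}")

# Hypothetical input that starts with the literal text "{53}".
result = re.sub(pattern, "", "{53} rest of line")

print(result)
```

Because the anchor is added outside the re.escape call, it keeps its special meaning, so the literal is only removed at the start of the string.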

5 Comments

Thanks for answer. I have updated question, now I get Attribute error.
This should really be another question. How a reader of the question can make any sense of the answers if you change the crucial elements of the question?
And before you do that - please revert the change first so that the answers make sense with the question.
I have post it as different question: stackoverflow.com/questions/54620550/…. Can you please help me in solving.
I have already done that although I believe you should reward the effort already made on answering this question.
