2

I have a dataframe in the format below and am trying to use the extract function, but I keep getting the following error:

ValueError: If using all scalar values, you must pass an index

column1    column2
1         abc2150/abc2152/abc2154/abc215601/U215602


(df.column2.str
    .split('/', expand=True)
    .apply(lambda row: row.str.extract(r'(\d+)', expand=True))
    .apply(lambda x: '/'.join(x.dropna().astype(str)), axis=1))

I need the output in the format below.

column1    column2
1         2150/2152/2154/215601/215602

Please let me know how to fix it.

Thanks

3 Answers

2

You could instead use str.replace with a positive lookahead to remove all characters that precede the numerical part:

df.column2.str.replace(r'[a-zA-Z]+(?=\d+)', '', regex=True)

0    2150/2152/2154/215601/215602
Name: column2, dtype: object
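
A self-contained sketch of this approach, assuming a one-row frame rebuilt from the sample in the question (the name cleaned is only for illustration). Since pandas 2.0, str.replace treats the pattern as a literal string unless regex=True is passed, so the flag is set explicitly here:

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed to match the real data)
df = pd.DataFrame({
    "column1": [1],
    "column2": ["abc2150/abc2152/abc2154/abc215601/U215602"],
})

# Drop any run of letters that sits directly in front of a digit
cleaned = df["column2"].str.replace(r"[a-zA-Z]+(?=\d)", "", regex=True)
print(cleaned)
```

Assigning the result back with df["column2"] = cleaned would update the frame in place.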

0

Why not just remove the literal prefix?

df['column2']=df.column2.str.replace('abc','')
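
One caveat, shown as a small sketch on the sample from the question (the frame and the name literal are assumptions for illustration): a literal replacement of 'abc' leaves the 'U' prefix of the last segment in place, so it only matches the requested output when every prefix is 'abc'.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed to match the real data)
df = pd.DataFrame({
    "column1": [1],
    "column2": ["abc2150/abc2152/abc2154/abc215601/U215602"],
})

# Literal (non-regex) replacement removes only the exact text "abc"
literal = df["column2"].str.replace("abc", "", regex=False)
print(literal.iloc[0])  # the last segment still starts with "U"
```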

-1

Here is what I will do:

import re

df.loc[:, "column2"] = df.column2.apply(lambda x: re.sub("[a-zA-Z]+", "", x))

2 Comments

Could the downvoter please explain what was wrong with the solution?
Maybe mixing the indexing styles was the problem? Would df["column2"] = df["column2"].apply(lambda x: re.sub("[a-zA-Z]+", "", x)) avoid the downvotes?
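
Coming back to the split/extract pipeline in the question: the error most likely comes from expand=True, which makes str.extract return a DataFrame for each column, and apply then cannot assemble those into one result. The sketch below assumes the sample data from the question and keeps the original pipeline, changing only expand to False so that each column comes back as a Series. The names result and shorter are for illustration only.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed to match the real data)
df = pd.DataFrame({
    "column1": [1],
    "column2": ["abc2150/abc2152/abc2154/abc215601/U215602"],
})

result = (
    df["column2"].str
    .split("/", expand=True)                                     # one column per segment
    .apply(lambda col: col.str.extract(r"(\d+)", expand=False))  # Series in, Series out
    .apply(lambda row: "/".join(row.dropna().astype(str)), axis=1)
)

# A shorter route to the same string: collect all digit runs, then join them
shorter = df["column2"].str.findall(r"\d+").str.join("/")
print(result)
```

The findall variant avoids the intermediate wide frame, which may matter when rows contain different numbers of segments.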
