I have a string column that follows this pattern:

yariyada up to a maximum of (number)% yariyada

For example:

will be granted up to a maximum of 75.5% If less, then nothing

I want to create another column that extracts that number that comes between "up to a maximum of" and "%".

So far I'm only able to detect whether the string column contains that pattern, using the .str.contains method.

If it helps clarify what I'm after: in Stata (I'm a Stata user), I would use regexm to break the string into parts and regexs to retrieve the parts. I'm wondering if pandas has a similar (or better!) function.

Thanks for your help!

2 Answers

You could use the .str.extract method (pandas.core.strings.StringMethods.extract) to find groups in each string using the passed regular expression:

df['col_name'].str.extract('up to a maximum of (.*)%')

This gives you the extracted number (as text), which you can assign to a new column.
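As a fuller illustration, here is a minimal sketch of the same idea on made-up data. The sample rows and the max_pct column name are assumptions for the example, and the pattern is a slightly stricter variant of the one above: [\d.]+ only matches digits and dots, so it stops at the first "%" even if the text contains another one later.

```python
import pandas as pd

# Hypothetical sample data; only the first row comes from the question.
df = pd.DataFrame({
    "col_name": [
        "will be granted up to a maximum of 75.5% If less, then nothing",
        "will be granted up to a maximum of 40% of the total",
        "no percentage mentioned here",
    ]
})

# One capture group for the number; expand=False returns a Series
# instead of a one-column DataFrame.
pattern = r"up to a maximum of ([\d.]+)%"
extracted = df["col_name"].str.extract(pattern, expand=False)

# Rows that do not match come back as missing values.
# pd.to_numeric converts the matched text to floats.
df["max_pct"] = pd.to_numeric(extracted)

print(df["max_pct"])
```

The capture group plays roughly the role of regexs(1) in Stata: whatever sits inside the parentheses is what ends up in the new column.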


1 Comment

woah, that was quick and simple. Thank you so much!

bigtable

color         region        finish
red, yellow   AK, NV, CA    a, b,c
red, blue     CA,TX, NV     a,c, p
blue, red     TX,CA, AK     p,a, c
blue, yellow  TX,CA, NV     p, c, a
yellow, red   AK,CA,NV      c, b, a
yellow,blue   CA,TX, NV     c, a, b

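    import numpy as np
    import pandas as pd

    # Sort the comma-separated items within each cell, one column at a time.
    # bigtable1 is assumed to be a working copy of bigtable with the same columns.
    for col in bigtable.columns:
        bigtable1[col] = bigtable1[col].str.split(',', expand=True).apply(lambda x: pd.Series(np.sort(x)).str.cat(sep=','), axis=1)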
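For reference, a self-contained sketch of the same per-cell sorting on three of the rows above. It is a variant rather than the exact code in this answer: it uses plain Python string methods and, as an added assumption, strips the stray spaces around each item before sorting so that " NV" and "NV" compare equal. The sort_items helper is a hypothetical name.

```python
import pandas as pd

# Three rows taken from the table above.
bigtable = pd.DataFrame({
    "color": ["red, yellow", "blue, red", "yellow,blue"],
    "region": ["AK, NV, CA", "TX,CA, AK", "CA,TX, NV"],
    "finish": ["a, b,c", "p,a, c", "c, a, b"],
})

def sort_items(cell):
    # Split on commas, trim whitespace around each item, sort, and rejoin.
    return ",".join(sorted(part.strip() for part in cell.split(",")))

# Work on a copy so the original table is left untouched.
bigtable1 = bigtable.copy()
for col in bigtable1.columns:
    bigtable1[col] = bigtable1[col].map(sort_items)

print(bigtable1)
```

After this, rows that list the same items in a different order (for example "red, yellow" and "yellow,red") end up with identical cell values.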
