0

Assume I have a word such as AB1234XZY, or even 1AB1234XYZ.

I want to extract ONLY 'AB1234' or '1AB1234' (i.e. everything up to the letters at the end).

I have used the following code to extract that but it's not working:

base = re.match(r"^(\D+)(\d+)", word).group(0)

When I print base, it's not working for the second case. Any ideas why?

5
  • Do you want to match till 123 in both the cases? What if you have different numbers: - AB123452A? Commented Oct 17, 2012 at 15:45
  • Do you want to match the numbers between text ? Commented Oct 17, 2012 at 15:47
  • I want to extract AB1234 so basically everything before the letters at the end. I'm pretty sure the code I have there worked before.... Commented Oct 17, 2012 at 15:49
  • @user1328021 Why don't you post the input string you are searching, so we can better understand the problem? Also, if any of these answers has helped, you can mark it as accepted, or, if you have solved the problem yourself, you can post your solution here as an answer so others can learn. Commented Oct 18, 2012 at 15:45
  • my input string to be searched is what I wrote 1AB1234XYZ and I want to extract 1AB1234 ... everything before the suffix of letters at the end. I'm working on trying solutions listed below and will mark the one that works as the answer. Thanks! Commented Oct 18, 2012 at 15:47

3 Answers

1

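Your regex doesn't work for the second case because that string starts with a digit; the \D+ at the beginning of your pattern must match at least one character that ISN'T a digit. With nothing to match, re.match returns None, and calling .group(0) on None raises an AttributeError.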
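To see the failure for yourself, here is a minimal sketch that runs the pattern from the question against both sample strings. The variable names are only illustrative.

```python
import re

# The pattern from the question: one or more non-digits, then one or more digits.
pattern = re.compile(r"^(\D+)(\d+)")

first = pattern.match("AB1234XZY")    # starts with letters, so \D+ can match
second = pattern.match("1AB1234XYZ")  # starts with a digit, so \D+ fails at once

print(first.group(0))  # AB1234
print(second)          # None, so second.group(0) would raise AttributeError
```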

You should be able to use something quite simple for this--simpler, in fact, than anything else I see here.

'.*\d'

That's it! Because .* is greedy, this should match everything up to and including the last digit in your string, and ignore everything after that.

Here's the pattern working online, so you can see for yourself.
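If you would rather check it locally, a small sketch along these lines should do. The helper name up_to_last_digit is made up for this example, and returning None for a string without digits is an assumption about what you want in that case.

```python
import re

def up_to_last_digit(word):
    """Return the prefix of word that ends at its last digit, or None."""
    # Greedy .* runs to the end, then backtracks until \d can match a digit.
    match = re.match(r".*\d", word)
    return match.group(0) if match else None

print(up_to_last_digit("AB1234XZY"))   # AB1234
print(up_to_last_digit("1AB1234XYZ"))  # 1AB1234
```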

1 Comment

Thank you!!!! I knew there had to be an easier way. And thanks for introducing me to RegexPlanet. That site is brilliant.
1

(.+?\d+)\w+ would give you what you want.

Or even something like this

^(.+?)[a-zA-Z]+$
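Here is a quick sketch that tries both patterns on the two strings from the question. In each case the wanted text is in group 1, not group 0. The variable names are only for illustration.

```python
import re

samples = ["AB1234XZY", "1AB1234XYZ"]

lazy_digits = re.compile(r"(.+?\d+)\w+")          # first pattern
before_letters = re.compile(r"^(.+?)[a-zA-Z]+$")  # second pattern

results = {}
for word in samples:
    # group(1) is the captured prefix; group(0) would be the whole string here.
    results[word] = (
        lazy_digits.match(word).group(1),
        before_letters.match(word).group(1),
    )
    print(word, results[word])
```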

1 Comment

I would make the initial .+ greedy if I were you, since this will not work for 12AB1234XYZ (2 or more numbers at the beginning). However, it should work for his samples.
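A small sketch of the difference, using the 12AB1234XYZ input mentioned in this comment. The variable names are illustrative only.

```python
import re

word = "12AB1234XYZ"

# Lazy .+? stops as early as it can: "1", then \d+ takes "2", \w+ takes the rest.
lazy = re.match(r"(.+?\d+)\w+", word).group(1)

# Greedy .+ takes as much as it can and backtracks only until \d+ and \w+ fit.
greedy = re.match(r"(.+\d+)\w+", word).group(1)

print(lazy)    # 12
print(greedy)  # 12AB1234
```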
0

re.match only matches at the beginning of the string, whereas re.search scans the string for the first position where the pattern matches. Both return the first match. .group(0) is everything included in the match; if you have capturing groups, .group(1) is the first group, and so on. Unlike the usual convention where 0 is the first index, here 0 is a special case meaning the whole match.

In your case, depending on what you really need to capture, re.search may be the better choice. Instead of using two groups, you can use a single one: (\D+\d+). Keep in mind that it will capture only the first run of non-digits followed by digits. That might be sufficient for you, but you might want to be more specific.
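As a rough illustration of the points above, this sketch runs an unanchored version of the question's pattern through both functions on the second sample string. Note that re.search skips the leading 1, which is why you may need something more specific.

```python
import re

word = "1AB1234XYZ"
pattern = r"(\D+)(\d+)"  # the question's pattern without the ^ anchor

at_start = re.match(pattern, word)   # must match at index 0, where "1" blocks \D+
anywhere = re.search(pattern, word)  # free to start at index 1

print(at_start)           # None
print(anywhere.group(0))  # AB1234 (whole match)
print(anywhere.group(1))  # AB     (first capturing group)
print(anywhere.group(2))  # 1234   (second capturing group)
```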

After reading your comment, "everything before the letters at the end", this regex is what you need:

regex = re.compile(r'(.+?)[A-Za-z]+$')
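To check the "everything before the letters at the end" idea against your inputs, here is a minimal sketch. It uses a lazy group followed by [A-Za-z]+$, so the whole trailing run of letters is left out of group 1. The function name strip_letter_suffix is hypothetical, and returning the word unchanged when there is no letter suffix is an assumption.

```python
import re

SUFFIX_RE = re.compile(r"(.+?)[A-Za-z]+$")

def strip_letter_suffix(word):
    """Return word without its trailing run of ASCII letters."""
    match = SUFFIX_RE.match(word)
    # Assumption: a word with no trailing letters is returned unchanged.
    return match.group(1) if match else word

for word in ("AB1234XZY", "1AB1234XYZ", "12AB1234XYZ"):
    print(word, "->", strip_letter_suffix(word))
```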

1 Comment

re.match vs re.search shouldn't matter, since he's using the ^ anchor. That forces the match to start at the beginning of the string.
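A short sketch to confirm this with the pattern from the question. This holds for the default flags; with re.MULTILINE, ^ can also match after a newline, so the two functions could then differ on multi-line input.

```python
import re

anchored = r"^(\D+)(\d+)"  # the question's pattern, with its ^ anchor

outcomes = {}
for word in ("AB1234XZY", "1AB1234XYZ"):
    m = re.match(anchored, word)
    s = re.search(anchored, word)
    # Record the matched text, or None when there is no match.
    outcomes[word] = (m and m.group(0), s and s.group(0))
    print(word, outcomes[word])
```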
