Using python regex with backreference matches

Question

I have a question about regex backreferences.

I need to find repeated characters in a string. I tried the regex (\w)\1{1,}, but it only matches consecutive repeats. I'm stuck on how to improve it so that it captures all repeated characters, not just adjacent ones. Some examples are below:

import re

str = 'capitals'

re.search(r'(\w)\1{1,}', str)

Output None

import re

str = 'butterfly'

re.search(r'(\w)\1{1,}', str)

<_sre.SRE_Match object; span=(2, 4), match='tt'>

You can use .* before the backreference to allow anything in between the matches.
– Barmar, Commented Dec 8, 2017 at 17:47
@Barmar I'm trying to match the repeated occurrences of the letter a.
– Jess, Commented Dec 8, 2017 at 18:20
@user3722709 You still haven't said what you expect the output to be. aa or apita?
– Barmar, Commented Dec 8, 2017 at 20:10

Henry · Accepted Answer · 2017-12-08 18:57:11Z

6

I would use r'(\w).*\1' so that it matches a repeated character even when other characters, including special characters or spaces, appear in between.
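As a quick check of what this pattern captures, here is a minimal sketch applied to the two words from the question. Because .* is greedy, the match runs from the first repeated character to its last occurrence.

```python
import re

# Pattern from the answer: a word character, anything in between,
# then the same character again.
pattern = re.compile(r'(\w).*\1')

for word in ('capitals', 'butterfly'):
    m = pattern.search(word)
    # group(1) is the repeated character; group(0) is the whole matched span
    print(word, m.group(1), m.group(0), m.span())
```

For 'capitals' the whole match is 'apita', not 'aa', which is the distinction raised in the comments on the question.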

However, this won't work for strings whose repeated characters fall inside the span of an earlier match, such as abcdabcd: it only reports the first group (a) and ignores the other repeated characters enclosed in that match (b, c, d).

Check the demo: https://regex101.com/r/m5UfAe/1
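The limitation can be reproduced locally with a short sketch that prints what each match consumes:

```python
import re

# The single match 'abcda' swallows b, c and d, so the scan resumes at the
# trailing 'bcd', where none of those characters repeats any more.
matches = [(m.group(1), m.group(0)) for m in re.finditer(r'(\w).*\1', 'abcdabcd')]
print(matches)
```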

So an alternative, depending on your needs, is to sort the string before matching:

import re
str = 'abcdabcde'
re.findall(r'(\w).*\1', ''.join(sorted(str)))

which returns the list of repeated characters: ['a', 'b', 'c', 'd']
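If sorting is not desirable, one possible alternative (not part of this answer) is a lookahead, which checks for a later occurrence without consuming it. The sketch below assumes single-line input, since . does not match newlines by default; the helper name repeated_chars is hypothetical.

```python
import re

def repeated_chars(text):
    """Return each word character that occurs more than once, in order of first appearance."""
    # (?=.*\1) asserts that the captured character appears again later,
    # without consuming text, so every position is still examined.
    hits = re.findall(r'(\w)(?=.*\1)', text)
    # A character occurring three or more times is reported more than once;
    # dict.fromkeys drops the duplicates while keeping the order.
    return list(dict.fromkeys(hits))

print(repeated_chars('abcdabcde'))
print(repeated_chars('testing this'))
```

Unlike the sorted approach, the result follows the order in which characters first appear in the original string rather than alphabetical order.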


2 Comments

Jess Over a year ago

It worked here! But can you explain why the output is incorrect when I remove the sorted built-in function?

Output with sorted:

re.findall(regex_pattern, ''.join(sorted("testing this".lower())))
['i', 's', 't']

Output without sorted:

re.findall(regex_pattern, ''.join("testing this".lower()))
['t']

Barmar Over a year ago

@user3722709 If you don't sort it, you're just returning the same string.
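To see why the unsorted call reports only t, it may help to print what each match actually consumes. This sketch assumes that regex_pattern in the comment above is the r'(\w).*\1' pattern from the answer.

```python
import re

regex_pattern = r'(\w).*\1'  # assumed to be the pattern from the answer
text = 'testing this'

# Unsorted: the greedy .* stretches from the first 't' to the last one,
# so a single match swallows the repeated 's' and 'i' as well.
print([m.group(0) for m in re.finditer(regex_pattern, text)])

# Sorted: equal characters sit next to each other, so each run of
# identical characters is matched separately.
sorted_text = ''.join(sorted(text))
print([m.group(0) for m in re.finditer(regex_pattern, sorted_text)])
```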

Tobias Tengler · Accepted Answer · 2019-08-03 11:03:07Z

I hope the code below helps you understand how backreferences work in Python regular expressions.

There are two sets of information in the sample string str:

Employee basic info:
- starts with @employeename and ends with employeename
- e.g.: @daniel dxc chennai 45000 male daniel
Employee designation:
- starts with %employeename, followed by the designation, and ends with employeename%
- e.g.: %daniel python developer daniel%

import re

#sample input

str="""
@daniel dxc chennai 45000 male daniel @henry infosys bengaluru 29000 male hobby- 
swimming henry
@raja zoho chennai 37000 male raja @ramu infosys bengaluru 99000 male hobby-badminton 
ramu
%daniel python developer daniel% %henry database admin henry%
%raja Testing lead raja% %ramu Manager ramu%
"""

#backreferencing employee name: (\w+) is group 1, referenced as \1
#re.DOTALL lets '.' match the newline inside a record that wraps onto the next line
#----------------------------------------------
basic_info=re.findall(r'@+(\w+)(.*?)\1',str,re.DOTALL)
print(basic_info)

#(%) is group 1 (\1) and (\w+) is group 2 (\2)
#-------------------------------
designation=re.findall(r'(%)+(\w+)(.*?)\2\1',str,re.DOTALL)
print(designation)

#keep only the name and the designation text, dropping the captured '%'
designation=[(name,role) for _,name,role in designation]
print(designation)
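The same idea can also be written with named groups, which some readers find easier to follow than numbered backreferences. The following is only a sketch on a shortened, single-line sample; the data and the names sample, basic, role and info are illustrative assumptions, not part of the code above. (?P<name>...) defines a named group and (?P=name) refers back to it.

```python
import re

# Shortened illustrative sample, kept on one line so '.' never has to cross a newline.
sample = "@daniel dxc chennai 45000 male daniel %daniel python developer daniel%"

# Basic info: '@', the name, lazily anything, then the same name again.
basic = re.search(r'@(?P<name>\w+)\s+(?P<info>.*?)\s+(?P=name)\b', sample)
print(basic.group('name'), '|', basic.group('info'))

# Designation: '%', the name, lazily anything, the same name, then '%'.
role = re.search(r'%(?P<name>\w+)\s+(?P<role>.*?)\s+(?P=name)%', sample)
print(role.group('name'), '|', role.group('role'))
```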
