
I have a long string of the following form:

joined_string = "ASOGHFFFFFFFFFFFFFFFFFFFGFIOSGFFFFFFFFURHDHREEKFFFFFFIIIEI..."

It is a concatenation of random strings separated by runs of consecutive F letters:

ASOGH
FFFFFFFFFFFFFFFFFFF
GFIOSG
FFFFFFFF
URHDHREEK
FFFFFF
IIIEI

The number of consecutive F letters is not fixed, but there will always be more than five, and let's assume that five consecutive F letters never appear inside a random string.

I want to extract only random strings to get the following list:

random_strings = ['ASOGH', 'GFIOSG', 'URHDHREEK', 'IIIEI']

I imagine there is a simple regular expression that would solve this task:

random_strings = joined_string.split('WHAT_TO_TYPE_HERE?')

Question: how do I write a regex pattern that matches a run of multiple identical characters?

  • Does this answer your question? Split string based on regex. Commented Jul 8, 2021 at 8:17
  • str.split cannot take a regex, so use the re module with the pattern F+. Commented Jul 8, 2021 at 8:18

3 Answers

1

I would use re.split for this task, in the following way:

import re
joined_string = "ASOGHFFFFFFFFFFFFFFFFFFFGFIOSGFFFFFFFFURHDHREEKFFFFFFIIIEI"
parts = re.split(r'F{5,}', joined_string)
print(parts)

Output:

['ASOGH', 'GFIOSG', 'URHDHREEK', 'IIIEI']

F{5,} matches a run of 5 or more consecutive F characters.
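One edge case the example above does not hit: if the input starts or ends with a run of F letters, re.split returns empty strings at those positions. The following is a minimal sketch of one way to handle that; the helper name extract_random_strings and its min_run parameter are illustrative assumptions, not part of the original answer.

```python
import re

def extract_random_strings(joined, min_run=5):
    """Split on runs of at least `min_run` F's and drop empty pieces.

    Hypothetical helper for illustration only.
    """
    pattern = "F{%d,}" % min_run          # e.g. "F{5,}"
    # re.split yields '' when a separator sits at the very start or end,
    # so filter those out.
    return [part for part in re.split(pattern, joined) if part]

print(extract_random_strings("FFFFFFASOGHFFFFFFFFGFIOSGFFFFFF"))
# ['ASOGH', 'GFIOSG']
```

The single F inside GFIOSG is left alone because it is shorter than the minimum run length.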


1

You can split on F{5,} and wrap the pattern in a capture group so that the separators are also part of the result:

import re
s = "ASOGHFFFFFFFFFFFFFFFFFFFGFIOSGFFFFFFFFURHDHREEKFFFFFFIIIEI"
print(re.split(r'(F{5,})', s))

Output:

['ASOGH', 'FFFFFFFFFFFFFFFFFFF', 'GFIOSG', 'FFFFFFFF', 'URHDHREEK', 'FFFFFF', 'IIIEI']
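With exactly one capture group in the pattern, re.split alternates between the text pieces (even indices) and the captured separators (odd indices). As a small sketch, the two can be pulled apart with slicing; the variable names tokens, random_strings and separators are my own.

```python
import re

s = "ASOGHFFFFFFFFFFFFFFFFFFFGFIOSGFFFFFFFFURHDHREEKFFFFFFIIIEI"

# One capture group: the result interleaves text and separators.
tokens = re.split(r'(F{5,})', s)

random_strings = tokens[::2]    # even positions: text between separators
separators = tokens[1::2]       # odd positions: the captured runs of F

print(random_strings)
# ['ASOGH', 'GFIOSG', 'URHDHREEK', 'IIIEI']
```

This is handy if the lengths of the F runs matter later; if they do not, dropping the capture group is simpler.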


0

I would use a re.findall approach here:

import re
joined_string = "ASOGHFFFFFFFFFFFFFFFFFFFGFIOSGFFFFFFFFURHDHREEKFFFFFFIIIEI"
parts = re.findall(r'F{2,}|(?:[A-EG-Z]|F(?!F))+', joined_string)
print(parts)

This prints:

['ASOGH', 'FFFFFFFFFFFFFFFFFFF', 'GFIOSG', 'FFFFFFFF', 'URHDHREEK', 'FFFFFF', 'IIIEI']

The regex pattern here can be explained as:

F{2,}         match any group of 2 or more consecutive F's (first)
|             OR, that failing
(?:
    [A-EG-Z]  match any non F character
    |         OR
    F(?!F)    match a single F (not followed by an F)
)+            all of these, one or more times
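The breakdown above can be written directly as a commented pattern using re.VERBOSE, which ignores whitespace and allows # comments inside the pattern. This is only a sketch: the final filtering step, which keeps just the random strings, is my addition and not part of the original answer.

```python
import re

pattern = re.compile(r"""
    F{2,}            # a run of 2 or more consecutive F's (tried first)
    |                # OR, if that fails
    (?:
        [A-EG-Z]     # any uppercase letter other than F
        |            # OR
        F(?!F)       # a single F that is not followed by another F
    )+               # one or more of these
""", re.VERBOSE)

joined_string = "ASOGHFFFFFFFFFFFFFFFFFFFGFIOSGFFFFFFFFURHDHREEKFFFFFFIIIEI"
parts = pattern.findall(joined_string)

# Added step (assumption): discard the pieces that are pure runs of F.
random_strings = [p for p in parts if not re.fullmatch(r"F{2,}", p)]
print(random_strings)
# ['ASOGH', 'GFIOSG', 'URHDHREEK', 'IIIEI']
```

Note that F{2,} treats any doubled F as a separator, which is a lower threshold than the five or more that the question assumes.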
