Splitting string into groups with regex?

Question

I have strings that can contain a varying number of "groups". I need to split them, but I am having trouble doing so. Each group always starts with [A-Z]{2,5}, followed by a : and a string of varying length that may contain spaces. There is always a space in front of each group.

Example strings:

"YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"
"Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs   dsfdsfd1jkhjk23"

My code thus far:

import re
D = "Test1 AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"

# Raw string so that \s reaches the regex engine unchanged
g = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(D)

This works when the string starts with a single word, but it fails when the leading section contains several words separated by spaces.
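To make the failure concrete, here is a minimal reproduction that runs the pattern from the question on a one-word and a multi-word leading section. The variable names are illustrative only and are not part of the original post.

```python
import re

# The pattern from the question, written as a raw string
pattern = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)")

one_word = "Test1 AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23"
multi_word = "Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs   dsfdsfd1jkhjk23"

# The leading section stays in one piece when it is a single word
print(pattern.split(one_word))

# Here the leading name is split too, because "Bob" and "Thorton"
# also start with an uppercase letter
print(pattern.split(multi_word))
```

The lookahead (?=[A-Z]) only checks for a single uppercase letter, so it cannot tell a group label such as AA: from an ordinary capitalized word.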

What is the expected output? Try (?!^)\s+(?=[A-Z]+:), see regex101.com/r/QTmjkX/1
– Wiktor Stribiżew, Commented Jun 7, 2021 at 22:13
Don't use split. Write a regexp that matches the groups, and use re.findall()
– Barmar, Commented Jun 7, 2021 at 22:15

Wiktor Stribiżew · Accepted Answer · 2021-06-07 22:20:12Z

You can use

re.split(r'(?!^)\s+(?=[A-Z]+:)', text)

See this regex demo.

Details:

(?!^) - a negative lookahead that matches a location not at the start of string (equal to (?<!^) but one char shorter)
\s+ - one or more whitespaces
(?=[A-Z]+:) - a positive lookahead that requires one or more uppercase ASCII letters followed with a : char immediately to the right of the current location.
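As a quick check of the pattern described above, the sketch below applies it to both example strings from the question. The helper name split_groups is hypothetical and used only for illustration.

```python
import re

# Split on whitespace that directly precedes an uppercase label and a colon
GROUP_BOUNDARY = re.compile(r"(?!^)\s+(?=[A-Z]+:)")


def split_groups(text):
    """Return the leading section followed by each LABEL:value group."""
    return GROUP_BOUNDARY.split(text)


samples = [
    "YellowSky AA:Hello AB:1234 AC:1F 322 AD:hj21jkhjk23",
    "Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs   dsfdsfd1jkhjk23",
]

for sample in samples:
    print(split_groups(sample))
# ['YellowSky', 'AA:Hello', 'AB:1234', 'AC:1F 322', 'AD:hj21jkhjk23']
# ['Billy Bob Thorton', 'AA:213231', 'AB:aaaa', 'AC:ddddd 322', 'AD:hj2ffs   dsfdsfd1jkhjk23']
```

Because the lookahead requires a colon after the uppercase letters, capitalized words such as "Bob" no longer trigger a split, and spaces inside a value are preserved.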


vks · Accepted Answer · answered 2021-06-07 22:32, edited by halfer 2023-09-15 09:24:05Z

([A-Z]{2,5}:\w+(?: +\w+)*)(?=(?: +[A-Z]+:|$))

You can also use re.findall directly.

See demo: https://regex101.com/r/6jf8EM/1

This way you don't need to filter unwanted groups later. You get what you need.
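A minimal sketch of the re.findall approach, using the pattern from this answer on one of the question's example strings. The final dict step is an added assumption about how the groups might be consumed, not something the answer proposes.

```python
import re

# Pattern from the answer: a 2-5 letter uppercase label, a colon, then words
# separated by spaces, ending before the next label or at the end of the string
GROUP = re.compile(r"([A-Z]{2,5}:\w+(?: +\w+)*)(?=(?: +[A-Z]+:|$))")

text = "Billy Bob Thorton AA:213231 AB:aaaa AC:ddddd 322 AD:hj2ffs   dsfdsfd1jkhjk23"

groups = GROUP.findall(text)
print(groups)
# ['AA:213231', 'AB:aaaa', 'AC:ddddd 322', 'AD:hj2ffs   dsfdsfd1jkhjk23']

# Assumption for illustration: turn each group into a label/value pair
pairs = dict(group.split(":", 1) for group in groups)
print(pairs["AC"])
# ddddd 322
```

Note that this pattern returns only the labeled groups, so the leading free text ("Billy Bob Thorton") is not part of the result. It also relies on \w, so values containing punctuation would need a broader character class.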

answered Jun 7, 2021 at 22:32 by vks; edited Sep 15, 2023 at 9:24 by halfer

Thanks, it does help, though I need the first section too.
– Dulanic, Commented over a year ago
