Splitting multiple strings using regular expression

Question

[Delta-1234, United-1345] Testing different airlines
[Delta-1234] Testing different airlines

I want to get Delta-1234 and United-1345 in the first case and just Delta-1234 in the second. Is it possible using findall?

I don't see how findall() could do it, because you don't want the square brackets in the resulting list, so the square brackets can't be in the pattern. With @CertainPerformance's answer you'd still have to split on commas and remove the square brackets.
– jgreve, Commented Jul 31, 2018 at 0:11
Oops, with @CertainPerformance's answer you'd just split on the commas; I missed that the square brackets are outside the capture group. That assumes you want an actual list of flight-like things, e.g. a = ['Delta-1234', 'United-1345'], instead of a list holding a single CSV string like b = ['Delta-1234, United-1345']. Note that len(a) == 2 while len(b) == 1.
– jgreve, Commented Jul 31, 2018 at 0:31
@jgreve That's what I just did, i.e. len(b) == 2. But I wanted to see if it's possible with just one regex rather than doing a split afterwards. What I actually want is something like [('Delta', '1234'), ('United', '1345')], which is why I thought findall might be a good option!
– Jason, Commented Jul 31, 2018 at 1:14

rafaelc · Accepted Answer · 2018-07-31 00:02:35Z

1

Do you really need regular expressions? You can just take the text between the brackets [ and ]:

def x(s):
    # Slice from the first '[' through the first ']' (inclusive)
    return s[s.index('['):s.index(']') + 1]

string1 = "[Delta-1234, United-1345] Testing different airlines"
string2 = "[Delta-1234] Testing different airlines"

print(x(string1))
print(x(string2))

Output:

[Delta-1234, United-1345]
[Delta-1234]

answered Jul 31, 2018 at 0:02

rafaelc



1 Comment

Jason Over a year ago

I just wanted a list as output. I'm not sure this lambda gives me a list; it seems to return a string.
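The slice in this answer does return a string. A minimal sketch of getting a list instead, assuming the codes inside the brackets are always comma-separated (bracket_items is a hypothetical name, not part of the answer):

```python
def bracket_items(s):
    # Hypothetical helper: take the text strictly between the first '[' and ']'
    # (str.index raises ValueError if either bracket is missing)
    inner = s[s.index('[') + 1:s.index(']')]
    # Split on commas and strip surrounding whitespace from each code
    return [item.strip() for item in inner.split(',')]

print(bracket_items("[Delta-1234, United-1345] Testing different airlines"))
# ['Delta-1234', 'United-1345']
print(bracket_items("[Delta-1234] Testing different airlines"))
# ['Delta-1234']
```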

CertainPerformance · Accepted Answer · 2018-07-31 00:06:55Z

0

If you want to use a regular expression, just match [ and then greedily capture a run of non-] characters:

>>> import re
>>> regex = re.compile(r"\[([^\]]+)")
>>> regex.findall("[Delta-1234, United-1345] Testing different airlines")
['Delta-1234, United-1345']
>>> regex.findall("[Delta-1234] Testing different airlines")
['Delta-1234']

Or use a lookbehind:

>>> regex = re.compile(r"(?<=\[)[^\]]+")
>>> regex.findall("[Delta-1234, United-1345] Testing different airlines")
['Delta-1234, United-1345']
>>> regex.findall("[Delta-1234] Testing different airlines")
['Delta-1234']

answered Jul 31, 2018 at 0:06

CertainPerformance


4 Comments

Jason Over a year ago

So the first output is a list of one item: ['Delta-1234, United-1345']. Can this be split into a list of two items using the regex?

CertainPerformance Over a year ago

If you want more than one group and you want to use findall, the returned value will have to be a list of tuples; there's no way around that. You can use r"\[(\S+)(?:, (\S+))?\]" to capture either the first airline code or the first and second airline codes.

Jason Over a year ago

The issue is that the regex wouldn't work if the string becomes [Delta-1234, United-1345, Spirit-8778] Testing different airlines. My point is that the airlines and their codes can vary, and there can be more than one.
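A quick check of the two-group pattern from this thread illustrates both the suggestion and the objection (two_codes is just an illustrative variable name). findall yields one tuple per match, with an empty string for a missing second code, and finds nothing once a third airline appears:

```python
import re

# One mandatory code plus one optional second code, as suggested in the comment
two_codes = re.compile(r"\[(\S+)(?:, (\S+))?\]")

print(two_codes.findall("[Delta-1234, United-1345] Testing different airlines"))
# [('Delta-1234', 'United-1345')]
print(two_codes.findall("[Delta-1234] Testing different airlines"))
# [('Delta-1234', '')]
print(two_codes.findall("[Delta-1234, United-1345, Spirit-8778] Testing different airlines"))
# []  (three codes cannot fit the pattern, so there is no match)
```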

CertainPerformance Over a year ago

If you want to use findall for this, you'll need a separate capturing group for each substring. (Repeating a capturing group, e.g. r"\[(\S+)(?:, (\S+))*\]", doesn't work because only the last match of the second group is retained in the result.) While you could manually repeat groups, like r"\[(\S+)(?:, (\S+))?(?:, (\S+))?\]" (repeating the optional group as many times as you need), that's pretty messy. It's better to keep the code DRY and go beyond a single re.findall call.
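A sketch of both points, assuming airline names are word characters and flight numbers are digits (airline_pairs is a hypothetical helper). The repeated group keeps only its last capture, while a two-step approach (one findall for the bracket contents, a second for the name/number pairs) produces the [('Delta', '1234'), ...] shape asked for in the question comments, for any number of airlines:

```python
import re

s = "[Delta-1234, United-1345, Spirit-8778] Testing different airlines"

# A repeated capturing group only keeps its last repetition: United-1345 is lost
print(re.findall(r"\[(\S+)(?:, (\S+))*\]", s))
# [('Delta-1234', 'Spirit-8778')]

def airline_pairs(text):
    pairs = []
    # Step 1: grab the contents of each [...] block
    for inner in re.findall(r"\[([^\]]+)\]", text):
        # Step 2: pull (airline, number) tuples out of that block
        pairs.extend(re.findall(r"(\w+)-(\d+)", inner))
    return pairs

print(airline_pairs(s))
# [('Delta', '1234'), ('United', '1345'), ('Spirit', '8778')]
```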

Waleed Iqbal · Accepted Answer · 2018-07-31 00:19:24Z

0

Another way to achieve this using regex is:

import re

str1 = "[Delta-1234, United-1345] Testing different airlines"
str2 = "[Delta-1234] Testing different airlines"

regex_pattern = r"[^[]*\[([^]]*)\]"

print(re.match(regex_pattern, str1).group(1))
print(re.match(regex_pattern, str2).group(1))

This prints:

Delta-1234, United-1345
Delta-1234
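One caveat: re.match returns None when a string has no bracketed part, so calling a method on the result would then raise AttributeError. A minimal guarded variant, using re.search (which scans the whole string and therefore doesn't need the leading [^[]*); first_bracket is a hypothetical name:

```python
import re

# Capture everything between '[' and the next ']'
bracket = re.compile(r"\[([^\]]*)\]")

def first_bracket(s):
    m = bracket.search(s)
    # search() returns None when the string has no [...] block
    return m.group(1) if m else None

print(first_bracket("[Delta-1234, United-1345] Testing different airlines"))
# Delta-1234, United-1345
print(first_bracket("No brackets here"))
# None
```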

answered Jul 31, 2018 at 0:19

Waleed Iqbal



dawg · Accepted Answer · 2018-07-31 01:33:46Z

0

Given:

s='''\
[Delta-1234, United-1345] Testing different airlines
[Delta-1234] Testing different airlines'''

You can do:

>>> [e.split(', ') for e in re.findall(r'\[([^]]+)\]', s)]
[['Delta-1234', 'United-1345'], ['Delta-1234']]

answered Jul 31, 2018 at 1:33

dawg


3 Comments

Jason Over a year ago

If I use just [Delta-1234, United-1345] Testing different airlines, then re.findall(r'\[([^]]+)\]', s) only returns one value in the list: ['Delta-1234, United-1345']. However, I am looking for two values in the list. Is that possible?

dawg Over a year ago

It is returning a list of lists. The string you state is being split correctly into a two element list inside another list.

dawg Over a year ago

You might try changing e.split(', ') to e.split(',') (i.e., no space after the comma), or split with a regex.
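A minimal sketch of the regex-split option, assuming commas may have any amount of whitespace on either side (the second sample line is made up to show irregular spacing):

```python
import re

s = """\
[Delta-1234, United-1345] Testing different airlines
[Delta-1234,United-1345 ,Spirit-8778] Testing different airlines"""

# For each [...] block, split its contents on a comma plus any surrounding whitespace
result = [re.split(r"\s*,\s*", e.strip()) for e in re.findall(r"\[([^\]]+)\]", s)]
print(result)
# [['Delta-1234', 'United-1345'], ['Delta-1234', 'United-1345', 'Spirit-8778']]
```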
