[Delta-1234, United-1345] Testing different airlines
[Delta-1234] Testing different airlines
I want to get Delta-1234 and United-1345 in the first case and just Delta-1234 in the second. Is it possible using findall?
Do you really need regular expressions? You can just find elements between the brackets [ and ]
x = lambda s: s[s.index('['):s.index("]")+1]
string1 = "[Delta-1234, United-1345] Testing different airlines"
string2 = "[Delta-1234] Testing different airlines"
print(x(string1))
print(x(string2))
outputs
[Delta-1234, United-1345]
[Delta-1234]
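If the goal is the individual codes rather than the bracketed text, the same index-based idea can be extended with a split. This is a minimal sketch, assuming the items are separated by commas with optional spaces; the helper name `codes_between_brackets` is made up for illustration.

```python
def codes_between_brackets(s):
    """Return the comma-separated items between the first '[' and the next ']'."""
    start = s.index('[') + 1      # position just after the opening bracket
    end = s.index(']', start)     # position of the closing bracket
    # Split on commas and trim surrounding whitespace from each item.
    return [item.strip() for item in s[start:end].split(',')]

print(codes_between_brackets("[Delta-1234, United-1345] Testing different airlines"))
print(codes_between_brackets("[Delta-1234] Testing different airlines"))
```

Like the lambda above, this raises ValueError if the string has no brackets, so it only suits input that is known to contain them.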
If you want to use a regular expression, just match [, and then (greedily) capture repeated non-]s:
>>> regex = re.compile(r"\[([^\]]+)")
>>> re.findall(regex, "[Delta-1234, United-1345] Testing different airlines")
['Delta-1234, United-1345']
>>> re.findall(regex, "[Delta-1234] Testing different airlines")
['Delta-1234']
Or use lookbehind
>>> regex = re.compile(r"(?<=\[)[^\]]+")
>>> re.findall(regex, "[Delta-1234, United-1345] Testing different airlines")
['Delta-1234, United-1345']
>>> re.findall(regex, "[Delta-1234] Testing different airlines")
['Delta-1234']
With findall, the returned value will have to be a list of tuples; there's no way around that. You can use r"\[(\S+)(?:, (\S+))?\]" to capture the first, or the first and second, airline code.

[Delta-1234, United-1345, Spirit-8778] Testing different airlines. My point being, the airlines and their codes can vary, and there can be more than one.

If you use findall for this, then you'll need a separate capturing group for each substring. (Repeating a captured group, e.g. r"\[(\S+)(?:, (\S+))*\]", doesn't work because only the last match for the second group would be retained in the result.) While you *could* manually repeat groups like r"\[(\S+)(?:, (\S+))?(?:, (\S+))?\]" (repeat the groups as much as you need), that's pretty messy. Better to keep the code DRY and branch out from pure re.findall.

Another way to achieve this using regex is:
import re
str1 = "[Delta-1234, United-1345] Testing different airlines"
str2 = "[Delta-1234] Testing different airlines"
# Skip any text before the first '[', then capture everything up to the next ']'.
regex_pattern = r"[^\[]*\[([^\]]*)\]"
print(re.match(regex_pattern, str1).group(1))
print(re.match(regex_pattern, str2).group(1))
It will print
Delta-1234, United-1345
Delta-1234
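To make the earlier remark about repeated capturing groups concrete, here is a small sketch using the three-airline string from the comments. It contrasts a repeated group with capturing the bracket contents once and splitting afterwards. The variable names are illustrative only.

```python
import re

s = "[Delta-1234, United-1345, Spirit-8778] Testing different airlines"

# A repeated capturing group keeps only its last repetition,
# so the middle code (United-1345) is lost.
repeated = re.findall(r"\[(\S+)(?:, (\S+))*\]", s)
print(repeated)   # [('Delta-1234', 'Spirit-8778')]

# Capturing the bracket contents once and splitting handles any number of codes.
flexible = [inner.split(", ") for inner in re.findall(r"\[([^\]]+)\]", s)]
print(flexible)   # [['Delta-1234', 'United-1345', 'Spirit-8778']]
```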
Given:
s='''\
[Delta-1234, United-1345] Testing different airlines
[Delta-1234] Testing different airlines'''
You can do:
>>> [e.split(', ') for e in re.findall(r'\[([^]]+)\]', s)]
[['Delta-1234', 'United-1345'], ['Delta-1234']]
If s is [Delta-1234, United-1345] Testing different airlines, then re.findall(r'\[([^]]+)\]', s) only creates one value in the list: ['Delta-1234, United-1345']. However, I am looking for two values in the list. Is that possible?

Change e.split(', ') to e.split(',') (i.e., no space after the comma). Or split with a regex.

[('Delta', '1234'), ('United', '1345')] that's why I thought findall may be a good option!
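For output shaped like the list of tuples in the comment above, one possibility is to run findall with two capturing groups over the bracket contents. This is only a sketch, and it assumes every code is letters, then a hyphen, then digits; other code formats would need a different pattern.

```python
import re

s = "[Delta-1234, United-1345] Testing different airlines"

# Take the text between the brackets first...
inner = re.search(r"\[([^\]]+)\]", s).group(1)

# ...then let findall return one (name, number) tuple per code.
pairs = re.findall(r"([A-Za-z]+)-(\d+)", inner)
print(pairs)  # [('Delta', '1234'), ('United', '1345')]
```

Because the pattern has two groups, findall returns tuples rather than plain strings, and it works for any number of codes inside the brackets.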