Below is my string format.
test_string = "test (11 MHz - 11 MHz)"
test1_string = 'test1 (11 MHz - 11 MHz)'
I need output like the following, using a regex in Python:
output = ["test1", "11 MHz", "11 MHz"]
One idea: match either the run of non-parenthesis characters at the start of the string, or digits followed by " mhz" anywhere.
res = re.findall(r'(?i)^[^)(]+\b|\d+ mhz', test_string)
See this demo at regex101 or a Python demo at tio.run
(?i) sets the ignore-case flag, so both lowercase and uppercase spellings of MHz match
^[^)(]+\b the first alternative matches one or more non-parenthesis characters from the ^ start up to a \b word boundary
| OR
\d+ mhz one or more digits followed by the specified substring
This will work as long as your input matches the pattern.
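To make the breakdown above concrete, here is a minimal sketch that runs the same pattern against both strings from the question. The name PATTERN and the choice to precompile it are only for illustration.

```python
import re

# Pattern from the answer: leading non-parenthesis text up to a word boundary,
# or digits followed by " mhz" (case-insensitive because of the inline (?i)).
PATTERN = re.compile(r'(?i)^[^)(]+\b|\d+ mhz')

test_string = "test (11 MHz - 11 MHz)"
test1_string = 'test1 (11 MHz - 11 MHz)'

res = PATTERN.findall(test_string)
res1 = PATTERN.findall(test1_string)

print(res)   # ['test', '11 MHz', '11 MHz']
print(res1)  # ['test1', '11 MHz', '11 MHz']
```

Because the pattern has no capturing groups, re.findall returns the full text of each match, which is why the result is already the flat list asked for in the question.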
This regex seems to do the job: ([^(\n]*) \((\d* Mhz) - (\d* Mhz)\)
The website gives some code you can use for matching with Python:
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"([^(\n]*) \((\d* Mhz) - (\d* Mhz)\)"
test_str = ("A1-A4 US (430 Mhz - 780 Mhz)\n"
"A7-A8 PS (420 Mhz - 180 Mhz)\n")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
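If only the captured pieces are needed rather than the position report, the generated code can be shortened. The following is a sketch under two assumptions that are not in the answer: re.IGNORECASE is added so that the "MHz" spelling from the question also matches, and a third test line with that spelling is appended to show it.

```python
import re

# Same pattern as above; re.IGNORECASE is an added assumption so "MHz" matches too.
pattern = re.compile(r"([^(\n]*) \((\d* Mhz) - (\d* Mhz)\)", re.IGNORECASE)

test_str = ("A1-A4 US (430 Mhz - 780 Mhz)\n"
            "A7-A8 PS (420 Mhz - 180 Mhz)\n"
            "test1 (11 MHz - 11 MHz)\n")  # extra line, taken from the question

# One list of three captured groups per matching line.
rows = [list(m.groups()) for m in pattern.finditer(test_str)]

for row in rows:
    print(row)
```

Each entry of rows has the shape requested in the question: the label first, then the two frequency strings.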
Using named groups:
import re
sample = "A1-A4 US (430 Mhz - 780 Mhz)"
split_pat = r"""
(?P<first>.+) # Capture everything up to the space before the opening parenthesis
\s\( # Skip space and opening parenthesis
(?P<second>\d+\s\bMhz\b) # Capture numeric values, space, and Mhz
\s+?\-\s+? # Skip hyphen in the middle
(?P<third>\d+\s\bMhz\b) # Capture numeric values, space, and Mhz
\) # Check for closing parenthesis
"""
# Use re.X flag to handle verbose pattern string
p = re.compile(split_pat, re.X)
m = p.search(sample)  # search once and reuse the match object
first_text = m.group('first')
second_text = m.group('second')
third_text = m.group('third')
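The named-group approach can also be wrapped so that a non-matching string does not raise AttributeError. This is only a sketch: split_label is a hypothetical helper name, and the re.I flag is an added assumption so that the "MHz" spelling from the question is accepted as well.

```python
import re

SPLIT_PAT = re.compile(r"""
    (?P<first>.+)              # label before the opening parenthesis
    \s\(                       # space and opening parenthesis
    (?P<second>\d+\s\bMhz\b)   # first frequency
    \s+?-\s+?                  # hyphen separator
    (?P<third>\d+\s\bMhz\b)    # second frequency
    \)                         # closing parenthesis
""", re.X | re.I)  # re.I is an assumption, added to accept "MHz" as well as "Mhz"


def split_label(text):
    """Hypothetical helper: return [first, second, third], or None if no match."""
    m = SPLIT_PAT.search(text)
    if m is None:
        return None
    return [m.group('first'), m.group('second'), m.group('third')]


print(split_label("A1-A4 US (430 Mhz - 780 Mhz)"))
print(split_label("test1 (11 MHz - 11 MHz)"))
print(split_label("no frequencies here"))
```

Returning None for unmatched input is a design choice for this sketch; raising an exception would work equally well if malformed input should be treated as an error.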
"A1-A4 US (430 Mhz - 780 Mhz)".split().(?:^(\w+(?:-\w+)+(?: [A-Z]+)?) \(|\G(?!^))(\d+ MHz)(?: - (?!\)))?(?=[^()]*\))regex101.com/r/rkYclW/1