Below is my string format.

test_string = "test (11 MHz - 11 MHz)"
test1_string = 'test1 (11 MHz - 11 MHz)'

I need output like the following, using a regex in Python:

output = ["test1", "11 MHz", "11 MHz"] 
  • Give a minimal reproducible example illustrating the specific problem with your attempt. Commented Dec 6, 2019 at 12:02
  • @rts He probably meant: please show your current regex pattern or attempt, so that it is easier to help by seeing where it failed. Commented Dec 6, 2019 at 12:09
  • Why would you expect otherwise? You're just splitting on whitespace, you might as well write "A1-A4 US (430 Mhz - 780 Mhz)".split(). Commented Dec 6, 2019 at 12:14
  • @bobblebubble Working as I expected, thanks. If you post it as an answer, I will upvote. Commented Dec 6, 2019 at 13:01
  • Using the PyPi regex module you might also use (?:^(\w+(?:-\w+)+(?: [A-Z]+)?) \(|\G(?!^))(\d+ MHz)(?: - (?!\)))?(?=[^()]*\)) regex101.com/r/rkYclW/1 Commented Dec 6, 2019 at 13:43

4 Answers

One idea: match either the run of non-parenthesis characters at the start of the string, or digits followed by "mhz" anywhere.

res = re.findall(r'(?i)^[^)(]+\b|\d+ mhz', test_string)

See this demo at regex101 or a Python demo at tio.run

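  • The (?i) flag makes the match case-insensitive, so both "Mhz" and "MHz" are matched.
  • ^[^)(]+\b is the first alternative: it matches one or more non-parenthesis characters from the ^ start of the string up to a \b word boundary, which drops the trailing space.
  • | means OR, and \d+ mhz is the second alternative: one or more digits followed by a space and the literal substring "mhz".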

This will work as long as your input matches the pattern.
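
As a quick check, here is a minimal sketch that applies the pattern above to the string from the question. Only the variable names `pattern` and `res` are chosen for illustration; the regex itself is unchanged.

```python
import re

# Pattern from this answer: leading non-parenthesis text, or digits followed by "mhz".
pattern = re.compile(r'(?i)^[^)(]+\b|\d+ mhz')

test_string = "test1 (11 MHz - 11 MHz)"

# findall returns every non-overlapping match, left to right.
res = pattern.findall(test_string)
print(res)  # ['test1', '11 MHz', '11 MHz']
```

Because the pattern has no capturing groups, `re.findall` returns the full matches as a flat list, which is the shape the question asks for.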

This regex seems to do the job: ([^(\n]*) \((\d* Mhz) - (\d* Mhz)\)

You can try it online.

The website generates some code you can use for matching with Python:

import re

regex = r"([^(\n]*) \((\d* Mhz) - (\d* Mhz)\)"

test_str = ("A1-A4 US (430 Mhz - 780 Mhz)\n"
    "A7-A8 PS (420 Mhz - 180 Mhz)\n")

# Find every match in the multi-line test string
matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    print(f"Match {match_num} was found at {match.start()}-{match.end()}: {match.group()}")

    # Groups are numbered from 1; group 0 is the whole match
    for group_num in range(1, len(match.groups()) + 1):
        print(f"Group {group_num} found at {match.start(group_num)}-{match.end(group_num)}: {match.group(group_num)}")

Using named groups:

import re
sample = "A1-A4 US (430 Mhz - 780 Mhz)"

split_pat = r"""
    (?P<first>.+)               # Capture everything before the space and opening parenthesis
    \s\(                        # Skip the space and opening parenthesis
    (?P<second>\d+\s\bMhz\b)    # Capture numeric value, space, and Mhz
    \s+?\-\s+?                  # Skip the hyphen and the whitespace around it
    (?P<third>\d+\s\bMhz\b)     # Capture numeric value, space, and Mhz
    \)                          # Check for the closing parenthesis
    """

# Use re.X flag to handle verbose pattern string
p = re.compile(split_pat, re.X)

# Search once and reuse the match object for all three groups
m = p.search(sample)
first_text = m.group('first')
second_text = m.group('second')
third_text = m.group('third')
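
If the input might not match, calling `.group()` on the result of `search` raises `AttributeError` because `search` returns `None`. The following is a minimal sketch of one way to guard against that. The helper name `split_label` and the compact one-line pattern are assumptions made for illustration, not part of the answer above.

```python
import re

# Compact, non-verbose variant of the named-group pattern (illustrative).
PATTERN = re.compile(
    r"(?P<first>.+)\s\((?P<second>\d+\sMhz)\s+-\s+(?P<third>\d+\sMhz)\)"
)

def split_label(text):
    """Return a dict of the three named groups, or None if text does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # groupdict() maps each group name to the text it captured.
    return match.groupdict()

print(split_label("A1-A4 US (430 Mhz - 780 Mhz)"))
print(split_label("no frequencies here"))
```

Returning `None` for non-matching input lets the caller decide how to handle bad lines instead of failing inside the helper.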

You can use re.findall to search the text:

import re

text = "A1-A4 US (430 Mhz - 780 Mhz)"

first_text, second_text, third_text = re.findall(r'(.*?US).*?(\d+.Mhz).*?(\d+.Mhz)', text)[0]
print(first_text)
print(second_text)
print(third_text)

Prints:

A1-A4 US
430 Mhz
780 Mhz
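
Note that the pattern above hard-codes "US", so it only works for labels that contain it. As a rough sketch of a more general variant, the label can be taken as everything before the opening parenthesis, with case-insensitive matching so that both "Mhz" and "MHz" are accepted. The names `LABEL_RE` and `parse` are hypothetical and chosen only for this example.

```python
import re

# Label = lazily matched non-parenthesis text before "(", then two "<digits> MHz" values.
LABEL_RE = re.compile(
    r"^\s*([^()]+?)\s*\((\d+\s*MHz)\s*-\s*(\d+\s*MHz)\)",
    re.IGNORECASE,
)

def parse(text):
    """Return [label, low, high] for a matching string, or an empty list otherwise."""
    match = LABEL_RE.search(text)
    return list(match.groups()) if match else []

print(parse("test1 (11 MHz - 11 MHz)"))
print(parse("A1-A4 US (430 Mhz - 780 Mhz)"))
```

The lazy `+?` together with the following `\s*` keeps the trailing space out of the captured label.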
