I am using Python to extract information from strings of this form:

r = "(-0.04530261550379927+0j) [X0 X1 Y2 Y3]"

Ultimately, I need the number in the parentheses, and I need to separate the letters from the digits inside the square brackets. For the example above, the results I would like are: the number -0.04530261550379927, one array [X, X, Y, Y], and another array [0, 1, 2, 3].

I have been trying with re.match, but since this is the first time I have used this module, I find it very confusing.

I would appreciate any help.

  • Please share your best attempt at solving this, with well-chosen sample input and the corresponding output; that would be a good starting point for understanding what your code needs. Also, some points are unclear: 1/ the sample data contains a complex number, but your expected output is a float. What about the imaginary part? 2/ In the second part, will it always be exactly one letter followed by exactly one digit, exactly four times? Please clarify. Commented Apr 10, 2021 at 11:42
  • @ThierryLathuille For all the data that I have the imaginary part is always 0 so it is no problem. In the second part, sometimes the array might be empty, or otherwise it will always be a letter followed by exactly one number, not necessarily exactly four times, could be one, two or three as well. Commented Apr 10, 2021 at 12:47

1 Answer

You can do it like this:

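import re

r = "(-0.04530261550379927+0j) [X0 X1 Y2 Y3]"
# Group 1 captures the real part; group 2 captures the contents of the brackets.
match = re.match(r"\(([-+]?\d+(?:\.\d+)?)\+\d+j\) \[((?:[XYZ]\d(?: [XYZ]\d)*)?)]", r)
number, array = match.groups()

number = float(number)
a1, a2 = [], []
for i in array.split():   # an empty string yields no items
    a1.append(i[0])       # the letter
    a2.append(int(i[1]))  # the single digit

print(number, a1, a2)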

Explanation:

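The regex pattern r"\(([-+]?\d+(?:\.\d+)?)\+\d+j\) \[((?:[XYZ]\d(?: [XYZ]\d)*)?)]" matches the given string:

  • the part ([-+]?\d+(?:\.\d+)?) matches the real part of the number (first captured group)
  • the part ((?:[XYZ]\d(?: [XYZ]\d)*)?) matches the contents of the array, which may be empty (second captured group)
  • non-capturing groups are written as (?:<match>); they group without adding to the captured groups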
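To make the parts listed above easier to see, here is a sketch of the same pattern written with re.VERBOSE, which allows comments inside the pattern. The name PATTERN is only illustrative. Note that verbose mode ignores unescaped whitespace, so the two literal spaces are written as [ ].

```python
import re

# Same pattern as in the answer, spread over several lines.
# In verbose mode a literal space must be written as "[ ]" (or escaped).
PATTERN = re.compile(r"""
    \(                              # opening parenthesis
    ([-+]?\d+(?:\.\d+)?)            # group 1: real part, optional sign and fraction
    \+\d+j                          # imaginary part such as +0j (not captured)
    \)[ ]                           # closing parenthesis, then one space
    \[                              # opening square bracket
    ((?:[XYZ]\d(?:[ ][XYZ]\d)*)?)   # group 2: zero or more letter-digit items
    ]                               # closing square bracket
""", re.VERBOSE)

m = PATTERN.match("(-0.04530261550379927+0j) [X0 X1 Y2 Y3]")
print(m.groups())
```

This should print the two captured strings, ('-0.04530261550379927', 'X0 X1 Y2 Y3'), which are then converted as described next.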

match.groups() returns a tuple of all captured groups (2 in our case), and we unpack the tuple into the variables number and array.

Next, we split the string stored in array on whitespace and iterate through the items:

  • first character is appended to a1
  • second character is converted to int and appended to a2

Output:

-0.04530261550379927 ['X', 'X', 'Y', 'Y'] [0, 1, 2, 3]
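If some lines might not match at all, re.match returns None and the unpacking above would fail with an AttributeError. A slightly more defensive variant, as a sketch only: the function name parse_term and the choice to raise ValueError are my own assumptions, not something the question asks for.

```python
import re

# Same regex as above, compiled once so it can be reused for many strings.
TERM_RE = re.compile(r"\(([-+]?\d+(?:\.\d+)?)\+\d+j\) \[((?:[XYZ]\d(?: [XYZ]\d)*)?)]")

def parse_term(text):
    """Return (number, letters, indices) for one input string (hypothetical helper)."""
    match = TERM_RE.match(text)
    if match is None:
        # Assumption: a non-matching string should be reported, not silently skipped.
        raise ValueError(f"unrecognised term: {text!r}")
    number, array = match.groups()
    items = array.split()                 # [] when the brackets are empty
    letters = [item[0] for item in items]
    indices = [int(item[1]) for item in items]
    return float(number), letters, indices

print(parse_term("(-0.04530261550379927+0j) [X0 X1 Y2 Y3]"))
print(parse_term("(0.5+0j) []"))
```

The second call shows the empty-array case, which gives two empty lists. One limitation to be aware of: the pattern does not accept exponent notation, so a string such as "(1e-05+0j) [X0]" would be rejected.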

11 Comments

+1 Note that the digit [\d]+ does not have to be between square brackets. As a small suggestion, if you repeat the last part between the square bracket with a leading space in the group you don't need the question mark and then there can be no trailing space. \(([-+]?\d+(?:\.\d+)?)\+\d+j\) \[([XYZ]\d(?: [XYZ]\d)*)] See regex101.com/r/HbDlN7/1
@Thefourthbird, thanks for correcting! I will update the answer. But one problem with your suggested regex is that it doesn't match empty arrays, so I will skip that part
I am sorry I missed that part. In that case you can make the whole repeating part optional. regex101.com/r/M7GNuZ/1
Yep, I did so and updated the answer, thank you.
@GoldenLion [-+]? matches a single - or +, or nothing; \d+ matches a sequence of digits; (?:) is a non-capturing group, so (?:\.\d+)? matches the fractional part of the number, if present, without capturing it. That way we do not get that unneeded group back when we call match.groups(). Hope that helps, don't forget to upvote if the answer was helpful :)