
I am trying to parse a string and fill an array with specific information contained in the string, but I am getting some unexpected behavior.

I have written a script that successfully does this for some use cases, but it doesn't work for all possible cases.

Consider the string: 'BEST POSITION:P(0) = 1.124 P(1) = 2.345 P(2) = 3.145 P(3) = 4.354'

The following code should create the list: [1.124, 2.345, 3.145, 4.354]

import re
import numpy as np

inputs_best = np.zeros(4)
string_in = 'BEST POSITION:P(0) = 1.124 P(1) = 2.345 P(2) = 3.145 P(3) = 4.354'

best_sols_clean = ''
for item in string_in:
    best_sols_clean += item

best_sols_clean = re.sub('[ \t]', '', best_sols_clean)

count = 0
while best_sols_clean.find('P(') != -1:
    line_index = best_sols_clean.find('P(')
    try:
        inputs_best[count] = float(best_sols_clean[line_index+5:line_index+10])
        best_sols_clean = best_sols_clean[line_index+10:-1]
        count += 1
    except ValueError:
        inputs_best[count] = float(best_sols_clean[line_index+5:line_index+6])
        best_sols_clean = best_sols_clean[line_index+6:-1]
        count += 1

print(inputs_best)

The output of this script is:

[1.124 2.345 3.145 4. ]

This works for this string, except that the last entry is truncated: 4.354 comes out as 4.

The except clause handles the case where one or more of the values are integers, such as:

string_in = 'BEST POSITION:P(0) = 1 P(1) = 2.345 P(2) = 3.145 P(3) = 4'

which results in an error.

I believe the problem lies with the line best_sols_clean = best_sols_clean[line_index+10:-1], which for some reason throws away trailing digits of the string, even though I am slicing to the last element of the string.

For the string string_in = 'BEST POSITION:P(0) = 1 P(1) = 2.345 P(2) = 3.145 P(3) = 4', the program exits with the error:

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    inputs_best[count] = float(best_sols_clean[line_index+5:line_index+10])
ValueError: could not convert string to float: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 21, in <module>
    inputs_best[count] = float(best_sols_clean[line_index+5:line_index+6])
ValueError: could not convert string to float:

I would also be open to a more elegant solution than what I am attempting.

    Split into tokens (a list of strings) and run a state machine over them? You look for the desired sequence of tokens and pop the ones that satisfy a condition. Commented Jul 1, 2019 at 21:17
    You could use a regular expression search with the expression P\(\d\)\s+=\s+([0-9\.]+), which gives 4 matches for your string, where group 1 is the float. Commented Jul 1, 2019 at 21:20
    By the way you have a missing quote at the end of your string_in but I can't edit it since that is just 1 character. You may want to fix that. Commented Jul 1, 2019 at 21:23
    Thank you @FatihAkici, I was wondering why it isn't formatting properly and couldn't find what's causing it (-: Commented Jul 1, 2019 at 21:43
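The token-based idea in the first comment can be sketched roughly as follows. This is a minimal sketch under my own assumptions, not the commenter's code: the function name parse_best_position and its three states are hypothetical, and it assumes every value appears as a label token containing "P(" and ending in ")", then a lone "=" token, then the value.

```python
def parse_best_position(text):
    """Return the value from every 'P(n) = value' token sequence in text.

    Hypothetical helper: assumes whitespace-separated tokens and that each
    value follows a label token (e.g. 'P(1)') and a lone '=' token.
    """
    values = []
    state = "expect_label"
    for token in text.split():
        if state == "expect_label":
            # 'POSITION:P(0)' and 'P(1)' both count as labels.
            if "P(" in token and token.endswith(")"):
                state = "expect_equals"
        elif state == "expect_equals":
            # Only a bare '=' advances; anything else restarts the search.
            state = "expect_value" if token == "=" else "expect_label"
        else:  # state == "expect_value"
            # Keep the token if it parses as a number, then look for the next label.
            try:
                values.append(float(token))
            except ValueError:
                pass
            state = "expect_label"
    return values


print(parse_best_position('BEST POSITION:P(0) = 1.124 P(1) = 2.345 P(2) = 3.145 P(3) = 4.354'))
# [1.124, 2.345, 3.145, 4.354]
print(parse_best_position('BEST POSITION:P(0) = 1 P(1) = 2.345 P(2) = 3.145 P(3) = 4'))
# [1.0, 2.345, 3.145, 4.0]
```

Unlike fixed-width slicing, this does not depend on how many digits a value has. It does assume exactly one "=" token between each label and its value.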

3 Answers


You are trying to hard-code the tiny bits, which makes things extremely inefficient, fragile and hard to debug. You probably have an issue with your indices, but it may not be worthwhile to even dig into it. Why don't you just split your string on spaces and capture all number-looking strings into a list, like this:

string_in = 'BEST POSITION:P(0) = 1.124 P(1) = 2.345 P(2) = 3.145 P(3) = 4.354'
numbers = []
for x in string_in.split(' '):
    # Append float-able strings to your list
    try:
        numbers.append(float(x))
    except ValueError:
        # Pass only on ValueErrors; do not use a bare except.
        # Any other error should break the code by design.
        pass
print(numbers)
# Produces: [1.124, 2.345, 3.145, 4.354]

If you input string_in = 'BEST POSITION:P(0) = 1 P(1) = 2.345 P(2) = 3.145 P(3) = 4' this returns [1.0, 2.345, 3.145, 4.0]. Is that good for your purposes?


It looks like your problem is with this line:

 best_sols_clean = best_sols_clean[line_index+10:-1]

Each time you run through the loop you take one character off of the end of the string. Try changing it to this:

 best_sols_clean = best_sols_clean[line_index+10:]
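To see why the -1 stop index loses a character, here is a small check on a hypothetical fragment shaped like the last piece of the cleaned string. A slice includes its start index but excludes its stop index, and -1 refers to the last character, so [start:-1] ends one character early.

```python
fragment = "P(3)=4.354"  # hypothetical piece of the cleaned string

# The stop index is exclusive, and -1 means "the last character",
# so this slice stops just before the final '4'.
print(fragment[5:-1])  # 4.35

# Omitting the stop index keeps everything through the end of the string.
print(fragment[5:])    # 4.354
```

In the question's loop the cut happens on every pass, so by the time the loop reaches P(3), three passes have each dropped one character and only "4." is left.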

1 Comment

Thank you. I had also narrowed it down to this, and this is definitely a correct solution. I still don't understand why my solution threw away the last digit. Shouldn't the -1 index also grab the last element?

This will output all numbers in the string that aren't within parentheses:

import re
re.findall(r'[^(]([\d.]+)', string_in)

Example:

import re

string_in = 'BEST POSITION:P(0) = 1.124 P(1) = 2.345 P(2) = 3.145 P(3) = 4.354'
print(re.findall(r'[^(]([\d.]+)', string_in))
# ['1.124', '2.345', '3.145', '4.354']

string_in = 'BEST POSITION:P(0) = 1 P(1) = 2.345 P(2) = 3.145 P(3) = 4'
print(re.findall(r'[^(]([\d.]+)', string_in))
# ['1', '2.345', '3.145', '4']

3 Comments

I had a suspicion someone was going to post a one-liner regular expression solution (-: Thank you very much.
Sure. But note that it may cause problems if your strings ever contain periods that aren't decimal points. Could that happen?
Noted, thanks, but no that shouldn't happen. The only thing that could change is decimals vs integers.
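To illustrate the caveat about stray periods, here is a sketch with a hypothetical input in which a period follows POSITION. Both patterns come from this page (the answer above and the second comment on the question); only the sample string is my own. The answer's pattern picks up the lone period, while anchoring on the P(n) = prefix captures only the values.

```python
import re

# Hypothetical input with a period that is not a decimal point.
text = 'BEST POSITION. P(0) = 1 P(1) = 2.345'

# Pattern from the answer: any non-'(' character followed by digits/periods.
print(re.findall(r'[^(]([\d.]+)', text))              # ['.', '1', '2.345']

# Pattern from the question's comments: only values that follow 'P(n) ='.
print(re.findall(r'P\(\d\)\s+=\s+([0-9\.]+)', text))  # ['1', '2.345']
```

Note that \d matches a single digit, so labels such as P(10) would need \d+ in the second pattern.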
