Java function to parse all doubles from string

Question

I know this has been asked before¹ but responses don't seem to cover all corner cases.

I tried implementing the suggestion¹ with the test case

String("Doubles -1.0, 0, 1, 1.12345 and 2.50")

Which should return

[-1, 0, 1, 1.12345, 2.50]:

import java.util.Scanner;
import java.util.ArrayList;
import java.util.Locale;
public class Main
{
    public static void main(String[] args) {
        String string = new String("Doubles -1.0, 0, 1, 1.12345 and 2.50");
        System.out.println(string);
        ArrayList<Double> doubles = getDoublesFromString(string);
        System.out.println(doubles);
    }
    
    public static ArrayList<Double> getDoublesFromString(String string){
        Scanner parser = new Scanner(string);
        parser.useLocale(Locale.US);
        ArrayList<Double> doubles = new ArrayList<Double>();
        double currentDouble;
        while (parser.hasNext()){
            if(parser.hasNextDouble()){
                currentDouble = parser.nextDouble();
                doubles.add(currentDouble);
            }
            else {
                parser.next();
            }
        }
        parser.close();
        return doubles;
    }
}

Instead, the code above returns [1.12345, 2.5].

Did I implement it wrong? What is the fix for catching the negative number and the 0s?

The problem is the commas in your string. By default, the scanner splits the string on whitespace, so the first three numbers are read as the tokens "-1.0,", "0," and "1,". The trailing comma prevents the scanner from recognizing those tokens as doubles.
– Turamarth, Commented May 31, 2022 at 7:20
@Turamarth I didn't know that. Thanks a lot! I used commas in the test case on purpose, since in some languages (such as Portuguese) the comma is the decimal separator, and the proposed solution used Locale.US, so I was trying to test that as well. It will be hard to build something "universal" using Scanner then. I'll stick with the regex solution provided by Tim.
– nluizsoliveira, Commented May 31, 2022 at 7:28
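
To make the comment above concrete, here is a minimal sketch that prints the raw tokens a Scanner produces with its default whitespace delimiter. The class name TokenDemo and the helper defaultTokens are hypothetical and not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class for illustration only.
class TokenDemo {
    // Collects the raw tokens produced by a Scanner that uses its
    // default delimiter (whitespace).
    static List<String> defaultTokens(String input) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The commas stay attached to the first three numbers.
        System.out.println(defaultTokens("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
        // prints [Doubles, -1.0,, 0,, 1,, 1.12345, and, 2.50]
    }
}
```

Tokens such as "0," and "1," fail hasNextDouble(), which is why only 1.12345 and 2.50 are collected by the code in the question.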

Tim Biegeleisen · Accepted Answer · 2022-05-31 07:27:20Z

5

I would use a regex find all approach here:

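import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String string = new String("Doubles -1.0, 0, 1, 1.12345 and 2.50");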
List<String> nums = new ArrayList<>();

String pattern = "-?\\d+(?:\\.\\d+)?";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(string);

while (m.find()) {
    nums.add(m.group());
}

System.out.println(nums);  // [-1.0, 0, 1, 1.12345, 2.50]

By the way, your question makes use of the String constructor, which is seldom used, but is interesting to see, especially for those of us who never use it.

Here is an explanation of the regex pattern:

-?            match an optional leading negative sign
\\d+          match a whole number
(?:\\.\\d+)?  match an optional decimal component
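
Since the question asks for a function that returns doubles rather than strings, the same pattern can be wrapped as follows. This is only a sketch: the class name RegexDoubles is hypothetical, and it assumes plain decimal numbers only (no exponents, NaN or Infinity).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class for illustration only.
class RegexDoubles {
    // Same pattern as in the answer: optional minus sign, digits,
    // optional fractional part.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?");

    static List<Double> getDoublesFromString(String input) {
        List<Double> doubles = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            // Every match is a plain decimal literal, so parseDouble
            // does not throw here.
            doubles.add(Double.parseDouble(m.group()));
        }
        return doubles;
    }

    public static void main(String[] args) {
        System.out.println(getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
        // prints [-1.0, 0.0, 1.0, 1.12345, 2.5]
    }
}
```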

edited May 31, 2022 at 7:27

answered May 31, 2022 at 7:18

Tim Biegeleisen

5 Comments

nluizsoliveira Over a year ago

That works! Thank you a lot. I try to avoid regexes as they're hard to understand and test, but as nothing else works I'll happily try to understand what's going on and use it.

Tim Biegeleisen Over a year ago

I have added a description of what the regex pattern is doing. This pattern is not so complicated to understand (I hope). I also generally agree with you that complexity should be avoided, but regex just happens to work really well in this case.

nluizsoliveira Over a year ago

Thank you very very much! I'll accept the answer as soon as stackoverflow allows me

Dici Over a year ago

This is neat and short but probably doesn't support lots of edge cases. Run a debugger inside the Scanner class and you'll see how complex their float pattern is, that should tell you something about the actual complexity of matching doubles (I would not have expected it!). I think it's for supporting things like NaN, Infinity, the scientific notation and so on. That's why all in all, I think the best advice is not to reinvent the (complex) wheel and use the Scanner class, with delimiters.

nluizsoliveira Over a year ago

Hey @TimBiegeleisen I cannot suggest edits, but here's your solution as a function returning a List<Double>: onlinegdb.com/tLKr3XfkY

Dici · Answer · 2022-05-31 07:52:11Z

For your specific example, adding this at the construction of the scanner is sufficient: parser.useDelimiter("\\s|,");

The problem in your code is that the tokens containing a comma are not recognized as valid doubles. The call above configures the scanner to treat commas, as well as whitespace characters, as token delimiters. The comma is then no longer part of the token, so the token is a valid double and is parsed successfully.
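
Applied to the code from the question, the change could look like the sketch below. Two things here are my assumptions rather than part of the answer: the class name ScannerDoubles, and the delimiter "[\\s,]+", a slight variation on "\\s|," that consumes a comma and the whitespace after it as one delimiter, so no empty tokens are produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical class name, for illustration only.
class ScannerDoubles {
    static List<Double> getDoublesFromString(String input) {
        List<Double> doubles = new ArrayList<>();
        try (Scanner parser = new Scanner(input)) {
            parser.useLocale(Locale.US);
            // Assumption: a run of whitespace and/or commas counts as a
            // single delimiter (variation on the "\\s|," shown above).
            parser.useDelimiter("[\\s,]+");
            while (parser.hasNext()) {
                if (parser.hasNextDouble()) {
                    doubles.add(parser.nextDouble());
                } else {
                    parser.next(); // skip tokens that are not doubles
                }
            }
        }
        return doubles;
    }

    public static void main(String[] args) {
        System.out.println(getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
        // prints [-1.0, 0.0, 1.0, 1.12345, 2.5]
    }
}
```

One trade-off to keep in mind: once the comma is a delimiter, an input such as "1,000" is split into two tokens instead of being read as one thousand.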

I believe this is the most appropriate solution because matching all doubles is actually complex. Below, I have pasted the regex that Scanner uses to do that; see how complicated it really is. Compared with splitting the string and then using Double.parseDouble, this is pretty similar but involves less custom code and, more importantly, no exception throwing, which is slow.

(([-+]?((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?)|(((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?)|(\Q-\E((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?))|[-+]?0[xX][0-9a-fA-F].[0-9a-fA-F]+([pP][-+]?[0-9]+)?|(([-+]?(NaN|\QNaN\E|Infinity|\Q∞\E))|((NaN|\QNaN\E|Infinity|\Q∞\E))|(\Q-\E(NaN|\QNaN\E|Infinity|\Q∞\E)))

deHaar · Answer · 2022-05-31 07:28:56Z

2

First of all: I would use the regex solution, too… It's better and the following is just an alternative using split and replace/replaceAll while catching Exceptions:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public static void main(String[] args) {
    // input
    String s = "Doubles -1.0, 0, 1, 1.12345 and 2.50";
    // split by whitespace(s) (keep in mind the commas will stay)
    String[] parts = s.split("\\s+");
    // create a collection to store the Doubles
    List<Double> nums = new ArrayList<>();
    // stream the result of the split operation and
    Arrays.stream(parts).forEach(p -> {
        // try to…
        try {
            // replace all commas and parse the value
            nums.add(Double.parseDouble(p.replaceAll(",", "")));
        } catch (Exception e) {
            // which won't work for words like "Doubles", so print an error on those
            System.err.println("Could not parse \"" + p + "\"");
        }
    });
    // finally print all successfully parsed Double values
    nums.forEach(System.out::println);
}

Output:

Could not parse "Doubles"
Could not parse "and"
-1.0
0.0
1.0
1.12345
2.5

answered May 31, 2022 at 7:28

deHaar

8 Comments

Tim Biegeleisen Over a year ago

This might be faster than regex in some cases +1.

deHaar Over a year ago

@TimBiegeleisen Yes, you could even skip the try-catch then… But this example definitely contains words.

Dici Over a year ago

Well, this still uses lots of regex so it's not like it's a regex-free solution ^^ All solutions here use some regex. I think configuring delimiters in the scanner is cleaner in this case, to be honest, compared to writing custom code.

deHaar Over a year ago

Sure, it uses the split method which takes a regex… But it does not explicitly use a complex regex with pattern and matcher. In general, you are right @Dici

Dici Over a year ago

Yeah I think your solution works better than the currently accepted because it uses Java's built-in double parsing, so it will cover more cases (like scientific notation)
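
To illustrate the point about scientific notation made in the comment above, the following sketch runs both approaches on the same input. The class NotationDemo, its two helper methods, and the sample string are hypothetical; viaSplit is a simplified variant of the split-and-parse idea that splits on whitespace and commas in one step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class, for illustration only.
class NotationDemo {
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?");

    // Regex approach: knows nothing about exponents.
    static List<Double> viaRegex(String input) {
        List<Double> doubles = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            doubles.add(Double.parseDouble(m.group()));
        }
        return doubles;
    }

    // Split approach: lets Double.parseDouble decide what a number is.
    static List<Double> viaSplit(String input) {
        List<Double> doubles = new ArrayList<>();
        for (String token : input.split("[\\s,]+")) {
            try {
                doubles.add(Double.parseDouble(token));
            } catch (NumberFormatException e) {
                // not a number (for example a word or an empty token): skip it
            }
        }
        return doubles;
    }

    public static void main(String[] args) {
        String input = "Values 1e3, -2.5E-2 and 7";
        System.out.println(viaRegex(input)); // prints [1.0, 3.0, -2.5, -2.0, 7.0]
        System.out.println(viaSplit(input)); // prints [1000.0, -0.025, 7.0]
    }
}
```

The regex version breaks each exponent form into two separate numbers, while the split version reads 1e3 and -2.5E-2 as single values. Note that Double.parseDouble also accepts strings such as "NaN" and "Infinity", which may or may not be what you want.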
