0

I know this has been asked before¹ but responses don't seem to cover all corner cases.

I tried implementing the suggestion¹ with the test case

String("Doubles -1.0, 0, 1, 1.12345 and 2.50")

Which should return

[-1, 0, 1, 1.12345, 2.50]:

import java.util.Scanner;
import java.util.ArrayList;
import java.util.Locale;
public class Main
{
    public static void main(String[] args) {
        String string = new String("Doubles -1.0, 0, 1, 1.12345 and 2.50");
        System.out.println(string);
        ArrayList<Double> doubles = getDoublesFromString(string);
        System.out.println(doubles);
    }
    
    public static ArrayList<Double> getDoublesFromString(String string){
        Scanner parser = new Scanner(string);
        parser.useLocale(Locale.US);
        ArrayList<Double> doubles = new ArrayList<Double>();
        double currentDouble;
        while (parser.hasNext()){
            if(parser.hasNextDouble()){
                currentDouble = parser.nextDouble();
                doubles.add(currentDouble);
            }
            else {
                parser.next();
            }
        }
        parser.close();
        return doubles;
    }
}

Instead, the code above returns [1.12345, 2.5].

Did I implement it wrong? What's the fix for catching negative numbers and zeros?

2
  • The problem is the commas in your string. By default, the scanner splits the string on whitespace, so the first three doubles are read as the tokens "-1.0,", "0," and "1,". The trailing comma prevents the scanner from recognizing them as doubles. Commented May 31, 2022 at 7:20
  • @Turamarth I didn't know that. Thanks a lot! I used commas in the test case on purpose because in some languages (such as Portuguese) the comma is the decimal separator, and the proposed solution used Locale.US, so I wanted to test that as well. It will be hard to build something "universal" using Scanner then, so I'll stick with the regex solution provided by Tim. Commented May 31, 2022 at 7:28
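
To make the point in the comment above concrete, here is a minimal sketch that lists each token a whitespace-delimited Scanner produces and whether hasNextDouble() accepts it. The class name TokenDemo and the helper tokens are made up for illustration; only the Scanner calls come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class TokenDemo {
    // Hypothetical helper: returns "<token> -> <accepted as double?>" for every token.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useLocale(Locale.US);
            while (scanner.hasNext()) {
                // Ask the scanner about the upcoming token before consuming it.
                boolean isDouble = scanner.hasNextDouble();
                result.add(scanner.next() + " -> " + isDouble);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Tokens such as "-1.0," keep their trailing comma and are rejected.
        tokens("Doubles -1.0, 0, 1, 1.12345 and 2.50").forEach(System.out::println);
    }
}
```

Only "1.12345" and "2.50" are reported as doubles, which matches the [1.12345, 2.5] result from the question.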

3 Answers

5

I would use a regex find all approach here:

String string = new String("Doubles -1.0, 0, 1, 1.12345 and 2.50");
List<String> nums = new ArrayList<>();

String pattern = "-?\\d+(?:\\.\\d+)?";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(string);

while (m.find()) {
    nums.add(m.group());
}

System.out.println(nums);  // [-1.0, 0, 1, 1.12345, 2.50]

By the way, your question makes use of the String constructor, which is seldom used, but is interesting to see, especially for those of us who never use it.

Here is an explanation of the regex pattern:

-?            match an optional leading negative sign
\\d+          match a whole number
(?:\\.\\d+)?  match an optional decimal component
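
For completeness, here is a sketch of the same pattern wrapped in a method that returns a List<Double>, as the question asked for. The class name RegexDoubles is an assumption for illustration; the pattern is the one explained above, and each match is converted with Double.parseDouble.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDoubles {
    // Optional minus sign, integer part, optional fractional part.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?");

    static List<Double> getDoublesFromString(String input) {
        List<Double> doubles = new ArrayList<>();
        Matcher matcher = NUMBER.matcher(input);
        while (matcher.find()) {
            // Every match is a plain decimal literal, so parseDouble will not throw here.
            doubles.add(Double.parseDouble(matcher.group()));
        }
        return doubles;
    }

    public static void main(String[] args) {
        System.out.println(getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
        // prints [-1.0, 0.0, 1.0, 1.12345, 2.5]
    }
}
```

Note that this pattern has no notion of exponents or word boundaries: an input such as "1e-3" yields 1.0 and -3.0, and digits embedded in a word (for example "abc123") are matched as well.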

5 Comments

That works! Thanks a lot. I try to avoid regexes because they're hard to understand and test, but since nothing else works I'll happily try to understand what's going on and use it.
I have added a description of what the regex pattern is doing. This pattern is not so complicated to understand (I hope). I also generally agree with you that complexity should be avoided, but regex just happens to work really well in this case.
Thank you very very much! I'll accept the answer as soon as stackoverflow allows me
This is neat and short but probably doesn't support lots of edge cases. Run a debugger inside the Scanner class and you'll see how complex their float pattern is, that should tell you something about the actual complexity of matching doubles (I would not have expected it!). I think it's for supporting things like NaN, Infinity, the scientific notation and so on. That's why all in all, I think the best advice is not to reinvent the (complex) wheel and use the Scanner class, with delimiters.
Hey @TimBiegeleisen I cannot suggest edits, but here's your solution as a function returning a List<Double>: onlinegdb.com/tLKr3XfkY
4

For your specific example, adding this at the construction of the scanner is sufficient: parser.useDelimiter("\\s|,");

The problem in your code is that the tokens containing a comma are not recognized as valid doubles. The call above configures the scanner to treat commas, in addition to whitespace characters, as token delimiters. The comma is therefore no longer part of the token, so the token is a valid double and is parsed successfully.
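
A minimal sketch of the question's method with a delimiter configured might look like this. It assumes a slightly different delimiter, "[\\s,]+", instead of "\\s|,": the character class with + treats a run such as ", " as a single delimiter, so no empty tokens appear between the comma and the space. The class name DelimiterDoubles is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class DelimiterDoubles {
    static List<Double> getDoublesFromString(String input) {
        List<Double> doubles = new ArrayList<>();
        try (Scanner parser = new Scanner(input)) {
            parser.useLocale(Locale.US);
            // Assumption: one or more whitespace characters or commas separate tokens.
            parser.useDelimiter("[\\s,]+");
            while (parser.hasNext()) {
                if (parser.hasNextDouble()) {
                    doubles.add(parser.nextDouble());
                } else {
                    parser.next(); // skip tokens that are not doubles, e.g. "and"
                }
            }
        }
        return doubles;
    }

    public static void main(String[] args) {
        System.out.println(getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
        // prints [-1.0, 0.0, 1.0, 1.12345, 2.5]
    }
}
```

The trade-off is that a comma can no longer be part of a number: grouped values such as 1,000 are split into two tokens, and comma-decimal locales would need a different delimiter.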

I believe this is the most appropriate solution because matching all doubles is actually complex. Below, I have pasted the regex that Scanner uses to do that; see how complicated it really is. Compared to splitting the string and then calling Double.parseDouble, this approach is similar but involves less custom code and, more importantly, no exception throwing, which is slow.

(([-+]?((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?)|(((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?)|(\Q-\E((((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))|(((([0-9\p{javaDigit}]))++)|(\p{javaDigit}&&[^0]?(([0-9\p{javaDigit}]))?(\x{2c}(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}]))(([0-9\p{javaDigit}])))+))\x{2e}(([0-9\p{javaDigit}]))+|\x{2e}(([0-9\p{javaDigit}]))++)([eE][+-]?(([0-9\p{javaDigit}]))+)?))|[-+]?0[xX][0-9a-fA-F].[0-9a-fA-F]+([pP][-+]?[0-9]+)?|(([-+]?(NaN|\QNaN\E|Infinity|\Q∞\E))|((NaN|\QNaN\E|Infinity|\Q∞\E))|(\Q-\E(NaN|\QNaN\E|Infinity|\Q∞\E)))


2

First of all: I would use the regex solution, too… It's better and the following is just an alternative using split and replace/replaceAll while catching Exceptions:

public static void main(String[] args) {
    // input
    String s = "Doubles -1.0, 0, 1, 1.12345 and 2.50";
    // split by whitespace(s) (keep in mind the commas will stay)
    String[] parts = s.split("\\s+");
    // create a collection to store the Doubles
    List<Double> nums = new ArrayList<>();
    // stream the result of the split operation and
    Arrays.stream(parts).forEach(p -> {
        // try to…
        try {
            // replace all commas and parse the value
            nums.add(Double.parseDouble(p.replaceAll(",", "")));
        } catch (Exception e) {
            // which won't work for words like "Doubles", so print an error on those
            System.err.println("Could not parse \"" + p + "\"");
        }
    });
    // finally print all successfully parsed Double values
    nums.forEach(System.out::println);
}

Output:

Could not parse "Doubles"
Could not parse "and"
-1.0
0.0
1.0
1.12345
2.5
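
As a variation on the split approach above, the following sketch splits on whitespace and commas in one step and returns the parsed values as a list instead of printing them. The class name SplitDoubles and the "[\\s,]+" split pattern are assumptions for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDoubles {
    static List<Double> getDoublesFromString(String input) {
        List<Double> doubles = new ArrayList<>();
        // Assumption: tokens are separated by runs of whitespace and/or commas.
        for (String part : input.split("[\\s,]+")) {
            try {
                doubles.add(Double.parseDouble(part));
            } catch (NumberFormatException e) {
                // Not a number (e.g. "Doubles", "and", or an empty leading token): skip it.
            }
        }
        return doubles;
    }

    public static void main(String[] args) {
        System.out.println(getDoublesFromString("Doubles -1.0, 0, 1, 1.12345 and 2.50"));
        // prints [-1.0, 0.0, 1.0, 1.12345, 2.5]
        System.out.println(getDoublesFromString("x 1e3, -2.5"));
        // prints [1000.0, -2.5]
    }
}
```

Because Double.parseDouble does the parsing, scientific notation such as 1e3 is accepted. It also accepts tokens like "NaN", "Infinity" and literals with a type suffix such as "1d", which may or may not be wanted.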

8 Comments

This might be faster than regex in some cases +1.
@TimBiegeleisen Yes, you could even skip the try-catch then… But this example definitely contains words.
Well, this still uses lots of regex so it's not like it's a regex-free solution ^^ All solutions here use some regex. I think configuring delimiters in the scanner is cleaner in this case, to be honest, compared to writing custom code.
Sure, it uses the split method which takes a regex… But it does not explicitly use a complex regex with pattern and matcher. In general, you are right @Dici
Yeah, I think your solution works better than the currently accepted answer because it uses Java's built-in double parsing, so it will cover more cases (like scientific notation).
