7

I'm parsing the following AWS instance cost table:

m1.small    1   1   1.7     1 x 160    $0.044 per Hour
m1.medium   1   2   3.75    1 x 410    $0.087 per Hour
m1.large    2   4   7.5     2 x 420    $0.175 per Hour
m1.xlarge   4   8   15      4 x 420    $0.35 per Hour

There's a file with those costs:

input = new Scanner(file);
String[] values;
while (input.hasNextLine()) {
    String line = input.nextLine();
    values = line.split("\\s+"); // <-- not what I want...
    for (String v : values)
        System.out.println(v);
}

However that gives me:

m1.small
1
1
1.7
1
x
160
$0.044
per
Hour

which is not what I want. The correctly parsed values (with the right regex) would look like this:

['m1.small', '1', '1', '1.7', '1 x 160', '$0.044', 'per Hour']

What regex would produce this result? You can assume the table always follows the same pattern.

8
  • 4
    Is the actual data separated with tabs? Can there be instances where columns are only delimited by one space? Commented Dec 25, 2015 at 3:49
  • @Pietu1998 Mostly, but not necessarily. A regex would be more robust, which is why I didn't just split on \\t+. The files are huge, so it would be pointless to search through them to fix a missing tab. Commented Dec 25, 2015 at 3:51
  • Can you use \\s{2,}? Commented Dec 25, 2015 at 3:53
  • @PM77-1 Yes but that gave me a problem: 1.7 1 x 160 $0.044 per Hour Commented Dec 25, 2015 at 3:53
  • How about going the other way? Split by \\s and then concatenate parts of the result. Commented Dec 25, 2015 at 3:57

3 Answers 3

5

Try this fiddle https://regex101.com/r/sP6zW5/1

([^\s]+)\s+(\d+)\s+(\d+)\s+([\d\.]+)\s+(\d+ x \d+)\s+(\$\d+\.\d+)\s+(per \w+)

Match each line against this pattern; the captured groups are your list.

I think using split is too complicated in your case. If the text always has the same layout, matching it against a pattern is just the reverse of string formatting.
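
In Java source code the pattern has to be written as a string literal, so every backslash must be doubled; a single `\s` or `\d` inside a string literal is rejected as an invalid escape sequence. The following is a minimal sketch of that, not a definitive implementation. The class name `InstanceRowMatcher` and the method `parse` are hypothetical names chosen for illustration, and returning `null` for a non-matching line is an assumption about how you want to handle bad rows.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class InstanceRowMatcher {
    // Same pattern as above, with each backslash doubled for a Java string literal.
    private static final Pattern ROW = Pattern.compile(
            "([^\\s]+)\\s+(\\d+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(\\d+ x \\d+)\\s+(\\$\\d+\\.\\d+)\\s+(per \\w+)");

    // Returns the seven captured fields, or null if the line does not fit the layout
    // (an assumption; throwing an exception would work just as well).
    static String[] parse(String line) {
        Matcher m = ROW.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group 0 is the whole match, so start at 1
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "m1.small    1   1   1.7     1 x 160    $0.044 per Hour";
        System.out.println(Arrays.toString(parse(line)));
    }
}
```

Using `matches()` rather than `find()` means a row with extra or missing columns is reported as a non-match instead of being partially parsed.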


1 Comment

amow, one more thing: Java complains about an invalid escape sequence. How would I fix that?
5

If you want to use a regular expression, you'd do this:

        String s = "m1.small    1   1   1.7     1 x 160    $0.044 per Hour";
        String spaces = "\\s+";
        String type = "(.*?)";
        String intNumber = "(\\d+)";
        String doubleNumber = "([0-9.]+)";
        String dollarNumber = "([$0-9.]+)";
        String aXb = "(\\d+ x \\d+)";
        String rest = "(.*)";

        Pattern pattern = Pattern.compile(type + spaces + intNumber + spaces + intNumber + spaces + doubleNumber
                + spaces + aXb + spaces + dollarNumber + spaces + rest);
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            String[] fields = new String[] { matcher.group(1), matcher.group(2), matcher.group(3), matcher.group(4),
                    matcher.group(5), matcher.group(6), matcher.group(7) };
            System.out.println(Arrays.toString(fields));
        }

Notice how I've broken up the regular expression to make it readable. (As one long String, it is hard to read and maintain.)

There's another way of doing it, though. Since you know which tokens belong together, you could do a simple split and build a new array with the combined values:

        String[] allFields = s.split("\\s+");
        String[] result = new String[] { 
            allFields[0], 
            allFields[1],
            allFields[2],
            allFields[3],
            allFields[4] + " " + allFields[5] + " " + allFields[6],         
            allFields[7], 
            allFields[8] + " " + allFields[9] };
        System.out.println(Arrays.toString(result));
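
The split-and-recombine approach relies on every line producing exactly ten whitespace-separated tokens. Below is a hedged sketch that wraps it in a method and checks that assumption before indexing into the array. The class name `SplitAndJoinDemo`, the method `parse`, and the choice to throw `IllegalArgumentException` on an unexpected token count are all illustrative assumptions, not part of the original code.

```java
import java.util.Arrays;

// Hypothetical wrapper around the split-and-recombine idea; names are illustrative.
class SplitAndJoinDemo {
    static String[] parse(String line) {
        String[] t = line.trim().split("\\s+");
        // The fixed indices below are only valid for the ten-token layout.
        if (t.length != 10) {
            throw new IllegalArgumentException(
                    "Expected 10 tokens but got " + t.length + ": " + line);
        }
        return new String[] {
            t[0], t[1], t[2], t[3],
            String.join(" ", t[4], t[5], t[6]), // e.g. "1 x 160"
            t[7],                               // e.g. "$0.044"
            String.join(" ", t[8], t[9])        // e.g. "per Hour"
        };
    }

    public static void main(String[] args) {
        String line = "m1.large    2   4   7.5     2 x 420    $0.175 per Hour";
        System.out.println(Arrays.toString(parse(line)));
    }
}
```

Failing loudly on a malformed line is one design choice; with very large files you may prefer to log and skip such lines instead.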

Comments

4

Split on one or more spaces, but only where the spaces appear in one of the contexts below.

DIGIT - SPACES - NOT "x"

or

NOT "x" - SPACES - DIGIT

    values = line.split("(?<=\\d)\\s+(?=[^x])|(?<=[^x])\\s+(?=\\d)");
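
To see the two lookaround rules in action, here is a small self-contained sketch that applies the split to rows from the question. The class name `LookaroundSplitDemo` and the method `split` are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is illustrative only.
class LookaroundSplitDemo {
    // Split on whitespace that either follows a digit and is not followed by 'x',
    // or follows a non-'x' character and is followed by a digit.
    static String[] split(String line) {
        return line.split("(?<=\\d)\\s+(?=[^x])|(?<=[^x])\\s+(?=\\d)");
    }

    public static void main(String[] args) {
        String line = "m1.small    1   1   1.7     1 x 160    $0.044 per Hour";
        // The spaces around 'x' and between "per" and "Hour" match neither rule,
        // so "1 x 160" and "per Hour" stay together.
        System.out.println(Arrays.toString(split(line)));
    }
}
```

The lookbehind and lookahead are zero-width, so only the whitespace itself is consumed and no characters are lost from the neighbouring fields.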

2 Comments

Boom! Thank you so much
@Just one more thing ... some lines have EBS Only $0.024, so it should match for that too ... in this case, ['EBS Only', '$0.024'] .. I tried to add that but didn't work ...
