I'm parsing the following AWS instance cost table:
m1.small 1 1 1.7 1 x 160 $0.044 per Hour
m1.medium 1 2 3.75 1 x 410 $0.087 per Hour
m1.large 2 4 7.5 2 x 420 $0.175 per Hour
m1.xlarge 4 8 15 4 x 420 $0.35 per Hour
There's a file with those costs:
input = new Scanner(file);
String[] values;
while (input.hasNextLine()) {
    String line = input.nextLine();
    values = line.split("\\s+"); // <-- not what I want...
    for (String v : values)
        System.out.println(v);
}
However that gives me:
m1.small
1
1
1.7
1
x
160
$0.044
per
Hour
which is not what I want. Correctly parsed values (with the right regex) would look like this:
['m1.small', '1', '1', '1.7', '1 x 160', '$0.044', 'per Hour']
What would be the right regex to obtain this result? One can assume the table will always have the same pattern.
\\t+.
Those files are huge, so it may be pointless to search those files in order to fix a missing tab.
\\s{2,}?
1.7 1 x 160 $0.044 per Hour
\\s and then concatenate parts of the result.
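Since the table is assumed to always follow the same pattern, one possible approach is to match the whole row with capture groups instead of splitting on whitespace. The following is only a sketch under stated assumptions: the first four columns contain no spaces, the storage column always has the form "N x SIZE", the price starts with "$", and the last column is "per" followed by one word. The names InstanceRowParser and parseRow are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InstanceRowParser {
    // Assumed row layout (not confirmed beyond the sample rows):
    // name, three single-token columns, "N x SIZE", "$price", "per UNIT"
    private static final Pattern ROW = Pattern.compile(
        "^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+"
        + "(\\d+\\s+x\\s+\\d+)\\s+(\\$\\S+)\\s+(per\\s+\\S+)$");

    // Returns the seven fields of one row, or throws if the row
    // does not match the assumed layout.
    public static List<String> parseRow(String line) {
        Matcher m = ROW.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected row format: " + line);
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints [m1.small, 1, 1, 1.7, 1 x 160, $0.044, per Hour]
        System.out.println(parseRow("m1.small 1 1 1.7 1 x 160 $0.044 per Hour"));
    }
}
```

Matching the full row has the side effect of rejecting malformed lines instead of silently producing the wrong number of fields, which may help with very large files. If the real data uses tabs between columns, splitting on \\t+ would be simpler.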