I'd like to parse an UTF8 encoded text file that may contain something like this:
int 1
text " some text with \" and \\ "
int list[-45,54, 435 ,-65]
float list [ 4.0, 5.2,-5.2342e+4]
The numbers in the list are separated by commas. Whitespace is permitted but not required between any number and any symbol like commas and brackets here. Similarly for words and symbols, like in the case of list[
I've done the quoted string reading by forcing Scanner to give me single chars (setting its delimiter to an empty pattern) because I still thought it'll be useful for reading the ints and floats, but I'm not sure anymore.
The Scanner always takes a complete token and then tries to match it. What I need is try to match as much (or as little) as possible, disregarding delimiters.
Basically for this input
int list[-45,54, 435 ,-65]
I'd like to be able to call and get this
s.nextWord() // int
s.nextWord() // list
s.nextSymbol() // [
s.nextInt() // -45
s.nextSymbol() // ,
s.nextInt() // 54
s.nextSymbol() // ,
s.nextInt() // 435
s.nextSymbol() // ,
s.nextInt() // -65
s.nextSymbol() // ]
and so on.
Or, if it couldn't parse doubles and other types itself, at least a method that takes a regex, returns the biggest string that matches it (or an error) and sets the stream position to just after what it matched.
Can the Scanner somehow be used for this? Or is there another approach? I feel this must be quite a common thing to do, but I don't seem to be able to find the right tool for it.
