Fastest way to parse fixed length fields out of a byte array in scala?

Question

In Scala, I receive a UDP message and end up with a DatagramPacket whose buffer is an Array[Byte] containing the message. The message consists entirely of ASCII characters and is made up of fixed-length fields: some are numbers, others are single characters or strings. What is the fastest way to parse these fields out of the message data?

As an example, suppose my message has the following format:

2 bytes - message type, either "AB" or "PQ" or "XY"
1 byte - status, either a, b, c, f, j, r, p or 6
4 bytes - a 4-character name
1 byte - sign for value 1, either space or "-"
6 bytes - integer value 1, in ASCII, with leading spaces, e.g. "  1234"
1 byte - sign for value 2
6 bytes - decimal value 2
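Summing the widths above gives each field's starting byte offset and a total message length of 21 bytes. A minimal sketch that derives the offsets from the widths instead of hard-coding them; `Layout` and its members are hypothetical names chosen for illustration:

```scala
// Hypothetical description of the example message layout.
object Layout {
  // Field widths in bytes, in message order:
  // type, status, name, sign 1, value 1, sign 2, value 2
  val widths: Vector[Int] = Vector(2, 1, 4, 1, 6, 1, 6)

  // Start offset of each field: the running sum of the widths before it.
  val offsets: Vector[Int] = widths.scanLeft(0)(_ + _).init

  // Total message length in bytes.
  val totalLength: Int = widths.sum
}
```

Here `Layout.offsets` is `Vector(0, 2, 3, 7, 8, 14, 15)` and `Layout.totalLength` is 21, matching the slice indices used in the question's code.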

So a message could look like:

ABjTst1   5467- 23.87

Message type "AB", status "j", name "Tst1", value 1 is 5467, and value 2 is -23.87.

What I have done so far is take the data as message: Array[Byte] and then slice fields out of it, such as:

import java.nio.charset.StandardCharsets.US_ASCII

// message: Array[Byte] holds the 21 message bytes
val msgType  = new String(message.slice(0, 2), US_ASCII)
val status   = message(2).toChar
val name     = new String(message.slice(3, 7), US_ASCII)
val val1Sign = message(7).toChar
val val1     = new String(message.slice(8, 14), US_ASCII).trim.toInt * (if (val1Sign == '-') -1 else 1)
val val2Sign = message(14).toChar
val val2     = new String(message.slice(15, 21), US_ASCII).trim.toFloat * (if (val2Sign == '-') -1 else 1)

Of course, reused functionality, like parsing a number, would normally go in a function.
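As a sketch of that refactoring, assuming ASCII input and that each value's sign byte sits immediately before its digits (as in the example layout), one hypothetical helper, `signedField`, can cover both numeric fields. It still slices a copy of the bytes, exactly like the code above:

```scala
import java.nio.charset.StandardCharsets.US_ASCII

object SliceParse {
  // Decodes the len-byte ASCII field starting at off, trims its space padding,
  // converts it with conv, and negates the result if the byte at off - 1 is '-'.
  // Hypothetical helper for illustration; performs no validation.
  def signedField[A](msg: Array[Byte], off: Int, len: Int)(conv: String => A)(implicit num: Numeric[A]): A = {
    val value = conv(new String(msg.slice(off, off + len), US_ASCII).trim)
    if (msg(off - 1).toChar == '-') num.negate(value) else value
  }
}
```

With the layout above, `SliceParse.signedField(message, 8, 6)(_.toInt)` reads value 1 as an Int and `SliceParse.signedField(message, 15, 6)(_.toDouble)` reads value 2 as a Double.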

This technique is straightforward, but is there a better way to be doing this if speed is important?

Rex Kerr · Accepted Answer · 2011-04-22 22:04:39Z

2

Writing your own byte-array-to-primitive conversions would improve speed somewhat (if you really need that much speed), since it avoids creating an extra String object. Also, rather than slicing the array (which allocates a second array), you should use the String constructor

String(byte[] bytes, int offset, int length)

which avoids making the extra copy.
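A sketch of both suggestions, assuming an ASCII buffer laid out as in the question. `asciiField` uses the `String(byte[], int, int, Charset)` overload, which likewise avoids the intermediate slice but also fixes the charset instead of relying on the platform default. `parseIntField` and `parseDecimalField` are hypothetical hand-written conversions that read digits straight from the bytes without creating a String; they only skip spaces and do no other validation, and whether they are measurably faster in a given application would need a benchmark:

```scala
import java.nio.charset.StandardCharsets.US_ASCII

object DirectParse {
  // Decodes len bytes starting at off as ASCII, without slicing a copy first.
  def asciiField(buf: Array[Byte], off: Int, len: Int): String =
    new String(buf, off, len, US_ASCII)

  // True if the sign byte at idx is '-'.
  def isNegative(buf: Array[Byte], idx: Int): Boolean =
    buf(idx).toChar == '-'

  // Parses a space-padded unsigned integer field such as "  5467".
  // Spaces are skipped; every other byte is assumed to be a digit.
  def parseIntField(buf: Array[Byte], off: Int, len: Int): Int = {
    var acc = 0
    var i = off
    while (i < off + len) {
      val c = buf(i).toChar
      if (c != ' ') acc = acc * 10 + (c - '0')
      i += 1
    }
    acc
  }

  // Parses a space-padded decimal field such as " 23.87": digits are accumulated
  // into a Long, then divided by 10^(number of digits after the '.').
  def parseDecimalField(buf: Array[Byte], off: Int, len: Int): Double = {
    var mantissa = 0L
    var fracDigits = 0
    var seenPoint = false
    var i = off
    while (i < off + len) {
      val c = buf(i).toChar
      if (c == '.') seenPoint = true
      else if (c != ' ') {
        mantissa = mantissa * 10 + (c - '0')
        if (seenPoint) fracDigits += 1
      }
      i += 1
    }
    mantissa / math.pow(10, fracDigits)
  }
}
```

For the example message, `parseIntField(message, 8, 6)` gives 5467 and `parseDecimalField(message, 15, 6)` gives 23.87, which is then negated because `isNegative(message, 14)` is true.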

answered Apr 22, 2011 at 22:04

Rex Kerr

user unknown · Accepted Answer · 2011-04-23 03:32:04Z

0

I don't have the data to run performance tests, but maybe you do? Have you tried regex matching with a precompiled pattern?

The numbers in the comment label the opening parentheses, which correspond to the capturing groups:

import java.util.regex.Pattern

//                              12    3    4    5           6     7     8                 9     10
val pattern = Pattern.compile ("((AB)|(PQ)|(XY))([abcfjrp6])(.{4})([- ])( {0,6}[0-9]{0,6})([- ])([ 0-9.]{1,6})")

def msplit (message: String) = {
  val matcher = pattern.matcher (message)
  if (matcher.find ())
    List (1, 5, 6, 7, 8, 9, 10).foreach (g => println (matcher.group(g)))
}

val s = "ABjTst1 5467- 23.87"
msplit (s)

Pattern/Matcher is, of course, Java territory; maybe you'll find a more Scala-idiomatic solution with "...".r
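A sketch of the Scala-style alternative with `.r`, assuming the full 21-byte layout from the question. Unlike `find()`, a `Regex` used as an extractor in a pattern match must match the whole string, so here each value field is pinned to exactly six characters. `RegexParse`, `Msg` and `parse` are hypothetical names; a value field made entirely of spaces would still match and then fail in `toInt`/`toDouble`:

```scala
object RegexParse {
  // One capturing group per field; widths follow the fixed message layout.
  val Msg = """(AB|PQ|XY)([abcfjrp6])(.{4})([- ])([ 0-9]{6})([- ])([ 0-9.]{6})""".r

  // Returns (type, status, name, value 1, value 2), or None if the message doesn't match.
  def parse(s: String): Option[(String, Char, String, Int, Double)] = s match {
    case Msg(t, st, name, s1, v1, s2, v2) =>
      val sign1 = if (s1 == "-") -1 else 1
      val sign2 = if (s2 == "-") -1.0 else 1.0
      Some((t, st.head, name, sign1 * v1.trim.toInt, sign2 * v2.trim.toDouble))
    case _ => None
  }
}
```

Because the match is anchored, a 19-character string such as the one in this answer yields `None`; only a complete 21-character message is accepted.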

Result:

AB
j
Tst1

5467
-
 23.87

answered Apr 23, 2011 at 3:32

user unknown

2 Comments

Willard Over a year ago

I haven't run performance tests yet; I just wondered whether there was existing knowledge to use as a starting point. Given the fixed-length fields, I would expect indexing directly into the bytes at specific locations to be faster than running a regex matcher to get the same result.

user unknown Over a year ago

Of course plain indexing should be faster, since it doesn't validate anything (such as ((AB)|(PQ)|(XY))). You just get two chars, which might be '-1', 'AB', 'BA' or 'ab'.
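If some validation is still wanted without a regex, the fixed positions make it cheap to check individual bytes before converting them. A minimal sketch assuming the question's layout; `IndexValidate` and `looksValid` are hypothetical names, the name field is not checked, and value fields are only checked for allowed characters, not for well-formed numbers:

```scala
object IndexValidate {
  // Allowed message types, compared pairwise without building a String.
  private def validType(a: Char, b: Char): Boolean =
    (a == 'A' && b == 'B') || (a == 'P' && b == 'Q') || (a == 'X' && b == 'Y')

  private def validStatus(c: Char): Boolean = "abcfjrp6".indexOf(c) >= 0
  private def validSign(c: Char): Boolean = c == ' ' || c == '-'
  private def digitOrSpace(c: Char): Boolean = c == ' ' || (c >= '0' && c <= '9')

  // Checks the fixed-position fields of a 21-byte message by direct indexing.
  def looksValid(m: Array[Byte]): Boolean = {
    def ch(i: Int): Char = m(i).toChar
    m.length == 21 &&
      validType(ch(0), ch(1)) &&
      validStatus(ch(2)) &&
      validSign(ch(7)) &&
      (8 until 14).forall(i => digitOrSpace(ch(i))) &&
      validSign(ch(14)) &&
      (15 until 21).forall(i => digitOrSpace(ch(i)) || ch(i) == '.')
  }
}
```

The length check comes first, so `&&` short-circuits before any out-of-range indexing on a short packet.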