
In Scala, I receive a UDP message and end up with a DatagramPacket whose buffer is an Array[Byte] containing the message. The message consists entirely of ASCII characters and is made up of fixed-length fields: some are numbers, others single characters or strings. What is the fastest way to parse these fields out of the message data?

As an example, suppose my message has the following format:

2 bytes - message type, either "AB" or "PQ" or "XY"
1 byte - status, either a, b, c, f, j, r, p or 6
4 bytes - a 4-character name
1 byte - sign for value 1, either space or "-"
6 bytes - integer value 1, in ASCII, with leading spaces, e.g. "  1234"
1 byte - sign for value 2
6 bytes - decimal value 2
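The byte offsets used when slicing follow directly from the cumulative field widths in this layout. A small sketch deriving them, assuming the widths listed above; Layout, fields, offsets and length (and the field names) are hypothetical, chosen only for illustration:

```scala
object Layout {
  // Field names (made up here) and widths in wire order, from the format list.
  val fields: Vector[(String, Int)] = Vector(
    "type" -> 2, "status" -> 1, "name" -> 4,
    "sign1" -> 1, "value1" -> 6, "sign2" -> 1, "value2" -> 6)

  // Each field starts at the sum of the widths before it:
  // scanLeft yields 0, 2, 3, 7, 8, 14, 15, 21 (the last entry is the total length).
  private val starts: Vector[Int] = fields.scanLeft(0)((acc, f) => acc + f._2)

  val offsets: Map[String, Int] = fields.map(_._1).zip(starts).toMap
  val length: Int = starts.last // 21
}
```

These are the same indices that appear in the slice calls further down (2, 3, 7, 8, 14, 15, 21).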

So a message could look like this:

ABjTst1   5467- 23.87

Message type "AB", status "j", name "Tst1", value 1 is 5467, and value 2 is -23.87.

What I have done so far is get the message as an array (message: Array[Byte]) and then take slices from it, such as:

import java.nio.charset.StandardCharsets.US_ASCII

val msgType  = new String(message.slice(0, 2), US_ASCII)
val status   = message(2).toChar
val name     = new String(message.slice(3, 7), US_ASCII)
val val1Sign = message(7).toChar
val val1     = new String(message.slice(8, 14), US_ASCII).trim.toInt * (if (val1Sign == '-') -1 else 1)
val val2Sign = message(14).toChar
val val2     = new String(message.slice(15, 21), US_ASCII).trim.toFloat * (if (val2Sign == '-') -1 else 1)

Of course, reused functionality, like parsing a number, would normally go in a function.
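For instance, the sign-and-number handling could be factored into small helpers like the ones below. This is a sketch that keeps the same slice-then-String approach and assumes ASCII input; text, signOf, signedInt and signedFloat are hypothetical names:

```scala
import java.nio.charset.StandardCharsets.US_ASCII

object SliceHelpers {
  // Bytes [from, until) of the message decoded as ASCII text.
  def text(msg: Array[Byte], from: Int, until: Int): String =
    new String(msg.slice(from, until), US_ASCII)

  // A one-byte sign field: '-' means negative, anything else positive.
  def signOf(msg: Array[Byte], at: Int): Int = if (msg(at) == '-') -1 else 1

  // Sign byte at signAt, digits (with leading spaces) in [from, until).
  def signedInt(msg: Array[Byte], signAt: Int, from: Int, until: Int): Int =
    signOf(msg, signAt) * text(msg, from, until).trim.toInt

  def signedFloat(msg: Array[Byte], signAt: Int, from: Int, until: Int): Float =
    signOf(msg, signAt) * text(msg, from, until).trim.toFloat
}
```

With these, the two numeric fields become signedInt(message, 7, 8, 14) and signedFloat(message, 14, 15, 21).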

This technique is straightforward, but is there a better way to be doing this if speed is important?


2 Answers


Writing your own byte-array-to-primitive conversions would improve speed somewhat (if you really need that much speed), since it avoids creating an extra String object. Also, rather than slicing the array (which allocates and copies into another array), you should use the String constructor

String(byte[] bytes, int offset, int length)

which avoids making the extra copy.
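A minimal sketch of both suggestions, assuming the 21-byte layout from the question and ASCII-only input. FixedFieldParser, parseInt, parseDecimal and sign are hypothetical names, not a library API, and returning a Double from parseDecimal (the question uses Float) is an assumption. Text fields use the Java String(byte[], int, int, Charset) constructor, so no sliced copy is made; numeric fields are converted digit by digit with no String at all:

```scala
import java.nio.charset.StandardCharsets.US_ASCII

object FixedFieldParser {
  // Parses an unsigned ASCII integer field of `len` bytes starting at `off`,
  // skipping leading spaces, with no String or sliced-array allocation.
  def parseInt(buf: Array[Byte], off: Int, len: Int): Int = {
    val end = off + len
    var i = off
    while (i < end && buf(i) == ' ') i += 1
    var acc = 0
    while (i < end) {
      val d = buf(i) - '0'
      if (d < 0 || d > 9) throw new NumberFormatException(s"non-digit at index $i")
      acc = acc * 10 + d
      i += 1
    }
    acc
  }

  // Parses an unsigned ASCII decimal field such as " 23.87" into a Double:
  // digits are accumulated as an integer, then divided by 10^(digits after '.').
  def parseDecimal(buf: Array[Byte], off: Int, len: Int): Double = {
    val end = off + len
    var i = off
    while (i < end && buf(i) == ' ') i += 1
    var mantissa = 0L
    var scale = 1L
    var seenPoint = false
    while (i < end) {
      val b = buf(i)
      if (b == '.' && !seenPoint) seenPoint = true
      else {
        val d = b - '0'
        if (d < 0 || d > 9) throw new NumberFormatException(s"unexpected byte at index $i")
        mantissa = mantissa * 10 + d
        if (seenPoint) scale *= 10
      }
      i += 1
    }
    mantissa.toDouble / scale
  }

  // A one-byte sign field: '-' means negative, anything else (e.g. space) positive.
  def sign(buf: Array[Byte], at: Int): Int = if (buf(at) == '-') -1 else 1
}

object Demo extends App {
  import FixedFieldParser._

  val msg = "ABjTst1   5467- 23.87".getBytes(US_ASCII)
  // Text fields via String(bytes, offset, length, charset): no slice copy.
  val msgType = new String(msg, 0, 2, US_ASCII)
  val status  = msg(2).toChar
  val name    = new String(msg, 3, 4, US_ASCII)
  val val1    = sign(msg, 7) * parseInt(msg, 8, 6)
  val val2    = sign(msg, 14) * parseDecimal(msg, 15, 6)
  println(s"$msgType $status $name $val1 $val2") // AB j Tst1 5467 -23.87
}
```

The integer parser does no overflow check; with 6-digit fields the largest value is 999999, well within Int range. Whether this actually beats trim.toInt in practice is something to confirm with a benchmark.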


I don't have the data to run performance tests, but maybe you do? Have you tried regex matching with a precompiled pattern?

The numbers in the comment enumerate the opening parentheses, which correspond to the capture groups:

import java.util.regex.Pattern

//                              12     3   4    5           6     7     8                 9     10
val pattern = Pattern.compile ("((AB)|(PQ)|(XY))([abcfjrp6])(.{4})([- ])( {0,6}[0-9]{0,6})([- ])([ 0-9.]{1,6})")

def msplit(message: String): Unit = {
  val matcher = pattern.matcher(message)
  // print the groups of interest: type, status, name, sign 1, value 1, sign 2, value 2
  if (matcher.find())
    List(1, 5, 6, 7, 8, 9, 10).foreach(g => println(matcher.group(g)))
}

val s = "ABjTst1 5467- 23.87"
msplit(s)

Pattern/Matcher is of course Javaland; maybe you can find a more Scala-style solution with "...".r.
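A sketch of the same match using Scala's Regex (via "...".r) and a pattern-match extractor. Only the groups that are needed are captured (the inner (AB)|(PQ)|(XY) groups become a single alternation), so the extractor binds exactly seven values. Unlike matcher.find(), matching a Regex in a case clause requires the whole string to match. RegexParse, parse and the tuple return type are hypothetical choices:

```scala
object RegexParse {
  // One capture group per field; (AB|PQ|XY) replaces the three inner groups.
  private val Msg = """(AB|PQ|XY)([abcfjrp6])(.{4})([- ])( {0,6}[0-9]{0,6})([- ])([ 0-9.]{1,6})""".r

  // Returns (type, status, name, value 1, value 2), or None if the whole
  // string does not match. A numeric group made only of spaces still matches
  // the regex and would make toInt/toDouble throw NumberFormatException.
  def parse(s: String): Option[(String, Char, String, Int, Double)] = s match {
    case Msg(tpe, status, name, sign1, v1, sign2, v2) =>
      val s1 = if (sign1 == "-") -1 else 1
      val s2 = if (sign2 == "-") -1 else 1
      Some((tpe, status.head, name, s1 * v1.trim.toInt, s2 * v2.trim.toDouble))
    case _ => None
  }
}
```

For example, RegexParse.parse("ABjTst1 5467- 23.87") returns Some((AB,j,Tst1,5467,-23.87)).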

Result:

AB
j
Tst1

5467
-
 23.87

2 Comments

I haven't run performance tests yet; I just wondered whether there was some existing knowledge to use as a starting point. Given the fixed-length fields, I would expect indexing directly into the bytes at specific locations to be faster than running a regex matcher to get the same result.
Of course plain indexing should be faster, since it doesn't validate anything (the ((AB)|(PQ)|(XY)) alternation, etc.). You just get two chars, which might be '-1' or 'AB' or 'BA' or 'ab'.
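When indexing directly, checks like these can still be done cheaply with plain byte comparisons instead of a regex. A sketch assuming the 21-byte layout from the question; HeaderCheck and isValid are hypothetical names. It only checks the constrained bytes (type, status, the two signs); digit checks could be folded into a hand-written number parser:

```scala
object HeaderCheck {
  private val Statuses = "abcfjrp6"

  // True if the packet is at least 21 bytes, the type is AB/PQ/XY, the
  // status is one of the allowed characters, and both sign bytes are ' ' or '-'.
  def isValid(buf: Array[Byte]): Boolean = {
    def isSign(b: Byte): Boolean = b == ' ' || b == '-'
    buf.length >= 21 && {
      val a = buf(0)
      val b = buf(1)
      ((a == 'A' && b == 'B') || (a == 'P' && b == 'Q') || (a == 'X' && b == 'Y')) &&
      Statuses.indexOf(buf(2).toInt) >= 0 &&
      isSign(buf(7)) && isSign(buf(14))
    }
  }
}
```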
