6

I'm trying to split a string on the | (bar) separator, e.g. 123.123.123.123|000.000.000.000 into separate IP address blocks. But the string is split into individual characters instead of being split at the |.

scala> "123.123.123.123|000.000.000.000".split("|")
res30: Array[java.lang.String] = Array("", 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3, |, 0, 0, 0, ., 0, 0, 0, ., 0, 0, 0, ., 0, 0, 0)

scala> "123.123.123.123".split("|")
res33: Array[java.lang.String] = Array("", 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3, ., 1, 2, 3)

So I passed the separator as a Char, and it gives what I intended.

scala> "123.123.123.123|000.000.000.000".split('|')
res31: Array[String] = Array(123.123.123.123, 000.000.000.000)

scala> "123.123.123.123".split('|')
res32: Array[String] = Array(123.123.123.123)

Why does a single character make such a huge difference?

I've read the Scala docs and StringLike.scala, and found no answer.

def split(separators: Array[Char]): Array[String]
def split(separator: Char): Array[String]

Thanks.

3
  • According to the documentation, split takes a Char as an argument. Commented May 27, 2013 at 7:20
  • @squiguy Not only a Char: it can fall back to Java's split, which takes a String (regex). Commented May 27, 2013 at 7:20
  • @om-nom-nom Sure, in that case | is special which is obviously what you said in your answer. Commented May 27, 2013 at 7:23

2 Answers

12

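The split method accepts either a string or character(s). If you pass a string, it is interpreted as a regex, and "|" is treated as the regex 'or' (alternation) operator. In your case it is an alternation between two empty patterns, which matches the empty string at every position, so every character goes into its own bin. Escape it to get a literal delimiter: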

"123.123.123.123|000.000.000.000".split("\\|")
res1: Array[String] = Array(123.123.123.123, 000.000.000.000)

A Char separator is interpreted literally, so you got the desired result without any fuss.
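As a minimal sketch of the options above (the object and value names here are illustrative, not from the original post), the following compares an escaped regex, a quoted regex built with `java.util.regex.Pattern.quote`, and a Char separator. All three should yield the same two IP strings:

```scala
import java.util.regex.Pattern

object SplitOnBar {
  val input = "123.123.123.123|000.000.000.000"

  // Escape the bar so the regex engine treats it as a literal character.
  val escaped: Array[String] = input.split("\\|")

  // Pattern.quote wraps any string as a literal pattern; useful when the
  // separator is not hard-coded and may contain regex metacharacters.
  val quoted: Array[String] = input.split(Pattern.quote("|"))

  // A Char separator is never parsed as a regex.
  val byChar: Array[String] = input.split('|')

  def main(args: Array[String]): Unit = {
    println(escaped.mkString(", "))
    println(quoted.mkString(", "))
    println(byChar.mkString(", "))
  }
}
```

`Pattern.quote` is probably the safer choice when the separator comes from user input or configuration, since it does not require knowing which characters are special.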


1 Comment

Thanks @om-nom-nom! I forgot that many Scala methods actually come from java.lang and other Java classes. I'll check the Javadoc next time.
2

Note that, as om-nom-nom correctly mentioned (but didn't provide an example of), characters (which are enclosed in single quotes) are also valid:

"123.123.123.123|000.000.000.000".split('|')

I find this to be more obvious/readable. I'm also assuming that this would be faster, since it does not have to invoke the regex parser. But that is speculation of course, and also unnecessary micro-optimization.
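The other overload quoted in the question, `split(separators: Array[Char])`, treats its characters literally in the same way. A small sketch, assuming a hypothetical input that mixes two separators (the object name and input string are made up for illustration):

```scala
object SplitOnChars {
  // Hypothetical input mixing '|' and ',' as separators.
  val mixed = "123.123.123.123|000.000.000.000,111.111.111.111"

  // Each Char in the array is treated as a literal separator,
  // so '|' needs no escaping here.
  val parts: Array[String] = mixed.split(Array('|', ','))

  def main(args: Array[String]): Unit =
    parts.foreach(println)
}
```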

5 Comments

It should be considerably faster than using a regex, no doubt about that. In fact, using a regex in that scenario would simply be a mistake. But the question is not about that, and what you suggest was already discovered in the question.
Yes, I said that om-nom-nom had already mentioned it. I just wanted to provide an actual example, since I know many people (including myself) will often only skim an answer for a piece of code, which often is the solution. Since I consider the code example om-nom-nom posted not optimal, I posted my own :)
@x3ro well, I didn't write this because the OP did it in his examples (res31) ;-) But I do agree that this is likely to be faster than the string version
@om-nom-nom: Oh, now I feel a little stupid :D I could swear that it wasn't there when I first read the question, but I've probably just overlooked it... My bad
@x3ro: In other languages, I also used single quotes for all fixed strings because of the performance concerns you mentioned above, but Scala doesn't seem to be kind to 'string'. My typing one character in single quotes was a (big) mistake. Thanks for reminding me :)
