
I need to split a string in Scala into an array whose elements are pairs of consecutive words:

"Hello, it is useless text. Hope you can help me."

The result:

[[it is], [is useless], [useless text], [Hope you], [you can], [can help], [help me]]

One more example:

"This is example 2. Just\nskip it."

Result: [[This is], [is example], [Just skip], [skip it]]

I tried this regex:

val re = """[a-zA-Z]+\s[a-zA-Z]+""".r

But the output is:

scala> for (m <- re.findAllIn("Hello, it is useless text. Hope you can help me.")) println(m)
it is
useless text
Hope you
can help

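So it skips some pairs. findAllIn returns non-overlapping matches, so once "it is" has matched, the word "is" is already consumed and "is useless" can never match.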
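One possible way around the overlap problem, sketched here under the assumption that words are ASCII letters only (as in the question's regex): wrap the pattern in a zero-width lookahead with a capture group. The lookahead consumes no characters, so a match can begin at every word, and \b keeps matches from starting mid-word. The \s is widened to \s+ here, and the captured text is split on whitespace so that "Just\nskip" becomes two words. OverlappingPairs and pairs are hypothetical names.

```scala
object OverlappingPairs {
  // Letters-only pair pattern inside a zero-width lookahead.
  // Nothing is consumed, so the next search can start inside the previous
  // pair; \b stops a match from beginning in the middle of a word.
  private val pairRe = """(?=\b([a-zA-Z]+\s+[a-zA-Z]+))""".r

  // Returns every pair of neighbouring letter-only words joined by whitespace.
  def pairs(s: String): List[List[String]] =
    pairRe.findAllMatchIn(s)
      .map(_.group(1).split("\\s+").toList) // "Just\nskip" becomes List(Just, skip)
      .toList

  def main(args: Array[String]): Unit = {
    // List(List(it, is), List(is, useless), List(useless, text), List(Hope, you), List(you, can), List(can, help), List(help, me))
    println(pairs("Hello, it is useless text. Hope you can help me."))
    // List(List(This, is), List(is, example), List(Just, skip), List(skip, it))
    println(pairs("This is example 2. Just\nskip it."))
  }
}
```

Pairs broken by a comma, a full stop or a digit are never produced, because the pattern requires a letter immediately after the whitespace and immediately before it.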

  • Ignore the regex: break the string on spaces and use a for loop to join the words. I thought a comment would be quicker; that's why. Commented Apr 28, 2018 at 18:42
  • Should capital letters be preserved or converted to lowercase? Commented Apr 28, 2018 at 18:53
  • @jwvh Preserved. That was my mistake; I should edit the question. Commented Apr 28, 2018 at 18:56
  • Try """(?=\b(\w+\s+\w+)\b)""".r.findAllMatchIn(s.replaceAll("""(?U)\W+""", " ")).map(_.group(1)) Commented Apr 28, 2018 at 19:21
  • What should happen for "I don't like Mike's over-use of apostrophes! (Why is that?)" (i.e. embedded non-letters, parens, trailing question marks, etc.)? Commented Apr 29, 2018 at 8:42

4 Answers

First split on the punctuation and digits, then split on the spaces, then slide over the results.

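def doubleUp(txt :String) :Array[Array[String]] =
  txt.split("[.,;:\\d]+")                       // cut the text into phrases at punctuation and digits
     .flatMap(_.trim.split("\\s+").sliding(2))  // split each phrase into words, take overlapping pairs
     .filter(_.length > 1)                      // drop the one-word "pair" a single-word phrase produces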

usage:

val txt1 = "Hello, it is useless text. Hope you can help me."
doubleUp(txt1)
//res0: Array[Array[String]] = Array(Array(it, is), Array(is, useless), Array(useless, text), Array(Hope, you), Array(you, can), Array(can, help), Array(help, me))

val txt2 = "This is example 2. Just\nskip it."
doubleUp(txt2)
//res1: Array[Array[String]] = Array(Array(This, is), Array(is, example), Array(Just, skip), Array(skip, it))
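One comment on the question asks what should happen with apostrophes, hyphens, parentheses and question marks. As a sketch only, assuming that apostrophes and hyphens belong to words while every other non-letter, non-whitespace character ends a phrase, the split pattern of doubleUp can be widened to a negated Unicode class. DoubleUpGeneral is a hypothetical name, and returning Lists instead of Arrays is a choice made here so the result prints readably.

```scala
object DoubleUpGeneral {
  // Hypothetical variant of doubleUp. Assumption: letters, apostrophes and
  // hyphens belong to words; any other non-whitespace character (".", ",",
  // "!", "?", "(", ")", digits, ...) ends the current phrase.
  def doubleUp(txt: String): List[List[String]] =
    txt.split("""[^\p{L}'\-\s]+""").toList                     // cut into phrases
      .flatMap(_.trim.split("\\s+").sliding(2).map(_.toList))  // overlapping word pairs per phrase
      .filter(_.length > 1)                                     // drop single-word phrases

  def main(args: Array[String]): Unit =
    // List(List(I, don't), List(don't, like), List(like, Mike's), List(Mike's, over-use), List(over-use, of), List(of, apostrophes), List(Why, is), List(is, that))
    println(doubleUp("I don't like Mike's over-use of apostrophes! (Why is that?)"))
}
```

On the two examples from the question this gives the same pairs as doubleUp; it differs only when characters such as "!", "?" or parentheses appear.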

1 Comment

Works great. Thank you!

First, interpret any escape sequences in the string with StringContext.processEscapes (for example, a backslash followed by n becomes a newline).

scala> val string = "Hello, it is useless text. Hope you can help me."
val preprocessed = StringContext.processEscapes(string)
//preprocessed: String = Hello, it is useless text. Hope you can help me.

OR

scala> val string = "This is example 2. Just\nskip it."
val preprocessed = StringContext.processEscapes(string)
//preprocessed: String =
//This is example 2. Just
//skip it.
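A small sketch of what the escape-processing step actually does: StringContext.processEscapes turns the two characters backslash and n into one newline character. A string written as an ordinary Scala literal, such as "Just\nskip it.", already contains a real newline, so the call leaves it unchanged. EscapesDemo is a hypothetical name.

```scala
object EscapesDemo {
  // A backslash followed by 'n': two characters, as if read from a file or user input.
  val raw: String = "Just\\nskip it."
  // processEscapes replaces that two-character sequence with a single newline.
  val cooked: String = StringContext.processEscapes(raw)
  // An ordinary Scala literal already holds a real newline, so there is nothing to process.
  val literal: String = "Just\nskip it."
  val unchanged: Boolean = StringContext.processEscapes(literal) == literal

  def main(args: Array[String]): Unit = {
    println(raw.length)        // 14
    println(cooked.length)     // 13
    println(cooked == literal) // true
    println(unchanged)         // true
  }
}
```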

Then split on whitespace, drop empty tokens and any word that ends in punctuation, and pair up the remaining words with sliding:

val result = preprocessed.split("\\s").filter(e => !e.isEmpty && !e.matches("(?<=^|\\s)[A-Za-z]+\\p{Punct}(?=\\s|$)") ).sliding(2).toList

//res9: List[Array[String]] = List(Array(it, is), Array(is, useless), Array(useless, Hope), Array(Hope, you), Array(you, can), Array(can, help))

2 Comments

"Hello, it" and "text. Hope" are separated by punctuation marks and should not be in the resulting list.
Updated the filter function.

You need to use split to break the string into words separated by non-word characters, and then sliding to pair up the words the way you want:

val text = "Hello, it is useless text. Hope you can help me."

text.trim.split("\\W+").sliding(2)
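A quick check of what this one-liner yields on the first example. sliding returns an Iterator, so the sketch below materializes it as a List to inspect the pairs; NonWordSplit is a hypothetical name. Because \W+ treats commas and full stops like spaces, the result also contains "Hello it" and "text Hope", which the question's expected output leaves out.

```scala
object NonWordSplit {
  val text = "Hello, it is useless text. Hope you can help me."

  // split("\\W+") treats commas and full stops like spaces; sliding(2) yields an
  // Iterator of two-element arrays, turned into a List here so it can be printed.
  val pairs: List[List[String]] =
    text.trim.split("\\W+").sliding(2).map(_.toList).toList

  def main(args: Array[String]): Unit =
    // List(List(Hello, it), List(it, is), List(is, useless), List(useless, text), List(text, Hope), List(Hope, you), List(you, can), List(can, help), List(help, me))
    println(pairs)
}
```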

You may also want to remove escape characters, as explained in another answer.


Sorry, I only know Python. I heard the two are almost the same; I hope you can follow it.

string = "it is useless text. Hope you can help me."
split = string.split(' ')  # splits on space (you can use a regex for this)

result = []
no = 0
count = len(split)

for x in range(count):
    no += 1
    if no < count:
        pair = split[x] + ' ' + split[no]  # joins the current word to the next one
        result.append(pair)

The resulting list will be:

['it is', 'is useless', 'useless text.', 'text. Hope', 'Hope you', 'you can', 'can help', 'help me.']
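For comparison, a sketch of the same pairing in Scala, the language the question asks about, assuming the same split on single spaces: instead of an index counter, each word is zipped with its successor. It reproduces the Python output, punctuation included, so it shares the limitation the comments point out. AdjacentPairs is a hypothetical name.

```scala
object AdjacentPairs {
  val string = "it is useless text. Hope you can help me."
  val words: Array[String] = string.split(' ') // splits on a single space, like the Python code

  // Zip every word with its successor (words.tail); this replaces the
  // index arithmetic split[x] + ' ' + split[no] of the Python loop.
  val result: List[String] =
    words.zip(words.tail).map { case (a, b) => a + " " + b }.toList

  def main(args: Array[String]): Unit =
    println(result) // List(it is, is useless, useless text., text. Hope, Hope you, you can, can help, help me.)
}
```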

5 Comments

I'm sorry, but splitting on spaces is not the answer, because the punctuation marks end up in the pairs. However, I understand the idea behind your answer.
The commas in the list are useless in real life; you can't make use of them at all. Anyway, as you said, I hope you get the for-loop part.
Please don't answer a question about Scala with some code in Python. They are very different languages.
OK. I thought I had heard somewhere that they can be used interchangeably; that's why. I just wanted the idea of the for loop to be explained well.
@surge10 Your for loop is not appropriate for Scala, so your answer is misleading. Please don't answer questions about languages that you do not understand.
