
For example, suppose we have the string "abcdabcd".

And we want to count all the pairs of adjacent characters (e.g. "ab" or "da") that appear in the string.

How do we do that in Apache Spark?

I ask because it looks like RDD does not support a sliding function:

rdd.sliding(2).toList
// Count the number of pairs in the list
// The first line fails to compile: sliding is not a member of RDD

1 Answer

Apparently sliding is supported via MLlib, as shown by zero323 here:

import org.apache.spark.mllib.rdd.RDDFunctions._

val str = "abcdabcd"

val rdd = sc.parallelize(str.toSeq)

rdd.sliding(2).map(_.mkString).toLocalIterator.foreach(println)

will show

ab
bc
cd
da
ab
bc
cd


2 Comments

and how do we count these pairs? Well btw you look like someone who is hunting my scala questions here :)
@lkn2993 using the classic word count approach in Apache Spark
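The "classic word count approach" mentioned in the comment above can be sketched as follows: map each sliding pair to a `(pair, 1)` tuple, then sum the counts per key with `reduceByKey`. This is a sketch assuming a live `SparkContext` named `sc` and the `spark-mllib` dependency on the classpath.

```scala
// Assumes a running SparkContext `sc` and spark-mllib on the classpath.
import org.apache.spark.mllib.rdd.RDDFunctions._

val str = "abcdabcd"
val rdd = sc.parallelize(str.toSeq)

// Build sliding windows of 2 characters, then count each distinct pair
// using the word-count pattern: (pair, 1) followed by reduceByKey.
val pairCounts = rdd
  .sliding(2)
  .map(chars => (chars.mkString, 1))
  .reduceByKey(_ + _)

pairCounts.collect().foreach(println)
// For "abcdabcd" the pairs are ab, bc, cd, da, ab, bc, cd,
// so the counts are ab -> 2, bc -> 2, cd -> 2, da -> 1 (order may vary)
```

To get the total number of pairs rather than per-pair counts, `rdd.sliding(2).count()` suffices.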
