I want to filter out alphanumeric words (words that contain digits) and purely numeric words from my file. I'm working in the Spark shell. These are the contents of my file sparktest.txt:
This is 1 file not 54783. Would you l1ke this file to be Writt3n to HDFS?
Loading the file as an RDD of lines:
scala> val myLines = sc.textFile("sparktest.txt")
Splitting each line into words and keeping only the words longer than 2 characters:
scala> val myWords = myLines.flatMap(x => x.split("\\W+")).filter(x => x.length >2)
Defining a regular expression to use. I only want strings that match "[A-Za-z]+":
scala> val regexpr = "[A-Za-z]+".r
Attempting to filter out the alphanumeric and numeric strings:
scala> val myOnlyWords = myWords.map(x => x).filter(x => regexpr(x).matches)
<console>:27: error: scala.util.matching.Regex does not take parameters
val myOnlyWords = myWords.map(x => x).filter(x => regexpr(x).matches)
This is where I'm stuck. I want the result to look like this:
Array[String] = Array(This, file, not, Would, you, this, file, HDFS)
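For reference, here is a minimal sketch of one way the filter could be written. The error comes from regexpr(x): a scala.util.matching.Regex has no apply method, so it cannot be called like a function. The sketch uses a local Seq as a stand-in for the RDD so it runs without Spark (that substitution, and the names lines, words, and onlyWords, are my assumptions and not part of the session above). The same filter predicate should work unchanged on myWords, followed by .collect() to get an Array.

```scala
// Stand-in for sc.textFile("sparktest.txt"): a local Seq holding the file's one line.
val lines = Seq("This is 1 file not 54783. Would you l1ke this file to be Writt3n to HDFS?")

// Same tokenising step as in the question: split on non-word characters,
// then keep only tokens longer than 2 characters.
val words = lines.flatMap(_.split("\\W+")).filter(_.length > 2)

// String.matches succeeds only if the WHOLE string matches the pattern,
// so tokens containing any digit are dropped.
val onlyWords = words.filter(_.matches("[A-Za-z]+"))

println(onlyWords.mkString(", "))
// prints: This, file, not, Would, you, this, file, HDFS
```

If you would rather reuse the regexpr value, x => regexpr.pattern.matcher(x).matches should be an equivalent predicate, since Regex exposes the underlying java.util.regex.Pattern through its pattern member.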