I'm new to Scala and I can't figure out what is causing this error. I have searched similar topics, but unfortunately none of them worked for me. I've got a simple piece of code to find the line with the most words in a README.md file. The code I wrote is:

    val readme = sc.textFile("/PATH/TO/README.md")
    readme.map(lambda line :len(line.split())).reduce(lambda a, b: a if (a > b) else b)

and the error is:

    Name: Compile Error
    Message: <console>:1: error: ')' expected but '(' found.
    readme.map(lambda line :len(line.split()) ).reduce( lambda a, b: a                 
    if (a > b) else b )        ^

    <console>:1: error: ';' expected but ')' found.
    readme.map(lambda line :len(line.split()) ).reduce( lambda a, b: a 
    if (a > b) else b )                       ^
  • It doesn't work because it has nothing in common with Scala syntax. Commented Nov 28, 2017 at 13:54
  • To be honest, this question looks trollish. Commented Nov 28, 2017 at 14:00
  • Not every language is Python. There are different languages, and Scala is one of those. If you want to use Scala, learn Scala syntax. Commented Nov 28, 2017 at 20:52

1 Answer

Your code isn't valid Scala.

I think what you might be trying to do is to determine the largest number of words on a single line in a README file using Spark. Is that right? If so, then you likely want something like this:

    val readme = sc.textFile("/PATH/TO/README.md")
    readme.map(_.split(' ').length).reduce(Math.max)

That last line uses Scala shorthand: the underscore stands in for the function's argument, and Math.max is passed directly as a function. This alternative version is equivalent, but a little more explicit:

    readme.map(line => line.split(' ').length).reduce((a, b) => Math.max(a, b))
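
Both forms can be checked without a Spark cluster. The following is a hedged, Spark-free sketch rather than Spark code: it assumes a local Seq[String] in place of the RDD[String] returned by sc.textFile, since Scala collections offer map and reduce with the same shape. The object name ShorthandVsExplicit and the sample lines are made up for illustration.

```scala
// Hypothetical, Spark-free sketch: a local Seq[String] stands in for the
// RDD[String] that sc.textFile would return.
object ShorthandVsExplicit {
  // Shorthand form: `_` is a placeholder for the line, and Math.max is
  // eta-expanded into a function. The explicit [Int] pins the element type
  // (as RDD[Int].reduce does), so the Int overload of Math.max is chosen.
  def shorthand(lines: Seq[String]): Int =
    lines.map(_.split(' ').length).reduce[Int](Math.max)

  // Explicit form: named lambda parameters, same computation.
  def explicitForm(lines: Seq[String]): Int =
    lines.map(line => line.split(' ').length).reduce((a, b) => Math.max(a, b))
}
```

For example, ShorthandVsExplicit.shorthand(Seq("a b c", "d e", "f")) and ShorthandVsExplicit.explicitForm on the same input both return 3.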

The map function converts an RDD of Strings (each line in the file) into an RDD of Ints (the number of words on each line, delimited, in this particular case, by single spaces). The reduce function then repeatedly combines two values into the larger of the two, which ultimately yields a single Int: the largest number of words on any line of the file.
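
Splitting on a single space character is a simplification: split(' ') keeps the empty strings that appear between consecutive spaces, so "a  b" counts as three words, and tabs are not treated as separators. The sketch below compares that with a whitespace-tolerant count. The object and method names (WordCounting, naiveCount, robustCount) are hypothetical, and plain strings stand in for the lines of the RDD.

```scala
// Hypothetical helper names; a sketch comparing the single-space split
// used above with a whitespace-tolerant alternative.
object WordCounting {
  // As in the answer: split on the space character only.
  def naiveCount(line: String): Int = line.split(' ').length

  // Alternative: trim leading/trailing whitespace, split on runs of any
  // whitespace (spaces, tabs), and drop the single empty token that a
  // blank line produces.
  def robustCount(line: String): Int =
    line.trim.split("\\s+").count(_.nonEmpty)
}
```

In the Spark pipeline, the same idea would replace _.split(' ').length with line => line.trim.split("\\s+").count(_.nonEmpty); that is ordinary Scala inside map, but it has not been run against a cluster here.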

After re-reading your question, it seems that you might want the line with the most words, rather than how many words it contains. That's a little trickier, but this should work:

    readme.map(line => (line.split(' ').length, line)).reduce((a, b) => if (a._1 > b._1) a else b)._2

Now map creates an RDD of (Int, String) tuples, where the first element is the number of words on the line and the second is the line itself. reduce then keeps whichever of its two tuple arguments has the larger word count (._1 refers to the first element of the tuple). Since the result is a tuple, ._2 then retrieves the corresponding line (the second element of the tuple).
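
Here is a Spark-free sketch of the same tuple approach, again assuming a local Seq[String] stands in for the RDD. The object name LongestLine and the maxBy comparison are illustrative additions, not part of the original answer.

```scala
// Hypothetical Spark-free sketch: a local Seq[String] stands in for the RDD.
object LongestLine {
  // The tuple approach: pair each line with its word count, keep the pair
  // with the larger count, then take the line (._2).
  def viaTuples(lines: Seq[String]): String =
    lines
      .map(line => (line.split(' ').length, line))
      .reduce((a, b) => if (a._1 > b._1) a else b)
      ._2

  // On a local collection, maxBy returns a line with the largest word count
  // without building tuples. If several lines tie, it may pick a different
  // one than the reduce above (maxBy keeps the first, the reduce the later).
  def viaMaxBy(lines: Seq[String]): String =
    lines.maxBy(_.split(' ').length)
}
```

On the RDD itself, Spark's RDD.max() takes an implicit Ordering, so something like readme.max()(Ordering.by((line: String) => line.split(' ').length)) should return the line directly; treat that as an untested alternative.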

I'd recommend reading a good book on Scala, such as Programming in Scala, 3rd Edition, by Odersky, Spoon & Venners. There are also tutorials and an overview of the language on the main Scala language site. Coursera also has some free Scala training courses that you might want to sign up for.

1 Comment

@KarlBielefeldt Or readme.map(_.split(' ').length).max, which uses a lot less memory/storage. I was just relating my answer to his question. ;-)