I am new to Scala and HDFS.

I can read a local file from Scala code, but how do I read a file from HDFS?

import scala.io.Source

object ReadLine {
  def main(args: Array[String]): Unit = {
    if (args.length > 0) {
      // Source.fromFile opens a path on the local file system
      for (line <- Source.fromFile(args(0)).getLines())
        println(line)
    }
  }
}

I passed hdfs://localhost:9000/usr/local/log_data/file1.. as the argument, but it throws a FileNotFoundException. I am definitely missing something. Can anyone help me out here?

1 Answer

scala.io.Source cannot open an hdfs:// path. Source.fromFile reads from the local file system, which is why you get a FileNotFoundException.

Spark

If you want to read from HDFS, I would recommend using Spark, where you read the file through the SparkContext (sc):

val lines = sc.textFile(args(0))  //args(0) should be hdfs:///usr/local/log_data/file1

No Spark

If you don't want to use Spark, use the Hadoop FileSystem API, optionally wrapping the stream it returns in an InputStreamReader and a BufferedReader. For example:

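import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
val hdfs = FileSystem.get(new URI("hdfs://yourUrl:port/"), new Configuration())
val path = new Path("/path/to/file/")
val stream = hdfs.open(path)
// readLine returns null at end of file, so stop there instead of looping forever
def readLines = Iterator.continually(stream.readLine()).takeWhile(_ != null)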
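
If you go the BufferedReader route, the line-reading part could look roughly like the sketch below. It is only an illustration under a few assumptions: the file is UTF-8 text, and the names LineReading and readAllLines are made up for this example. A ByteArrayInputStream stands in for the stream returned by hdfs.open(path) so that the snippet runs without a Hadoop dependency; in real code you would pass the HDFS stream instead, since it is also a java.io.InputStream.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

object LineReading {
  // Hypothetical helper: reads every line from any InputStream,
  // decoding it as UTF-8 (an assumption about the file's encoding).
  def readAllLines(in: InputStream): List[String] = {
    val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    try {
      // readLine() returns null at end of stream, so stop there.
      Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
    } finally {
      reader.close() // closing the reader also closes the wrapped stream
    }
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for hdfs.open(path): an in-memory stream holding two lines.
    val sample = new ByteArrayInputStream(
      "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8))
    readAllLines(sample).foreach(println)
  }
}
```

Collecting into a List keeps the example simple but loads the whole file into memory; for large files you would probably want to process the iterator lazily and close the reader once you are done.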