
I have a log file with entries like this:

10.28 INFO  [EFKLogger] - POGUpdateTenestenerServiceImpl: Entering listener with object 624866045533

Now, using Spark, I want to count the number of queue hits per hour. The queue here is POGUpdateTenestenerServiceImpl. I want a JavaRDD that contains only the time and the queue so I can perform operations on it. I am new to Spark and have only found ways to create an RDD of either all the words or whole lines. I only want two words from each line. How can I achieve this?

1 Answer

You should use the textFile function of the SparkContext to read the file.

Here is a Scala example; it can easily be translated to Java:

val text = sc.textFile("data.csv")             // read the file
val words = text.map(line => line.split(" "))  // split each line into words

Now words is an RDD in which each element is an array of words; from each array you can take just the fields you need (here, the timestamp and the queue name) and do whatever you want with them.
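Since the question asks for Java, here is a minimal sketch of the per-line extraction that a Spark map call (e.g. lines.map(LogParser::toHourQueue)) would apply. The token positions are assumptions based on the single sample entry in the question; adjust them if your log format varies:

```java
// Sketch: extract (hour, queue) from one log line of the form
// "10.28 INFO  [EFKLogger] - POGUpdateTenestenerServiceImpl: Entering listener with object 624866045533"
// Token positions (0 = time, 4 = queue) are assumptions from that one sample.
public class LogParser {
    public static String[] toHourQueue(String line) {
        String[] tokens = line.trim().split("\\s+"); // split on runs of whitespace
        String hour = tokens[0].split("\\.")[0];     // "10.28" -> "10"
        String queue = tokens[4].replace(":", "");   // drop the trailing colon
        return new String[] { hour, queue };
    }

    public static void main(String[] args) {
        String sample = "10.28 INFO  [EFKLogger] - "
            + "POGUpdateTenestenerServiceImpl: Entering listener with object 624866045533";
        String[] result = toHourQueue(sample);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

From there, counting hits per hour would be the usual pair-and-reduce pattern (e.g. mapToPair each line to a Tuple2 of (hour, 1) and then reduceByKey to sum the counts), though the exact wiring depends on how you set up your JavaSparkContext.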


1 Comment

Thanks, I did it anyway using the map function. Your solution works as well.
