
I have a log file with entries like this:

10.28 INFO  [EFKLogger] - POGUpdateTenestenerServiceImpl: Entering listener with object 624866045533

Now, using Spark, I want to count the number of queue hits per hour. The queue here is POGUpdateTenestenerServiceImpl. I want a JavaRDD that contains only the time and the queue so I can perform operations on it. I am new to Spark and have only found ways to create an RDD of either all the words or whole lines. I only want two words from each line. How can I achieve this?

1 Answer

You should use the textFile function of the SparkContext to read the file.

Here is a Scala example; it can easily be translated to Java:

val text = sc.textFile("data.csv")             // read the file
val words = text.map(line => line.split(" "))  // split each line into words

Now words is an RDD in which each element is an array of words; from each array you can take just the fields you need (here, the timestamp and the queue name) and do whatever you want with them.
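Since the question asks for Java, here is a minimal sketch of the per-line extraction that a Spark map call (e.g. lines.map(LogParser::toHourQueue)) would apply. The token positions are assumptions based on the single sample entry in the question; adjust them if your log format varies:

```java
// Sketch: extract (hour, queue) from one log line of the form
// "10.28 INFO  [EFKLogger] - POGUpdateTenestenerServiceImpl: Entering listener with object 624866045533"
// Token positions (0 = time, 4 = queue) are assumptions from that one sample.
public class LogParser {
    public static String[] toHourQueue(String line) {
        String[] tokens = line.trim().split("\\s+"); // split on runs of whitespace
        String hour = tokens[0].split("\\.")[0];     // "10.28" -> "10"
        String queue = tokens[4].replace(":", "");   // drop the trailing colon
        return new String[] { hour, queue };
    }

    public static void main(String[] args) {
        String sample = "10.28 INFO  [EFKLogger] - "
            + "POGUpdateTenestenerServiceImpl: Entering listener with object 624866045533";
        String[] result = toHourQueue(sample);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

From there, counting hits per hour would be the usual pair-and-reduce pattern (e.g. mapToPair each line to a Tuple2 of (hour, 1) and then reduceByKey to sum the counts), though the exact wiring depends on how you set up your JavaSparkContext.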


1 Comment

Thanks, I did it anyway using the map function. Your solution works as well.
