
I would like to know how, in the same program, I can generate random data using Apache Kafka and receive it using Spark Streaming.

Here is a use case:

I want to generate random data like this -> (A, B, [email protected]) every X seconds. Then I want to receive this data and process it in real time (while I'm receiving it), and if the second parameter is B, send an email to '[email protected]' with the following message: "The first parameter is A".

I know that I have to start a ZooKeeper server, then start a Kafka broker, then create a topic, and then a producer to produce and send this data. To create the connection between Kafka and Spark Streaming I need to use the "createStream" function. But I don't know how to use a producer to send this data and then receive it with Spark Streaming for processing, all in the same program and using Java.
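For the producer side, a minimal sketch of what this could look like, using Kafka's Java client. The broker address `localhost:9092`, the topic name `test`, and the placeholder email `someone@example.com` are all assumptions for illustration, not details taken from the question:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RandomDataProducer {

    // Builds the "(A, B, email)" message string described in the question.
    static String makeMessage(String first, String second, String email) {
        return "(" + first + ", " + second + ", " + email + ")";
    }

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Send one message per second for 10 seconds (the "X seconds" part).
        for (int i = 0; i < 10; i++) {
            producer.send(new ProducerRecord<>("test",
                    makeMessage("A", "B", "someone@example.com")));
            Thread.sleep(1000);
        }
        producer.close();
    }
}
```

Running it requires a broker to be up; otherwise `send` will retry and eventually fail.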

Any help? Thank you.

  • google for "kafka producer java example". Then let us know if you have some specific problems. Commented Apr 26, 2016 at 8:50
  • I will write you the same thing I said to Matthias J. Sax. I now have a producer program for generating that data link, and inside it I added the message (A, B, [email protected]). I have the Spark program here link, and in it I want to read the data and process it, sending the email if the second parameter is a B. I'm not very familiar with this, but I'm trying. Now, to test this, I have to start Kafka (including ZK), and I should need one more file (a main class) which starts the producer program to write into Kafka, right? For Spark I only have to submit the program, right? Thank you! Commented Apr 26, 2016 at 10:56

1 Answer


There will not be a single program, but rather a Kafka producer program and a Spark program. For both, there are a couple of examples available online.

To run this, you start Kafka (including ZK) and your Spark cluster. Afterwards, you start your producer program that writes into Kafka and your Spark job that reads from Kafka (I guess the order in which you start the producer and the Spark job should not matter).
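The Spark side could be sketched like this, using the receiver-based `KafkaUtils.createStream` API the question mentions (spark-streaming-kafka for Kafka 0.8). The ZooKeeper address `localhost:2181`, the consumer group `spark-group`, and the topic `test` are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaSparkJob {

    // Topic -> number of receiver threads; factored out so it can be checked in isolation.
    static Map<String, Integer> topicMap() {
        Map<String, Integer> topics = new HashMap<>();
        topics.put("test", 1); // topic name is an assumption
        return topics;
    }

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("KafkaSparkJob")
                .setMaster("local[2]"); // local mode for testing; use your cluster URL otherwise
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

        // Receiver-based stream: (context, ZooKeeper quorum, consumer group id, topics)
        JavaPairReceiverInputDStream<String, String> kafkaStream =
                KafkaUtils.createStream(jssc, "localhost:2181", "spark-group", topicMap());

        // Drop the Kafka key and keep only the message value.
        JavaDStream<String> messages = kafkaStream.map(record -> record._2());
        messages.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Submit this with `spark-submit` (with the spark-streaming-kafka dependency on the classpath) while the producer is running, and each batch should print the received messages.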


7 Comments

Okay, thanks. I now have a producer program for generating that data link, and inside it I added the message (A, B, [email protected]). I have the Spark program here link, and in it I want to read the data and process it, sending the email if the second parameter is a B. I'm not very familiar with this, but I'm trying. Now, to test this, I have to start Kafka (including ZK), and I should need one more file (a main class) which starts the producer program to write into Kafka, right? For Spark I only have to submit the program, right? Thank you.
I did not look at your code, but what you write sounds right.
Okay, I will try that. The only thing I need now is to know how to process the data received through streaming in order to send the email; I cannot find examples that do a similar thing.
I don't understand your question... Can you rephrase it?
Once the Spark job reads data from Kafka, I need to process it. I will receive messages like (A, B, [email protected]), which are strings. How can I process every message from Kafka so that an email is sent to the third parameter when the second parameter is a 'B'? Do you understand me?
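The parsing/filter step this last comment asks about can be kept independent of Spark, which also makes it easy to test. A sketch (the method names and the `sendEmail` stub are hypothetical; a real job would apply `shouldSendEmail` inside a `filter`/`foreachRDD` on the DStream and call a mail library such as JavaMail in the stub):

```java
public class MessageProcessor {

    // Parses a message of the form "(A, B, someone@example.com)" into its three fields.
    static String[] parse(String message) {
        String trimmed = message.trim();
        if (trimmed.startsWith("(") && trimmed.endsWith(")")) {
            trimmed = trimmed.substring(1, trimmed.length() - 1);
        }
        String[] parts = trimmed.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    // True when the message is well-formed and its second field is "B".
    static boolean shouldSendEmail(String message) {
        String[] fields = parse(message);
        return fields.length == 3 && fields[1].equals("B");
    }

    // Stub: replace the println with an actual mail-sending call.
    static void sendEmail(String to, String body) {
        System.out.println("To: " + to + " | " + body);
    }

    public static void main(String[] args) {
        String msg = "(A, B, someone@example.com)";
        if (shouldSendEmail(msg)) {
            String[] f = parse(msg);
            sendEmail(f[2], "The first parameter is " + f[0]);
        }
    }
}
```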
