I am a newbie to Spark and am trying to read and research as much as I can. Currently I am stuck on this and have spent a few days trying to solve it. I have successfully set up a Spark cluster on 3 machines (1 master, 2 slaves) and run some examples. Now I am trying to write a Python application that reads a CSV file, splits each row into a JSON file, and uploads all of them to S3. Here are my problems:

  1. I have converted the CSV to a Spark DataFrame using SparkSession.read.csv(). How do I split this DataFrame into multiple rows and convert them to JSON? I have read that Spark DataFrame has a toJSON function, but it seemed to apply to the whole DataFrame, so how can I use this function on each row of the DataFrame instead of on the whole thing?

  2. How can I make my application distributed, given that I have 2 slaves and one master? Or does my application automatically split the work into smaller parts and assign them to the slaves?

  3. How can I put the converted JSON into S3? Some sample code guidance would help me best.

I would really appreciate it if you could help me. Thanks in advance.

1 Answer

  1. To read JSON files, you can use spark.read.json() (the older sqlContext.jsonFile() is deprecated). Then you can use regular SQL queries for processing; the first sketch after this list shows the pattern, including a row-wise toJSON() call.
  2. Spark works on partitions. Your data will be divided into partitions and run on executors; how the work is distributed is handled by Spark based on the mode you are using. Not sure if you are using YARN. The second sketch below shows how to inspect and adjust the partitioning.
  3. In Python, you can use boto3 to save the data to Amazon S3. It's a very easy-to-use package; the third sketch below shows a minimal upload loop.
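
For point 1, a minimal sketch, assuming Spark 2.x, a local CSV with a header row, and placeholder file/app names. It reads the data, runs a plain SQL query against it, and then converts each row with toJSON(), which is row-wise: it returns an RDD containing one JSON document per row.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-to-json").getOrCreate()

    # Read the CSV into a DataFrame (header/inferSchema are assumptions
    # about the input file)
    df = spark.read.csv("people.csv", header=True, inferSchema=True)

    # Register the DataFrame so regular SQL queries work against it
    df.createOrReplaceTempView("people")
    result = spark.sql("SELECT * FROM people")

    # toJSON() converts the DataFrame row by row: the result is an RDD
    # of strings, one JSON document per row
    json_rdd = result.toJSON()
    print(json_rdd.first())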
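
For point 2, a short sketch of how to see and influence the partitioning; the partition count of 8 is purely illustrative. Each partition becomes a task that an executor on one of the slaves runs, so no extra code is needed to distribute the work.

    # Spark splits the DataFrame into partitions; each partition is
    # processed as a task on one of the executors
    print(df.rdd.getNumPartitions())

    # Repartition explicitly if the default split is too coarse;
    # 8 is just an example value
    df = df.repartition(8)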
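
For point 3, a minimal boto3 sketch; the bucket name and key scheme are placeholders, and credentials are assumed to come from the environment or ~/.aws/credentials. Note that collect() pulls everything to the driver, which is fine for small data; for large data you would upload from inside foreachPartition instead.

    import boto3

    s3 = boto3.client("s3")

    # Upload each row's JSON document as its own S3 object
    for i, doc in enumerate(df.toJSON().collect()):
        s3.put_object(
            Bucket="my-bucket",                 # placeholder bucket
            Key="rows/row-{}.json".format(i),   # placeholder key scheme
            Body=doc.encode("utf-8"),
        )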

1 Comment

All of your points are correct and helped me a lot in finding the answer. Thank you.
