
In pandas, I can do something like this:

import numpy as np
import pandas as pd

data = {"col1": [np.random.randint(10) for x in range(1, 10)],
        "col2": [np.random.randint(100) for x in range(1, 10)]}
mypd = pd.DataFrame(data)
mypd

and get a DataFrame with two columns.


Is there a similar way to create a Spark DataFrame in PySpark?


1 Answer


The answer shared by Steven is brilliant.

Additionally, if you are comfortable with pandas, you can supply your pandas DataFrame directly to the createDataFrame function.

Spark >= 2.x

import numpy as np
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = {
    "col1": [np.random.randint(10) for x in range(1, 10)],
    "col2": [np.random.randint(100) for x in range(1, 10)],
}
mypd = pd.DataFrame(data)

sparkDF = spark.createDataFrame(mypd)

sparkDF.show()

+----+----+
|col1|col2|
+----+----+
|   6|   4|
|   1|  39|
|   7|   4|
|   7|  95|
|   6|   3|
|   7|  28|
|   2|  26|
|   0|   4|
|   4|  32|
+----+----+

3 Comments

Thanks for the info, Steven; updated the answer accordingly.
So do you need to go through pandas first?
Only if you are more comfortable with pandas; otherwise you can use the link shared by Steven to create a Spark DataFrame directly, bypassing pandas.
