
In pandas, I can do something like this:

import numpy as np
import pandas as pd

data = {"col1": [np.random.randint(10) for x in range(1, 10)],
        "col2": [np.random.randint(100) for x in range(1, 10)]}
mypd = pd.DataFrame(data)
mypd

and get a DataFrame with two columns.


Is there a similar way to create a Spark DataFrame in PySpark?


1 Answer


The answer shared by Steven is brilliant.

Additionally, if you are comfortable with pandas, you can supply your pandas DataFrame directly to the createDataFrame function.

Spark >= 2.x

import numpy as np
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = {
    "col1": [np.random.randint(10) for x in range(1, 10)],
    "col2": [np.random.randint(100) for x in range(1, 10)],
}
mypd = pd.DataFrame(data)

sparkDF = spark.createDataFrame(mypd)

sparkDF.show()

+----+----+
|col1|col2|
+----+----+
|   6|   4|
|   1|  39|
|   7|   4|
|   7|  95|
|   6|   3|
|   7|  28|
|   2|  26|
|   0|   4|
|   4|  32|
+----+----+

3 Comments

Thanks for the info, Steven; updated the answer accordingly.
So do you need to go through pandas first?
Only if you are more comfortable with pandas; otherwise you can use the link shared by Steven to create a Spark DataFrame directly, bypassing pandas.
