How to make a DataFrame visible in Python cell in Databricks notebook?

Question

I created a Spark DataFrame in Scala using Databricks. After some preprocessing, I ended up with a smaller subset of the data that fits into memory, so I want to convert it to pandas and save it as a CSV file.

The problem is that the DataFrame df, which I worked on in the Scala cells of the Databricks notebook, is not visible in a Python cell.

%python

df.toPandas().to_csv("dbfs:/FileStore/tables/test.csv", header=True, index=False)

How can I make df visible in the Python cell?

Probably too good to be true, but: df_py = df.toPandas().to_csv("dbfs:/FileStore/tables/test.csv", header=True, index=False) and then print(df_py)?
– Erfan, Commented Jun 20, 2019 at 22:25
@Erfan: It does not work. It says that df cannot be found: NameError: name 'df' is not defined. But df exists in the cell above, which I executed successfully beforehand.
– Fluxy, Commented Jun 20, 2019 at 22:26
You don't actually need to export to CSV, just do: df_py = df.toPandas() and then print(df_py)
– Erfan, Commented Jun 20, 2019 at 22:34
@Erfan: This should be a Python cell, right? If so, the problem is that df is not visible in a Python cell.
– Fluxy, Commented Jun 20, 2019 at 22:52

Harsha TJ · Accepted Answer · 2019-06-20 22:52:31Z


Call display(df) in the Scala cell. It usually displays nested structs as well.

Alternatively, register the DataFrame as a temporary view with df.createOrReplaceTempView("dfViewName"), then query it in the next cell:

%sql
SELECT * FROM dfViewName
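The same temporary view can also be read from a Python cell, since the language cells of one notebook share a Spark session. Below is a minimal sketch of the pandas export the question asks for, assuming the Scala cell has already run df.createOrReplaceTempView("dfViewName") and that the cluster exposes DBFS under the local /dbfs mount. The helper names dbfs_to_local_path and export_view_to_csv are hypothetical, introduced only for illustration.

```python
# Python cell (%python) in the same notebook, run after the Scala cell that
# registered the view with df.createOrReplaceTempView("dfViewName").


def dbfs_to_local_path(dbfs_uri):
    """Map a 'dbfs:/...' URI to the '/dbfs/...' local path (hypothetical helper).

    pandas writes through the local file system, so it cannot resolve the
    'dbfs:/' scheme itself. Assumes the cluster mounts DBFS at /dbfs.
    """
    prefix = "dbfs:/"
    if dbfs_uri.startswith(prefix):
        return "/dbfs/" + dbfs_uri[len(prefix):]
    return dbfs_uri


def export_view_to_csv(spark, view_name, dbfs_uri):
    """Read a temp view by name and save it as CSV via pandas (hypothetical helper)."""
    # toPandas() collects all rows to the driver, so use it only for small data.
    pdf = spark.table(view_name).toPandas()
    local_path = dbfs_to_local_path(dbfs_uri)
    pdf.to_csv(local_path, header=True, index=False)
    return local_path


# In a Databricks notebook `spark` is predefined, so the call would be:
# export_view_to_csv(spark, "dfViewName", "dbfs:/FileStore/tables/test.csv")
```

The view name is the only thing shared between the two languages: the Scala variable df itself stays invisible to Python, which is why the original cell raised NameError.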


3 Comments

Fluxy Over a year ago

display(df) is exactly what I need. Regarding SQL, I think it would be useful if I wanted to use SQL in the next cell, but I wanted to use Python. Since my final goal was just to save a CSV file, display is the right solution.

Fluxy Over a year ago

By the way, which approach would I use to save the DataFrame so that it is accessible in another Databricks notebook on the same cluster?

Fluxy Over a year ago

@Erfan: I wanted pandas for saving the DataFrame as a CSV file. Sorry if that was unclear. Of course, I would appreciate seeing a solution with pandas. But if that's impossible, then "display" would be a workaround for me.
