I created a Spark DataFrame in Scala using Databricks. After some preprocessing, I ended up with a smaller data subset that fits into memory, so I want to convert it to Pandas and then save it as a CSV file.
The problem is that the DataFrame df, which I built in Scala cells of a Databricks notebook, is not visible in a Python cell:
%python
df.toPandas().to_csv("dbfs:/FileStore/tables/test.csv", header=True, index=False)
How can I make df visible in the Python cell?
df_py = df.toPandas().to_csv("dbfs:/FileStore/tables/test.csv", header=True, index=False) And then print(df_py)?
df cannot be found: NameError: name 'df' is not defined. But df exists in the above cell that I executed successfully before.
df_py = df.toPandas() Then print(df_py)
df is not visible in Python cell.
df_py in python cell
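For context on the NameError: Scala and Python cells run in separate interpreters, so a Scala variable such as df never exists in Python. Both languages do share the same SparkSession, so one common workaround is to register the DataFrame as a temporary view in Scala and look it up by name from Python. The sketch below assumes an earlier Scala cell ran df.createOrReplaceTempView("df_shared"); the view name df_shared and the helper export_view_to_csv are made up for illustration. It also assumes the cluster exposes DBFS under the local /dbfs/ mount, since Pandas writes through local file paths and does not understand the dbfs:/ scheme.

```python
# Python cell. Assumes an earlier Scala cell ran:
#   df.createOrReplaceTempView("df_shared")   // "df_shared" is a hypothetical name
# `spark` is the SparkSession that Databricks notebooks predefine.

def export_view_to_csv(spark, view_name, local_path):
    """Read a temp view registered from Scala and write it as CSV via Pandas.

    Hypothetical helper, not part of any library.
    """
    df_py = spark.table(view_name)   # PySpark DataFrame backed by the shared view
    pdf = df_py.toPandas()           # collects every row to the driver
    pdf.to_csv(local_path, header=True, index=False)
    return pdf

# In the notebook (assumes the /dbfs/ local mount is available):
# export_view_to_csv(spark, "df_shared", "/dbfs/FileStore/tables/test.csv")
```

Because toPandas() pulls the whole dataset onto the driver, this only makes sense for a subset that really does fit into memory, as described in the question.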