I'm learning Databricks with a friend, and there's one thing I really don't get.
I'm trying to query a JSON file that sits in an Azure storage account, using PySpark and Spark SQL.
The path of the file in Azure is: 'abfss://[email protected]/raw_files/'
In Databricks I've written the following statement to create the DataFrame:
df = spark.read.format("json").load("abfss://[email protected]/raw_files/")
So far so good, but: given that I've created a DataFrame, why can't I query it using PySpark or Spark SQL?
For example, this statement does not work:
SELECT * FROM df
However, this does work:
df = spark.read.format("json").load("abfss://[email protected]/raw_files/")
df.createOrReplaceTempView('df_view')
SELECT * FROM df_view;
My friend said this happens because PySpark and Spark SQL are APIs (this is where my doubt lies).
Why does this happen, and what other options are there besides createOrReplaceTempView?
Could someone give me some advice?