As I'm new to Spark, I have a simple doubt: I have to create an empty DataFrame which I will populate later based on some conditions.
I have gone through many questions about creating an empty DataFrame, but what is the difference between the approaches below?
Here is what I have tried; I don't know whether it's the right approach or not:
def function1(x: (String, String)): DataFrame = { // x is the current entry of the Map[String, String] I'm iterating
  var newdf: DataFrame = null
  if (!x._2.trim.isEmpty) {
    newdf = spark.sql("SELECT f_name, l_name FROM tab1")
  } else {
    newdf = spark.sql("SELECT address, zipcode FROM tab1")
  }
  newdf
}
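As a side note, the same branching can be written without the mutable `var` and the `null` initialization, since `if/else` is an expression in Scala. This is only a sketch under the same assumptions as above (an active `SparkSession` named `spark` and a registered table `tab1`):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch: identical branching logic as a single expression.
// No var/null needed; the chosen query's DataFrame is returned directly.
def function1(value: String)(implicit spark: SparkSession): DataFrame =
  if (value.trim.nonEmpty)
    spark.sql("SELECT f_name, l_name FROM tab1")
  else
    spark.sql("SELECT address, zipcode FROM tab1")
```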
The above approach does not give me any error while running locally, but I don't know what will happen on a cluster.
But I have found other approaches where people create an empty DataFrame with a specified schema, like this:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val my_schema = StructType(Seq(
  StructField("field1", StringType, nullable = false),
  StructField("field2", StringType, nullable = false)
))
val empty: DataFrame = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], my_schema)
But my problem is that I don't have a predefined schema; the resulting DataFrame may have any schema, which is only known at runtime.
Is there any problem if I go with approach 1, or is there anything I'm missing?
I have a Map[String, String] which I'm iterating; the branch taken depends on the value (if the value is empty, the if branch executes, otherwise the else branch). My map may contain any number of keys and values, and I want the last resulting DataFrame. Also, val df = spark.emptyDataFrame will create an empty DataFrame without specifying a schema.
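For reference, the difference between the two "empty DataFrame" variants shows up when you print their schemas: `spark.emptyDataFrame` carries no columns at all, while the `createDataFrame` variant keeps the column metadata. A small sketch, assuming an active `SparkSession` named `spark`:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Completely schema-less empty DataFrame: zero columns, zero rows.
val noSchema = spark.emptyDataFrame
noSchema.printSchema() // prints only "root"

// Empty DataFrame that still carries column metadata (zero rows).
val my_schema = StructType(Seq(
  StructField("field1", StringType, nullable = false),
  StructField("field2", StringType, nullable = false)
))
val withSchema = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], my_schema)
withSchema.printSchema() // prints root with field1 and field2
```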