I am using spark 2.4.1 version and java 8.
I have scenario like:
- Will be provided a list of classifiers from a property file to process.
- These classifiers determines the data what to pull and process.
Something like the below:
val classifiers = Seq("classifierOne","classifierTwo","classifierThree");
for( classifier : classifiers ){
// read from CassandraDB table
val acutalData = spark.read(.....).where(<classifier conditition>)
// the data varies depend on the classifier passed in
// this data has many fields along with fieldOne, fieldTwo and fieldThree
Depend on the classifier I need to filter the data. Currently I am doing it as below:
if(classifier.===("classifierOne")) {
val classifierOneDs = acutalData.filter(col("classifierOne").notEqual(lit("")).or(col("classifierOne").isNotNull()));
writeToParquet(classifierOneDs);
} else if(classifier.===("classifierTwo")) {
val classifierTwoDs = acutalData.filter(col("classifierTwo").notEqual(lit("")).or(col("classifierTwo").isNotNull()));
writeToParquet(classifierOneDs);
} else (classifier.===("classifierThree")) {
val classifierThreeDs = acutalData.filter(col("classifierThree").notEqual(lit("")).or(col("classifierThree").isNotNull()));
writeToParquet(classifierOneDs);
}
Is there any way to avoid the if-else block here?
Any other way to do/achieve the same in spark distrubated way?
writeToParquet(classifierOneDs);- you should read about how to use Scala properly ...val df = if() else ...and afterwards to the write once. Also Scala pattern matching is more Scala style ...