I am using the Apache spark for loading the dataset from the Oracle Database using the jdbc format in Spark-Java.I am loading a dataset of the size 10M and 1M and performing various operations on the dataset.After all the processing is done,I am writing multiple datasets into DB,which consumes almost 90% of the time. So,how to reduce this write speed into the Oracle database. I tried various combinations of the executors-memory,cores,number of executors,but not a major difference obtained. Command used to run the apache spark built jar file is:
./spark-submit --class com.sample.Transformation --conf spark.sql.shuffle.partitions=5001 --num-executors=22 --executor-cores=8 --executor-memory=10GB --jars /scratch/rmbbuild/spark_ormb/drools-jars/ojdbc6.jar,/scratch/rmbbuild/spark_ormb/drools-jars/kie-api-7.7.0.Final.jar,/scratch/rmbbuild/spark_ormb/drools-jars/drools-core-7.7.0.Final.jar,/scratch/rmbbuild/spark_ormb/drools-jars/drools-compiler-7.7.0.Final.jar,/scratch/rmbbuild/spark_ormb/drools-jars/kie-soup-maven-support-7.7.0.Final.jar,/scratch/rmbbuild/spark_ormb/drools-jars/kie-internal-7.7.0.Final.jar,/scratch/rmbbuild/spark_ormb/drools-jars/xstream-1.4.10.jar,/scratch/rmbbuild/spark_ormb/drools-jars/kie-soup-commons-7.7.0.Final.jar,/scratch/rmbbuild/spark_ormb/drools-jars/ecj-4.4.2.jar,/scratch/rmbbuild/spark_ormb/drools-jars/mvel2-2.4.0.Final.jar,/scratch/rmbbuild/spark_ormb/drools-jars/kie-soup-project-datamodel-commons-7.7.0.Final.jar,/scratch/rmbbuild/spark_ormb/drools-jars/kie-soup-project-datamodel-api-7.7.0.Final.jar --driver-class-path /scratch/rmbbuild/spark_ormb/drools-jars/ojdbc6.jar --master spark://10.180.182.218:7077 "/scratch/rmbbuild/spark_ormb/POC-jar/Transformation-0.0.1-SNAPSHOT.jar" > /scratch/rmbbuild/spark_ormb/POC-jar/logs/logs10.txt
Write methods in Transformation.java file is as follows:
txndetailDS.write().mode("append").format("jdbc").option("url", connection).option("dbtable", CI_TXN_DETAIL).save();
finalpriceItemParamsGrpDS.distinct().write().mode("append").format("jdbc").option("url", connection).option("dbtable", CI_PRICEITEM_PARM_GRP_K).save();
finalpriceItemParamsGroupTable.distinct().write().mode("append").format("jdbc").option("url", connection).option("dbtable", CI_PRICEITEM_PARM_GRP).save();
finalPritmOutput.distinct().write().mode("append").format("jdbc").option("url", connection).option("dbtable", CI_TXN_DTL_PRITM).save();
outputDataSetforsummary.write().mode("append").format("jdbc").option("url", connection).option("dbtable", CI_TXN_DTL_PRITM_SUMMARY).save();