I'm writing approximately 83 million records from a DataFrame into PostgreSQL, and it's slow: the write takes about 2.7 hours to complete.
Looking at the executors, there is only one active task, running on a single executor. Is there any way to parallelize the writes to the database across all executors in Spark?
...
import java.util.Properties

val prop = new Properties()
prop.setProperty("user", DB_USER)
prop.setProperty("password", DB_PASSWORD)
prop.setProperty("driver", "org.postgresql.Driver")

salesReportsDf.write
  .mode(SaveMode.Append)
  .jdbc(s"jdbc:postgresql://$DB_HOST:$DB_PORT/$DATABASE", REPORTS_TABLE, prop)
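For reference, this is the kind of change I'm considering, though I'm not sure it's the right approach: repartitioning the DataFrame before the write so each partition opens its own JDBC connection, and batching the inserts. The partition count (16) and `batchsize` (10000) are guesses on my part, not tuned values:

```scala
// Sketch only: repartition so the write is spread across executors,
// one JDBC connection per partition; 16 and 10000 are untuned guesses.
salesReportsDf
  .repartition(16)
  .write
  .mode(SaveMode.Append)
  .option("batchsize", 10000) // rows per JDBC batch insert
  .jdbc(s"jdbc:postgresql://$DB_HOST:$DB_PORT/$DATABASE", REPORTS_TABLE, prop)
```

Would something like this actually use all the executors, or is there a better way?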
Thanks