I'm hitting a very strange problem when trying to load a JDBC DataFrame into Spark SQL.

I've tried several Spark setups: YARN, a standalone cluster, and pseudo-distributed mode on my laptop. It's reproducible on both Spark 1.3.0 and 1.3.1. The problem occurs both in spark-shell and when executing the code with spark-submit. I've tried MySQL and MS SQL JDBC drivers, without success.

Consider the following sample:

val driver = "com.mysql.jdbc.Driver"
val url = "jdbc:mysql://localhost:3306/test"

val t1 = {
  sqlContext.load("jdbc", Map(
    "url" -> url,
    "driver" -> driver,
    "dbtable" -> "t1",
    "partitionColumn" -> "id",
    "lowerBound" -> "0",
    "upperBound" -> "100",
    "numPartitions" -> "50"
  ))
}

So far so good; the schema gets resolved properly:

t1: org.apache.spark.sql.DataFrame = [id: int, name: string]

But when I evaluate the DataFrame:

t1.take(1)

the following exception occurs:

15/04/29 01:56:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.42): java.sql.SQLException: No suitable driver found for jdbc:mysql://<hostname>:3306/test
    at java.sql.DriverManager.getConnection(DriverManager.java:689)
    at java.sql.DriverManager.getConnection(DriverManager.java:270)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:158)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:150)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:317)
    at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:309)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

When I try to open a JDBC connection on an executor:

import java.sql.DriverManager

sc.parallelize(0 until 2, 2).map { i =>
  Class.forName(driver)
  val conn = DriverManager.getConnection(url)
  conn.close()
  i
}.collect()

it works perfectly:

res1: Array[Int] = Array(0, 1)

When I run the same code in local mode, it works perfectly too:

scala> t1.take(1)
...
res0: Array[org.apache.spark.sql.Row] = Array([1,one])

I'm using Spark pre-built with Hadoop 2.4 support.

The easiest way to reproduce the problem is to start Spark in pseudo-distributed mode with the start-all.sh script and run the following command:

/path/to/spark-shell --master spark://<hostname>:7077 --jars /path/to/mysql-connector-java-5.1.35.jar --driver-class-path /path/to/mysql-connector-java-5.1.35.jar

Is there a way to work around this? It looks like a severe problem, so it's strange that googling doesn't turn anything up.

4 Answers

Apparently this issue has been recently reported:

https://issues.apache.org/jira/browse/SPARK-6913

The problem is that java.sql.DriverManager doesn't see drivers loaded by ClassLoaders other than the bootstrap ClassLoader.

As a temporary workaround, it's possible to add the required drivers to the boot classpath of the executors.
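
One way to do that on a standalone cluster is through the executors' JVM options; the exact flag below is just a sketch (I haven't tested it everywhere), and it assumes the driver jar sits at the same path on every worker node:

# sketch: append the driver jar to the executors' bootstrap classpath
/path/to/spark-shell --master spark://<hostname>:7077 \
  --jars /path/to/mysql-connector-java-5.1.35.jar \
  --driver-class-path /path/to/mysql-connector-java-5.1.35.jar \
  --conf "spark.executor.extraJavaOptions=-Xbootclasspath/a:/path/to/mysql-connector-java-5.1.35.jar"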

UPDATE: This pull request fixes the problem: https://github.com/apache/spark/pull/5782

UPDATE 2: The fix has been merged into Spark 1.4.
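
For reference, on 1.4+ the same load can also be expressed through the DataFrameReader API; this is just a sketch of the equivalent call with the options from the question (the older sqlContext.load(source, options) form is deprecated there):

// Spark 1.4+ sketch: same JDBC options as in the question, via DataFrameReader
val t1 = sqlContext.read
  .format("jdbc")
  .options(Map(
    "url" -> url,
    "driver" -> driver,
    "dbtable" -> "t1",
    "partitionColumn" -> "id",
    "lowerBound" -> "0",
    "upperBound" -> "100",
    "numPartitions" -> "50"
  ))
  .load()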

2 Comments

Hey @Wildfire, how did you go about editing the boot classpath of the executors? I tried modifying the YARN container classpath to add the mysql-connector-java jar, but it did not fix the error. Thanks!
@JKnight I couldn't figure out how to add the JDBC driver to the boot classpath on YARN, so I'm just using a patched Spark build.

For writing data to MySQL

In Spark 1.4.0, you have to read from MySQL before writing to it, because Spark only loads the driver in the load function, not in the write function. You also have to put the jar on every worker node and set its path in the spark-defaults.conf file on each node. This issue has been fixed in Spark 1.5.0:

https://issues.apache.org/jira/browse/SPARK-10036
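
A minimal sketch of the 1.4.0 workaround, assuming the t1 table with (id, name) columns from the question; the credentials and the toy DataFrame are placeholders:

import java.util.Properties
import sqlContext.implicits._

val url = "jdbc:mysql://localhost:3306/test"
val props = new Properties()
props.setProperty("driver", "com.mysql.jdbc.Driver")
props.setProperty("user", "user")         // placeholder credentials
props.setProperty("password", "password")

// Reading first forces the JDBC driver to be loaded; in 1.4.0 the write path alone does not.
val existing = sqlContext.read.jdbc(url, "t1", props)

// Now the write goes through.
val df = Seq((2, "two")).toDF("id", "name")
df.write.mode("append").jdbc(url, "t1", props)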

We are stuck on Spark 1.3 (Cloudera 5.4), so I found this question and Wildfire's answer helpful, since it allowed me to stop banging my head against the wall.

Thought I would share how we got the driver into the boot classpath: we simply copied it into /opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hive/lib on all the nodes.
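
For example, something along these lines (the host list is made up; any configuration-management tool would do the same job):

# hypothetical host list: copy the driver jar to the same path on every node
for host in node1 node2 node3; do
  scp mysql-connector-java-5.1.35.jar \
    "$host":/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hive/lib/
done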

2 Comments

Hi, did you restart Cloudera after adding the driver to that folder? Thanks
@SantiArias yes I believe we did restart the cluster.

I am using Spark 1.6.1 with SQL Server and still faced the same issue. I had to add the library (sqljdbc-4.0.jar) to lib on the instance, and the line below to the conf/spark-defaults.conf file.

spark.driver.extraClassPath lib/sqljdbc-4.0.jar
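
If executors run on other nodes as well, they may need the jar on their classpath too; a possible spark-defaults.conf variant (the executor line is an addition on top of the answer and assumes the jar is at the same relative path on each node):

spark.driver.extraClassPath   lib/sqljdbc-4.0.jar
spark.executor.extraClassPath lib/sqljdbc-4.0.jar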
