I am trying to read a CSV file into a DataFrame in Spark as follows:

  1. I start the Spark shell like this:

    spark-shell --jars .\spark-csv_2.11-1.4.0.jar;.\commons-csv-1.2.jar (I cannot download those dependencies directly, which is why I am using --jars)

  2. Use the following command to read a CSV file:

val df_1 = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("2008.csv")

But, here is the error message that I get:

scala> val df_1 = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("2008.csv")
java.lang.NoClassDefFoundError: org/apache/commons/csv/CSVFormat
        at com.databricks.spark.csv.package$.<init>(package.scala:27)
        at com.databricks.spark.csv.package$.<clinit>(package.scala)
        at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:235)
        at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:73)
        at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:162)
        at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:44)
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
        at $iwC$$iwC$$iwC.<init>(<console>:47)
        at $iwC$$iwC.<init>(<console>:49)
        at $iwC.<init>(<console>:51)
        at <init>(<console>:53)
        at .<init>(<console>:57)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.csv.CSVFormat
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 57 more

After doing the first proposed solution:

PS C:\Users\319413696\Desktop\graphX> spark-shell --packages com.databricks:spark-csv_2.11:1.4.0
Ivy Default Cache set to: C:\Users\319413696\.ivy2\cache
The jars for the packages stored in: C:\Users\319413696\.ivy2\jars
:: loading settings :: url = jar:file:/C:/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found com.databricks#spark-csv_2.11;1.4.0 in local-m2-cache
        found org.apache.commons#commons-csv;1.1 in local-m2-cache
        found com.univocity#univocity-parsers;1.5.1 in local-m2-cache
downloading file:/C:/Users/319413696/.m2/repository/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0.jar ...
        [SUCCESSFUL ] com.databricks#spark-csv_2.11;1.4.0!spark-csv_2.11.jar (0ms)
downloading file:/C:/Users/319413696/.m2/repository/org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar ...
        [SUCCESSFUL ] org.apache.commons#commons-csv;1.1!commons-csv.jar (0ms)
downloading file:/C:/Users/319413696/.m2/repository/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1.jar ...
        [SUCCESSFUL ] com.univocity#univocity-parsers;1.5.1!univocity-parsers.jar (15ms)
:: resolution report :: resolve 671ms :: artifacts dl 31ms
        :: modules in use:
        com.databricks#spark-csv_2.11;1.4.0 from local-m2-cache in [default]
        com.univocity#univocity-parsers;1.5.1 from local-m2-cache in [default]
        org.apache.commons#commons-csv;1.1 from local-m2-cache in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   3   |   3   |   3   |   0   ||   3   |   3   |
        ---------------------------------------------------------------------

:: problems summary ::
:::: ERRORS
        Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-sources.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-sources.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-src.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-src.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-javadoc.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url http://dl.bintray.com/spark-packages/maven/com/databricks/spark-csv_2.11/1.4.0/spark-csv_2.11-1.4.0-javadoc.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url https://repo1.maven.org/maven2/org/apache/apache/15/apache-15.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/apache/15/apache-15.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url https://repo1.maven.org/maven2/org/apache/commons/commons-parent/35/commons-parent-35.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-parent/35/commons-parent-35.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url https://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.1/commons-csv-1.1-sources.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-csv/1.1/commons-csv-1.1-sources.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url https://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.1/commons-csv-1.1-src.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-csv/1.1/commons-csv-1.1-src.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url https://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.1/commons-csv-1.1-javadoc.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/commons/commons-csv/1.1/commons-csv-1.1-javadoc.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url https://repo1.maven.org/maven2/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-sources.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url http://dl.bintray.com/spark-packages/maven/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-sources.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url https://repo1.maven.org/maven2/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-src.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url http://dl.bintray.com/spark-packages/maven/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-src.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url https://repo1.maven.org/maven2/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-javadoc.jar (java.net.SocketException: Permission denied: connect)

        Server access error at url http://dl.bintray.com/spark-packages/maven/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1-javadoc.jar (java.net.SocketException: Permission denied: connect)

5 Answers

  1. Give the full path of the jars and separate them with , instead of ;:

    spark-shell --jars fullpath\spark-csv_2.11-1.4.0.jar,fullpath\commons-csv-1.2.jar

  2. Make sure you have write permissions on the (DFS) folders where temporary files will be written.

Download spark-csv to your .m2 directory, then use spark-shell --packages com.databricks:spark-csv_2.11:1.4.0

If you can't download spark-csv directly, download it on another system and then copy the whole .m2 directory to your computer.

Instead of using sqlContext.read, I used the following code to turn my .csv file into a DataFrame. Suppose the .csv file has five columns, as follows:

case class Flight(arrDelay: Int, depDelay: Int, origin: String, dest: String, distance: Int)

Then:

val flights = sc.textFile("2008.csv").map(_.split(",")).map(p => Flight(p(0).trim.toInt, p(1).trim.toInt, p(2), p(3), p(4).trim.toInt)).toDF()
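One caveat with this approach: sc.textFile also returns the header row, and p(0).trim.toInt will throw a NumberFormatException on it, so the header has to be dropped before parsing. The parsing step itself can be checked without Spark; the sketch below factors it into a plain function and runs it on a made-up sample row (the column names and values are hypothetical, not from the real 2008.csv):

```scala
// Same case class as in the answer above
case class Flight(arrDelay: Int, depDelay: Int, origin: String, dest: String, distance: Int)

// The per-line parsing used inside map(), factored out so it can be
// exercised without a SparkContext
def parseLine(line: String): Flight = {
  val p = line.split(",")
  Flight(p(0).trim.toInt, p(1).trim.toInt, p(2), p(3), p(4).trim.toInt)
}

// Hypothetical sample input: a header row plus one data row.
// The header must be filtered out, or toInt throws on "ArrDelay".
val header = "ArrDelay,DepDelay,Origin,Dest,Distance"
val lines  = Seq(header, "-14,8,IAD,TPA,810")
val flights = lines.filter(_ != header).map(parseLine)
```

With Spark, the same idea applies to the RDD: grab the first line with rdd.first and filter it out (rdd.filter(_ != first)) before the map.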

Milad Khajavi's answer saved the day for me. After days of battling to get spark-csv working on a cluster with no internet access, I finally took his idea and downloaded the package on a VM. Then I copied the .ivy2 directory from the VM to the cluster. Now it's working without any issues.

Download spark-csv to your .m2 directory, then use spark-shell --packages com.databricks:spark-csv_2.11:1.4.0

If you can't download spark-csv directly, download it on another system and then copy the whole .m2 directory to your computer.

I also got the same exception; it was resolved after downloading the spark-csv and commons-csv jars. Download both jars with the shell commands below (note these are shell commands, not scala> input):

spark-csv

wget http://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.5.0/spark-csv_2.10-1.5.0.jar

commons-csv

wget http://central.maven.org/maven2/org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar

Copy these jars into the /tmp directory, then run spark-shell as follows:

spark-shell --jars /tmp/spark-csv_2.10-1.5.0.jar,/tmp/commons-csv-1.1.jar
