
What is the best way to store a Spark DataFrame in MongoDB using Python?

The code below works fine in the pyspark shell, but when I try to run it as a standalone program I get Java/Scala exceptions.

  • PySpark version: 2.2.0
  • MongoDB version: 3.4
  • Python: 2.7
  • Java: JDK 9

Here is my code:

from pyspark.sql import SparkSession

# Build a SparkSession with the MongoDB connector input/output URIs.
my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll") \
    .getOrCreate()

# Read the CSV (with a header row) and append it to the auto.autod collection.
dataframe = my_spark.read.csv('auto-data.csv', header=True)
dataframe.write.format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append").option("database", "auto").option("collection", "autod").save()
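Once the write succeeds, the read side of the same connector API can confirm the rows landed in MongoDB. A minimal sketch, assuming a SparkSession configured as above and the same database/collection names (the helper name is my own, not part of the connector):

```python
# Sketch only: reads back the auto.autod collection written above.
# Assumes `spark` is a SparkSession configured with the MongoDB connector.
def read_back(spark):
    return (spark.read
            .format("com.mongodb.spark.sql.DefaultSource")
            .option("database", "auto")
            .option("collection", "autod")
            .load())
```

Calling `read_back(my_spark).count()` after the save would verify the write.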

A snapshot of my CSV data and the errors were attached as screenshots (images not included here).

I also tried installing the mongo-spark library from GitHub, but I get the same result.

  • You need to provide the required jar packages using the --jars option when submitting the script. The error clearly indicates that it cannot find the required class. Commented Sep 29, 2017 at 2:48
  • I pretty much tried that. I also added mongo-spark, which contains the jar files, but I still couldn't solve this issue. Commented Sep 29, 2017 at 3:18
  • Post the full command you are using to run the script; it might help in finding what you are missing. Commented Sep 29, 2017 at 3:38
  • Just a wild thought: can you fall back to JDK 8? I don't think Spark is compatible with JDK 9 yet. Then try again and see if you get the same errors. Commented Sep 29, 2017 at 4:11
  • Use Java 8 or below. Commented Sep 29, 2017 at 6:32
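Tying the comments together: a missing-class error means the connector jar never reached the JVM classpath. One way to sidestep manual jar management is to let Spark resolve the connector from Maven Central with `--packages` — a sketch, assuming Spark 2.2 built against Scala 2.11 (hence the `_2.11` artifact suffix):

```shell
# Maven coordinate of the MongoDB Spark connector; the _2.11 suffix must
# match the Scala version of the Spark build (Spark 2.2 ships Scala 2.11).
MONGO_PKG="org.mongodb.spark:mongo-spark-connector_2.11:2.2.0"

# Launch with the package resolved from Maven Central (no manual jars):
# spark-submit --packages "$MONGO_PKG" my_script.py
# pyspark --packages "$MONGO_PKG"
```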

1 Answer


You need to download all the dependencies and store them in one location, "/opt/jars" in the following example. Jars required:

  1. mongo-spark-connector_2.12-2.4.0.jar
  2. mongodb-driver-3.10.1.jar
  3. mongo-hadoop-core-1.3.0.jar (in case you are running Spark on YARN)

sudo wget https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.12/2.4.0/mongo-spark-connector_2.12-2.4.0.jar
sudo wget https://repo1.maven.org/maven2/org/mongodb/mongodb-driver/3.10.1/mongodb-driver-3.10.1.jar
sudo wget https://repo1.maven.org/maven2/org/mongodb/mongo-hadoop-core/1.3.0/mongo-hadoop-core-1.3.0.jar

Then execute with the following command:

spark-submit --jars "/opt/jars/*.jar" --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 <your file>.py arg1 arg2
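One caveat with the answer above: the downloaded jars target Scala 2.12 (`mongo-spark-connector_2.12-2.4.0.jar`), while the `--packages` coordinate targets Scala 2.11. The suffix must match the Scala version of your Spark build, or the same missing-class errors reappear. A small hypothetical helper (the function name is my own, not part of any library) makes the coordinate explicit:

```python
def mongo_connector_coordinate(scala_version, connector_version):
    """Build the Maven coordinate for the MongoDB Spark connector.

    The artifact id carries the Scala binary version, which must match
    the Scala version of the Spark build (e.g. 2.11 for Spark 2.2.x).
    """
    return "org.mongodb.spark:mongo-spark-connector_{0}:{1}".format(
        scala_version, connector_version)

# For Spark 2.2 (Scala 2.11):
print(mongo_connector_coordinate("2.11", "2.2.0"))
# prints org.mongodb.spark:mongo-spark-connector_2.11:2.2.0
```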