I have a program that takes a dataframe and should save it into Elasticsearch. Here's what it looks like when I save the dataframe:

    model_df.write.format(
        "org.elasticsearch.spark.sql"
    ).option(
        "pushdown", True
    ).option(
        "es.nodes", "example.server:9200"
    ).option("es.index.auto.create", True
    ).mode('append').save("EPTestIndex/")

When I run my program, I get this error:

    py4j.protocol.Py4JJavaError: An error occurred while calling o96.save. : java.lang.ClassNotFoundException: Failed to find data source: org.elasticsearch.spark.sql. Please find packages at http://spark.apache.org/third-party-projects.html

I did some research and thought I needed a jar, so I added these configurations to my SparkSession:

    spark = SparkSession.builder.config("jars", "/Users/public/ProjectDirectory/lib/elasticsearch-spark-20_2.11-6.0.1.jar")\
        .getOrCreate()
    sqlContext = SQLContext(spark)
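For reference, the documented Spark configuration key for extra local jars is `spark.jars` (not `jars`), and a connector can also be pulled in by Maven coordinate via `spark.jars.packages`. A sketch of the session setup under that assumption, reusing the jar path and version from the question (this is a config fragment, not a definitive fix):

```python
from pyspark.sql import SparkSession

# "spark.jars" (not "jars") is the documented key for adding local jar files
# to the driver and executor classpaths. Alternatively, spark.jars.packages
# downloads the connector by its Maven coordinate.
spark = (
    SparkSession.builder
    .config("spark.jars", "/Users/public/ProjectDirectory/lib/elasticsearch-spark-20_2.11-6.0.1.jar")
    # .config("spark.jars.packages", "org.elasticsearch:elasticsearch-spark-20_2.11:6.0.1")
    .getOrCreate()
)
```

Note that these settings must be in place before the JVM is started, i.e. before the first `getOrCreate()` call in the process.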

I initialize the SparkSession in main and write to ES in another package. The package takes the dataframe and runs the write command above. However, even with this I am still getting the same ClassNotFoundException. What might be the issue?

I am running this program in PyCharm. How can I configure PyCharm so that it can run it?

1 Answer

Elasticsearch exposes a JSON API, and a pandas DataFrame is not a JSON-supported type.

If you had to insert it, you could serialize the dataframe using dataframe.to_json().
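A minimal sketch of that serialization step, assuming a pandas DataFrame (the column names and values here are made up for illustration):

```python
import json

import pandas as pd

# Hypothetical example data; any DataFrame serializes the same way.
df = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

# orient="records" emits one JSON object per row, which is the shape
# Elasticsearch expects for individual documents.
payload = df.to_json(orient="records")

docs = json.loads(payload)
# docs == [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
```

Each element of `docs` could then be posted to the Elasticsearch index API as one document.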
