I have a program that takes a dataframe and should save it into Elasticsearch. Here's what it looks like when I save the dataframe:

    model_df.write.format(
        "org.elasticsearch.spark.sql"
    ).option(
        "pushdown", True
    ).option(
        "es.nodes", "example.server:9200"
    ).option("es.index.auto.create", True
    ).mode('append').save("EPTestIndex/")

When I run my program, I get this error:

    py4j.protocol.Py4JJavaError: An error occurred while calling o96.save. : java.lang.ClassNotFoundException: Failed to find data source: org.elasticsearch.spark.sql. Please find packages at http://spark.apache.org/third-party-projects.html

I did some research and thought I needed a jar, so I added these configurations to my SparkSession:

    spark = SparkSession.builder.config("jars", "/Users/public/ProjectDirectory/lib/elasticsearch-spark-20_2.11-6.0.1.jar")\
        .getOrCreate()
    sqlContext = SQLContext(spark)
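For reference, the documented Spark configuration key for extra local jars is `spark.jars` (not `jars`), and a connector can also be pulled in by Maven coordinate via `spark.jars.packages`. A sketch of the session setup under that assumption, reusing the jar path and version from the question (this is a config fragment, not a definitive fix):

```python
from pyspark.sql import SparkSession

# "spark.jars" (not "jars") is the documented key for adding local jar files
# to the driver and executor classpaths. Alternatively, spark.jars.packages
# downloads the connector by its Maven coordinate.
spark = (
    SparkSession.builder
    .config("spark.jars", "/Users/public/ProjectDirectory/lib/elasticsearch-spark-20_2.11-6.0.1.jar")
    # .config("spark.jars.packages", "org.elasticsearch:elasticsearch-spark-20_2.11:6.0.1")
    .getOrCreate()
)
```

Note that these settings must be in place before the JVM is started, i.e. before the first `getOrCreate()` call in the process.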

I initialize the SparkSession in main and write to ES in another package. The package takes the dataframe and runs the write command above. However, even with this I am still getting the same ClassNotFoundException. What might be the issue?

I am running this program in PyCharm. How can I configure PyCharm so that it can run it?

1 Answer

Elasticsearch exposes a JSON API, and a pandas DataFrame is not a JSON-supported type.

If you had to insert it, you could serialize the dataframe using dataframe.to_json().
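A minimal sketch of that serialization step, assuming a pandas DataFrame (the column names and values here are made up for illustration):

```python
import json

import pandas as pd

# Hypothetical example data; any DataFrame serializes the same way.
df = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

# orient="records" emits one JSON object per row, which is the shape
# Elasticsearch expects for individual documents.
payload = df.to_json(orient="records")

docs = json.loads(payload)
# docs == [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
```

Each element of `docs` could then be posted to the Elasticsearch index API as one document.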
