Py4JJavaError: Cannot detect ES version when connecting PYSpark to Elasticsearch

Question

I am encountering an error while trying to connect Spark to Elasticsearch and insert a DataFrame. The specific error message is as follows:

Py4JJavaError: An error occurred while calling o130.save. : org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Here is the relevant code snippet:

from pyspark.sql import SparkSession
from pyspark.sql.functions import *

appName = "mysql example"
master = "local"
spark = SparkSession.builder.master(master).appName(appName)\
        .config("spark.jars", "postgresql-42.2.6.jar,elasticsearch-spark-30_2.12-8.8.1.jar").getOrCreate()

category_product_df = dataframe_from_table("category")

category_product_df.write.format("org.elasticsearch.spark.sql") \
        .option("es.resource", "wael/test") \
        .option("es.port", "9200") \
        .option("es.nodes", "elastic:changeme@localhost") \
        .option("es.nodes.wan.only", "true") \
        .save()

When attempting to connect to Elasticsearch, I encountered the aforementioned error.

Here are some details about my environment:

PySpark version: 3.4.0
Scala version: 2.12.17
Elasticsearch version: 7.15.0
Elasticsearch-Spark connector version: elasticsearch-spark-30_2.12-8.8.1.jar

I have verified the accessibility of the Elasticsearch cluster and made sure that the necessary network connectivity is in place. Additionally, I have checked the Elasticsearch cluster configuration and confirmed that the host, port, and authentication credentials are correct.

I have also noticed that the error suggests setting the 'es.nodes.wan.only' property to 'true' when targeting a WAN/Cloud instance of Elasticsearch. As such, I have included this configuration option in the code snippet.

Despite these efforts, I am still encountering the error. I would appreciate any insights or suggestions on how to resolve this issue and successfully connect Spark to Elasticsearch.

Are you sure that the port and host have been defined correctly? I encountered the same error message with incorrect names. — Bibzon
– Bibzon, Commented Jun 28, 2023 at 19:16
yes absolutely i can access elasticsearch from navigator using localhost:9200 — Flàyn
– Flàyn, Commented Jun 29, 2023 at 8:03

Flàyn · Accepted Answer · 2023-06-30 11:26:01Z

1

i fixed it by running this instead :

df \
    .write \
    .format("es") \ 
    .option("es.nodes.wan.only", "true") \
    .option("es.net.http.auth.user", "elastic") \
    .option("es.net.http.auth.pass", "...") \
    .option("es.nodes", "localhost") \
    .option("es.port", "9200") \
    .option('es.resource',"spark/test") \
    .save()

i guess the problem was format("org.elasticsearch.spark.sql")not sure about it tho

answered Jun 30, 2023 at 11:26

Flàyn

731 silver badge7 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

alvonellos Over a year ago

Congrats on your first comment buddy. Glad it was the answer. Work on formatting it better

Collectives™ on Stack Overflow

Py4JJavaError: Cannot detect ES version when connecting PYSpark to Elasticsearch

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related