
I am encountering an error while trying to connect Spark to Elasticsearch and insert a DataFrame. The specific error message is as follows:

Py4JJavaError: An error occurred while calling o130.save. : org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Here is the relevant code snippet:

from pyspark.sql import SparkSession
from pyspark.sql.functions import *

appName = "mysql example"
master = "local"
spark = SparkSession.builder.master(master).appName(appName)\
        .config("spark.jars", "postgresql-42.2.6.jar,elasticsearch-spark-30_2.12-8.8.1.jar").getOrCreate()

# dataframe_from_table is a helper defined elsewhere that reads a table into a DataFrame
category_product_df = dataframe_from_table("category")

category_product_df.write.format("org.elasticsearch.spark.sql") \
        .option("es.resource", "wael/test") \
        .option("es.port", "9200") \
        .option("es.nodes", "elastic:changeme@localhost") \
        .option("es.nodes.wan.only", "true") \
        .save()

When attempting to connect to Elasticsearch, I encountered the aforementioned error.

Here are some details about my environment:

  • PySpark version: 3.4.0

  • Scala version: 2.12.17

  • Elasticsearch version: 7.15.0

  • Elasticsearch-Spark connector version: elasticsearch-spark-30_2.12-8.8.1.jar

I have verified the accessibility of the Elasticsearch cluster and made sure that the necessary network connectivity is in place. Additionally, I have checked the Elasticsearch cluster configuration and confirmed that the host, port, and authentication credentials are correct.

I have also noticed that the error suggests setting the 'es.nodes.wan.only' property to 'true' when targeting a WAN/Cloud instance of Elasticsearch. As such, I have included this configuration option in the code snippet.

Despite these efforts, I am still encountering the error. I would appreciate any insights or suggestions on how to resolve this issue and successfully connect Spark to Elasticsearch.
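To double-check reachability outside of Spark, I also queried the cluster's HTTP root endpoint directly. Here is a minimal sketch of that check (assuming the default http://localhost:9200 endpoint; the `es_version` helper name is mine, not part of any library):

```python
import json
import urllib.request
import urllib.error

def es_version(url, timeout=5):
    """Return the cluster's reported version number, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
        # The root endpoint returns {"version": {"number": "..."}, ...}
        return info.get("version", {}).get("number")
    except (urllib.error.URLError, OSError, ValueError):
        return None

print(es_version("http://localhost:9200"))  # prints the version string when reachable, None otherwise
```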

  • Are you sure the host and port are defined correctly? I ran into the same error message when the names were wrong. Commented Jun 28, 2023 at 19:16
  • Yes, absolutely. I can access Elasticsearch from the browser at localhost:9200. Commented Jun 29, 2023 at 8:03

1 Answer


I fixed it by running this instead:

df \
    .write \
    .format("es") \
    .option("es.nodes.wan.only", "true") \
    .option("es.net.http.auth.user", "elastic") \
    .option("es.net.http.auth.pass", "...") \
    .option("es.nodes", "localhost") \
    .option("es.port", "9200") \
    .option('es.resource',"spark/test") \
    .save()

I guess the problem was `format("org.elasticsearch.spark.sql")`, though I'm not sure; the working version uses the connector's short name `"es"` and also passes the credentials through `es.net.http.auth.user`/`es.net.http.auth.pass` instead of embedding them in `es.nodes`.
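Comparing the two snippets side by side, one visible difference is that the credentials moved out of `es.nodes` (`elastic:changeme@localhost`) into the dedicated auth options. A small illustrative sketch of that separation (`split_node` is a made-up helper, not part of the connector):

```python
def split_node(node):
    """Split a 'user:password@host' node string into its parts.

    Illustrative only: the ES-Hadoop connector expects credentials in
    es.net.http.auth.* rather than embedded in es.nodes.
    """
    creds, sep, host = node.rpartition("@")
    if not sep:
        return None, None, node  # no credentials embedded
    user, _, password = creds.partition(":")
    return user, password, host

user, password, host = split_node("elastic:changeme@localhost")
options = {
    "es.nodes": host,                   # bare hostname only
    "es.port": "9200",
    "es.nodes.wan.only": "true",
    "es.net.http.auth.user": user,      # credentials go here instead
    "es.net.http.auth.pass": password,
}
```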


1 Comment

Congrats on your first answer, buddy. Glad it was the solution. Work on formatting it better.
