
I'm running PySpark against Elasticsearch using the elasticsearch-hadoop connector. I can read from a desired index using:

    from pyspark import SparkConf, SparkContext

    es_read_conf = {
        "es.nodes": "127.0.0.1",
        "es.port": "9200",
        "es.resource": "myIndex_*/myType"
    }
    conf = SparkConf().setAppName("devproj")
    sc = SparkContext(conf=conf)

    es_rdd = sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=es_read_conf
    )

This works fine, and I can wildcard the index.

How do I wildcard the document "type"? Or, how could I match more than one type, or even _all?


1 Answer


For all types, you can omit the type from the resource entirely: "es.resource": "myIndex_*".

To match types by a wildcard-like pattern, you would need a query, e.g. a prefix query on the `_type` field:

    "prefix": {
      "_type": {
        "value": "test"
      }
    }
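As a sketch of how this could be wired together, the prefix query can be serialized to JSON and passed to the connector via its `es.query` setting (the index pattern and the "test" prefix here are carried over from the snippets above; adapt them to your data):

    import json

    # Prefix query matching every document whose _type starts with "test".
    # Wrapping it in a top-level "query" object gives a complete query body.
    query = json.dumps({
        "query": {
            "prefix": {
                "_type": {
                    "value": "test"
                }
            }
        }
    })

    es_read_conf = {
        "es.nodes": "127.0.0.1",
        "es.port": "9200",
        "es.resource": "myIndex_*",  # no type suffix -> all types
        "es.query": query,           # connector pushes this query to ES
    }

This `es_read_conf` dict then replaces the one passed to `sc.newAPIHadoopRDD` in the question.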

1 Comment

Okay, this worked. If I leave out the "type", it selects all types.
