Elasticsearch pattern regex start with

Question

I would like to ask whether there is any documentation that describes how to work with regex patterns in Elasticsearch.

I need to write a Pattern Capture Token Filter that keeps only tokens starting with a specific word. For example, given the input token stream ("abcefgh", "abc123", "aabbcc", "abc", "abdef"), my filter should return only the tokens abcefgh, abc123 and abc, because those tokens start with "abc".

Can someone help me achieve this use case?

Thanks.

The regex is easy: abc.*

– Wiktor Stribiżew, Commented Aug 18, 2016 at 16:56
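A quick way to sanity-check that regex outside Elasticsearch is Python's `re` module. This is only a sketch: Elasticsearch patterns are Java regular expressions, but for a simple pattern like this one the two engines behave the same. `re.search` looks for a match anywhere in the token, much like Java's `Matcher.find`, so an unanchored `abc.*` also accepts a token that merely contains "abc"; the token "xabc" below is a hypothetical addition to show that, and a leading `^` restricts the match to real prefixes.

```python
import re

# Token stream from the question, plus "xabc", a hypothetical token added
# to show the difference between an unanchored and an anchored pattern.
tokens = ["abcefgh", "abc123", "aabbcc", "abc", "abdef", "xabc"]

unanchored = re.compile(r"abc.*")  # may match anywhere in the token
anchored = re.compile(r"^abc.*")   # must match at the start of the token

# re.search scans the whole string, similar to Java's Matcher.find().
found_anywhere = [t for t in tokens if unanchored.search(t)]
found_prefix = [t for t in tokens if anchored.search(t)]

print(found_anywhere)  # ['abcefgh', 'abc123', 'abc', 'xabc']
print(found_prefix)    # ['abcefgh', 'abc123', 'abc']
```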

Andrei Stefan · Accepted Answer · 2016-08-18 23:16:50Z


I suggest something like this:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_trim_keyword_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim",
            "generate_tokens",
            "eliminate_tokens",
            "remove_empty"
          ]
        }
      },
      "filter": {
        "eliminate_tokens": {
          "pattern": "^(?!abc)\\w+$",
          "type": "pattern_replace",
          "replacement": ""
        },
        "generate_tokens": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "(([a-z]+)(\\d*))"
          ]
        },
        "remove_empty": {
          "type": "stop",
          "stopwords": [""]
        }
      }
    }
  }
}

If your tokens are the result of a pattern_capture filter, add the filter called eliminate_tokens in my example right after it. Its pattern matches tokens that don't start with abc, and every token it matches is replaced by an empty string ("replacement": "").

Finally, to remove the resulting empty tokens, I added the remove_empty filter, which is simply a stop filter whose only stopword is "" (the empty string).
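To see what the last two filters do to the question's tokens, here is a rough Python model of them; it is not Elasticsearch code, and the helper names are hypothetical. `eliminate_tokens` is modelled with `re.sub`, since a pattern_replace filter replaces every match of its pattern, and `remove_empty` is modelled as dropping tokens equal to the empty string. Python's `\w` is Unicode-aware while Java's is ASCII-only by default, which makes no difference for these ASCII tokens.

```python
import re

# Same pattern as the eliminate_tokens filter (JSON string "^(?!abc)\\w+$").
ELIMINATE = re.compile(r"^(?!abc)\w+$")


def eliminate_tokens(tokens):
    """Hypothetical model of the pattern_replace step: tokens that match
    the pattern (i.e. do not start with "abc") become empty strings."""
    return [ELIMINATE.sub("", t) for t in tokens]


def remove_empty(tokens):
    """Hypothetical model of the stop filter whose only stopword is ""."""
    return [t for t in tokens if t != ""]


tokens = ["abcefgh", "abc123", "aabbcc", "abc", "abdef"]
replaced = eliminate_tokens(tokens)
kept = remove_empty(replaced)

print(replaced)  # ['abcefgh', 'abc123', '', 'abc', '']
print(kept)      # ['abcefgh', 'abc123', 'abc']
```

The two-step design exists because a replace filter can only rewrite a token, not delete it; the stop filter then does the actual removal.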


3 Comments

user1827257 Over a year ago

Thanks for your answer. I tried it out and it works! May I ask one more question? What if I want to get words starting with "abc" OR "bca" OR "gdfh"?

Andrei Stefan Over a year ago

Change the regex of the eliminate_tokens filter to: ^(?!(abc|bca|gdfh))\\w+$
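A small Python check of the alternation version, again only a model of the pattern_replace + stop chain rather than Elasticsearch itself. The sample tokens are hypothetical; "gdf" is included to show that a token shorter than a prefix does not count as starting with it.

```python
import re

# eliminate_tokens pattern from the comment, as a Python raw string
# (the JSON version doubles the backslash: "^(?!(abc|bca|gdfh))\\w+$").
ELIMINATE = re.compile(r"^(?!(abc|bca|gdfh))\w+$")

# Hypothetical sample tokens for the "abc" OR "bca" OR "gdfh" case.
tokens = ["abc123", "bcadef", "gdfhx", "gdf", "xyz", "abdef"]

# Tokens that match (start with none of the prefixes) become "",
# then empty tokens are dropped, mimicking the filter chain.
kept = [t for t in (ELIMINATE.sub("", tok) for tok in tokens) if t]

print(kept)  # ['abc123', 'bcadef', 'gdfhx']
```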

user1827257 Over a year ago

Thanks, so easy! Really helpful!
