
Is there any documentation that describes how to work with regex patterns in Elasticsearch?

I need to write a Pattern Capture Token Filter that keeps only tokens starting with a specific word. For example, given an input token stream such as ("abcefgh", "abc123", "aabbcc", "abc", "abdef"), my tokenizer should return only the tokens abcefgh, abc123 and abc, because those tokens start with "abc".

Can someone help me achieve this use case?

Thanks.

  • The regex is easy - abc.*. Commented Aug 18, 2016 at 16:56

1 Answer


I suggest something like this:

"analysis": {
  "analyzer": {
    "my_trim_keyword_analyzer": {
      "type": "custom",
      "tokenizer": "keyword",
      "filter": [
        "lowercase",
        "trim",
        "generate_tokens",
        "eliminate_tokens",
        "remove_empty"
      ]
    }
  },
  "filter": {
    "eliminate_tokens": {
      "pattern": "^(?!abc)\\w+$",
      "type": "pattern_replace",
      "replacement": ""
    },
    "generate_tokens": {
      "type": "pattern_capture",
      "preserve_original": true,
      "patterns": [
        "(([a-z]+)(\\d*))"
      ]
    },
    "remove_empty": {
      "type": "stop",
      "stopwords": [""]
    }
  }
}

If your tokens are the result of a pattern_capture filter, you need to add, after that filter, the one called eliminate_tokens in my example. Its pattern matches any token that doesn't start with abc, and each token it matches is replaced by an empty string ("replacement": ""). Tokens that do start with abc don't match the pattern and pass through unchanged.

After this, to remove the empty tokens, I added the remove_empty filter, which is simply a stop filter whose only stopword is "" (the empty string).
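To check the regex logic outside Elasticsearch, here is a minimal Python sketch of the last two steps (eliminate_tokens, then remove_empty). It is not how Elasticsearch runs the filters. It only assumes that Python's `re` treats this particular pattern the same way as the Java regex engine Elasticsearch uses. The name `filter_tokens` is made up for the illustration.

```python
import re

# Same pattern as the "eliminate_tokens" filter (written without JSON escaping).
ELIMINATE = re.compile(r"^(?!abc)\w+$")

def filter_tokens(tokens):
    """Blank out tokens that do not start with 'abc', then drop the blanks."""
    # pattern_replace step: a matching token becomes the empty string
    replaced = [ELIMINATE.sub("", token) for token in tokens]
    # remove_empty step: discard the empty strings
    return [token for token in replaced if token != ""]

tokens = ["abcefgh", "abc123", "aabbcc", "abc", "abdef"]
print(filter_tokens(tokens))  # ['abcefgh', 'abc123', 'abc']
```

The negative lookahead `(?!abc)` makes the pattern fail at the start of any token beginning with abc, so only the other tokens are matched and blanked.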


3 Comments

Thanks for your answer. I tried this out and it works! May I ask one more question? What if I want to get words starting with "abc" OR "bca" OR "gdfh"?
You change the regex for the eliminate_tokens filter: ^(?!(abc|bca|gdfh))\\w+$
Thanks, so easy! Really helpful!
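For a longer list of prefixes, the alternation can be generated rather than typed by hand. The following Python sketch builds the same kind of pattern from a list and tests it locally. The helper names `build_eliminate_pattern` and `keep_with_prefixes` are hypothetical, and the sketch assumes Python's `re` matches this pattern the same way as Elasticsearch's Java regex engine. When pasting the resulting pattern into JSON settings, each backslash has to be doubled, as in the comment above.

```python
import re

def build_eliminate_pattern(prefixes):
    """Build a pattern matching tokens that start with none of the prefixes."""
    # re.escape guards against prefixes containing regex metacharacters
    alternation = "|".join(re.escape(prefix) for prefix in prefixes)
    return re.compile(rf"^(?!(?:{alternation}))\w+$")

def keep_with_prefixes(tokens, prefixes):
    """Keep only the tokens that survive the replace-then-drop-empty steps."""
    pattern = build_eliminate_pattern(prefixes)
    return [token for token in tokens if pattern.sub("", token) != ""]

pattern = build_eliminate_pattern(["abc", "bca", "gdfh"])
print(pattern.pattern)  # ^(?!(?:abc|bca|gdfh))\w+$
print(keep_with_prefixes(["abc1", "bcaX", "gdfh", "xyz", "ab"], ["abc", "bca", "gdfh"]))
# ['abc1', 'bcaX', 'gdfh']
```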
