How to configure a pattern_capture on a specific token type?


(Stéphanie Rochelemagne) #1

Hi,

I have set up a custom analyzer with uax_url_email as the tokenizer, and I have
added a pattern_capture filter to index the emails.
Now I would like the URLs to be searchable as well. I would like to do the
same for them, but apply the filter conditionally: either when the token
starts with http, or when the token has the specific type URL.
I don't know if this is possible; if so, please let me know how to
achieve it.

Thanks in advance!
Stephanie

    analysis: {
      filter: {
        email: {
           type: "pattern_capture",
           preserve_original: 1,
           patterns: [
              "([\\w\\-\\+\\.]+?)(?:[\\.\\-_](\\w+))?@([\\w\\-]+)\\."
           ]
        },
        url: {
           type: "pattern_capture",
           token_type: "URL",
           preserve_original: 1,
           patterns: [
              "(?:(\\w+)(?:\\.|/))"
           ]
        }
      },
      analyzer: {
        default: {
          type: 'custom',
          char_filter: [
            'html_strip'
          ],
          tokenizer: 'uax_url_email',
          filter: [
            'email',
            'url',
            'asciifolding',
            'lowercase'
          ]
        }
      }
    }
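
To make the behaviour I am after concrete, here is a minimal sketch in plain Python. It is not Elasticsearch code, only an illustration of the condition. It assumes the tokenizer tags URL tokens with the type `<URL>`; the function name `capture_url_parts` and the simplified `\w+` pattern are hypothetical and used for illustration only.

```python
import re

# Simplified, hypothetical pattern: each run of word characters in a URL
# (the pieces between ':', '/', '.' and similar separators).
URL_PART = re.compile(r"\w+")


def capture_url_parts(tokens):
    """Emit every token unchanged, and additionally emit the captured
    parts of tokens that look like URLs.

    `tokens` is a list of (text, token_type) pairs, standing in for the
    output of the tokenizer. The "<URL>" type name is an assumption.
    """
    out = []
    for text, token_type in tokens:
        # Always keep the original token (like preserve_original).
        out.append(text)
        # The condition in question: type is URL, or text starts with "http".
        if token_type == "<URL>" or text.startswith("http"):
            out.extend(URL_PART.findall(text))
    return out


if __name__ == "__main__":
    sample = [("hello", "<ALPHANUM>"), ("http://example.com/docs", "<URL>")]
    print(capture_url_parts(sample))
```

With the sample input above, only the URL token is expanded into its parts; the ordinary word token passes through untouched.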


