Filtering by URL tokens


(Danny Kidron) #1

I would like to use Elasticsearch to filter entities based on a URL field.
The input is given as tokens, such as 'http://www.cnn.com/' or 'cnn.com/politics', and I should be able to filter out entities whose URL contains these tokens.
I can assume that these tokens will always contain the domain name (though possibly without 'www'), and therefore I can use the prefix filter.
How does the prefix filter perform? Do I need to mark the field as not analyzed when indexing it?
What other options do I have?
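
To make the prefix idea concrete, here is a minimal Python sketch of what I have in mind. It assumes the URL is stored in a separate field that is indexed as a single untokenized term, and that the same normalization is applied both when indexing and when querying. The field name `url_normalized` and both helper functions are hypothetical names for illustration, not anything Elasticsearch provides. The exact wrapper around the `prefix` clause may also differ between Elasticsearch versions.

```python
import re

# Hypothetical field name: assumed to hold the normalized URL as one
# untokenized term, so that prefix matching sees the whole string.
URL_FIELD = "url_normalized"


def normalize_url_token(token):
    """Drop the scheme and a leading 'www.', and lowercase the host only.

    The path is left untouched because URL paths can be case-sensitive.
    """
    token = token.strip()
    # Remove a leading scheme such as 'http://' or 'https://'.
    token = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", token)
    # Split into host and the rest at the first '/'.
    host, sep, path = token.partition("/")
    host = host.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + sep + path


def build_prefix_query(token):
    """Build a request body that uses a prefix clause on the normalized field."""
    return {"query": {"prefix": {URL_FIELD: normalize_url_token(token)}}}


if __name__ == "__main__":
    for raw in ("http://www.cnn.com/", "cnn.com/politics"):
        print(raw, "->", build_prefix_query(raw))
```

With this normalization, both 'http://www.cnn.com/' and 'cnn.com/politics' become prefixes of a stored value such as 'cnn.com/politics/some-article', which is what makes a prefix match possible even when the token omits 'www'.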

