I have a field that stores a URL, and I want to be able to search the individual parts of the URL (delimited by /). None of the built-in analysers split the URL the way I want, so I'm thinking about using a pattern tokeniser to analyse URLs.
For example, https://www.domain.com/part1/user@site/part2/part3.jpg currently gets split into the terms:
https
www.domain.com
part1
user
site
part2
part3
I want it to get split into:
https
www.domain.com
part1
user@site
part2
part3.jpg
so I'm thinking that a pattern tokeniser splitting on / should do it.
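A minimal sketch of what such a setup might look like, with a local approximation of the split so the resulting terms can be checked. The names url_tokenizer and url_analyzer are hypothetical placeholders, and the pattern [:/]+ is an assumption: splitting on a bare / would leave the first term as "https:" with its colon, so the sketch also splits on the colon to get the terms listed above. Python's re module stands in for the Java regex engine here, which should behave the same for a pattern this simple.

```python
import json
import re

# Hypothetical analysis settings: "url_tokenizer" and "url_analyzer" are
# placeholder names. The pattern "[:/]+" is an assumption; a bare "/" would
# leave the scheme as "https:" rather than "https".
index_settings = {
    "analysis": {
        "tokenizer": {
            "url_tokenizer": {"type": "pattern", "pattern": "[:/]+"}
        },
        "analyzer": {
            "url_analyzer": {"type": "custom", "tokenizer": "url_tokenizer"}
        },
    }
}


def split_url(url, pattern="[:/]+"):
    """Approximate a split-on-pattern tokeniser locally, dropping empty pieces."""
    return [piece for piece in re.split(pattern, url) if piece]


url = "https://www.domain.com/part1/user@site/part2/part3.jpg"
tokens = split_url(url)

print(json.dumps(index_settings, indent=2))
print(tokens)
```

Calling split_url(url, "/") instead shows the difference: the first term comes back as "https:".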
However, we also let our users search the _all field (correctly or incorrectly calling it "Everything", you get the picture).
I know that the _all field has its own analyser (in our case I'm guessing it's the standard one, as we haven't defined one). So if we search the url field with a given value (eg: user@site) and the _all field with the same value, the results won't be the same, because the analysers behave differently.
Is this correct? As I understand it, the _all field is a space-delimited concatenation of all the field values, which is then analysed as a whole, rather than a list of terms generated by analysing each field with its own analyser. So querying _all with user@site will actually search for the terms user and site, whereas querying url with the same value would search for the single term user@site (if we were to employ the pattern tokeniser, as before).
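The mismatch described here can be sketched locally. This is only a model of the behaviour as I understand it, not Elasticsearch itself: url_terms stands in for the proposed pattern tokeniser, all_terms is a crude approximation of the default analyser on _all (an assumption, not the real standard tokeniser), and the title field is a hypothetical second field added so that _all has more than one value to concatenate.

```python
import re


def url_terms(text):
    # Stand-in for the proposed pattern tokeniser: split on ':' and '/' only.
    return [t for t in re.split(r"[:/]+", text) if t]


def all_terms(text):
    # Crude stand-in for the default analyser on _all (an assumption):
    # lowercase, then keep runs of letters, digits and dots.
    return re.findall(r"[a-z0-9.]+", text.lower())


# "title" is a hypothetical extra field, used only for illustration.
doc = {
    "url": "https://www.domain.com/part1/user@site/part2/part3.jpg",
    "title": "Holiday photo",
}

# _all is modelled as the field values joined by spaces, then analysed once.
all_text = " ".join(doc.values())
indexed_url = set(url_terms(doc["url"]))
indexed_all = set(all_terms(all_text))


def matches(query, analyse, indexed):
    # The query text is analysed with the same analyser as the target field,
    # and every resulting term must be present in that field's index.
    return set(analyse(query)) <= indexed


for query in ("user@site", "site"):
    print(query, "url:", url_terms(query), matches(query, url_terms, indexed_url))
    print(query, "_all:", all_terms(query), matches(query, all_terms, indexed_all))
```

In this model the query user@site finds the document through both fields, but via different terms, while the query site finds it only through _all, so the two fields can return different result sets for the same input.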
I just want to make sure that I haven't misunderstood anything and my surmise is correct.
Thanks for your help!