I have a field that stores a URL, and I want to be able to search the individual parts of the URL (delimited by /). None of the built-in analysers split the URL the way I want, so I'm thinking about using a pattern tokeniser to analyse URLs.
https://www.domain.com/part1/user@site/part2/part3.jpg gets split into terms:
I want it to get split into:
so I'm thinking that a pattern tokenizer splitting on / should do it.
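To illustrate what I have in mind, here is a rough sketch. The settings dict uses the pattern tokenizer type with / as the pattern; the names url_path_tokenizer and url_path_analyzer are made up for this example. The helper split_on_slash is only a local stand-in for the tokenizer, and it assumes empty pieces (such as the one between the two slashes after https:) are dropped.

```python
import re

# Hypothetical index settings: "url_path_tokenizer" and "url_path_analyzer"
# are illustrative names, not something from an existing mapping.
URL_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "url_path_tokenizer": {"type": "pattern", "pattern": "/"}
            },
            "analyzer": {
                "url_path_analyzer": {
                    "type": "custom",
                    "tokenizer": "url_path_tokenizer",
                }
            },
        }
    }
}


def split_on_slash(value):
    """Local stand-in for the tokenizer: split on "/" and drop empty pieces."""
    return [piece for piece in re.split("/", value) if piece]


print(split_on_slash("https://www.domain.com/part1/user@site/part2/part3.jpg"))
# ['https:', 'www.domain.com', 'part1', 'user@site', 'part2', 'part3.jpg']
```

Note that with a plain / split the scheme survives as the term https: (colon included), and no lowercasing happens unless a token filter is added.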
However, we also let our users search the _all field (correctly or incorrectly calling it "Everything" - you get the picture).
I know that the _all field has its own analyser (in our case, I'm guessing it's the standard one, as we haven't defined one). So, if we search the url field with a given value (e.g. user@site) and the _all field with the same value, the results won't be the same because the analysers behave differently.
Is this correct? As I understand it, the _all field is a space-delimited aggregation of all the field values, which is then analysed, rather than a list of terms generated by analysing each field with its own analyser. So querying _all with user@site will actually search for the terms user and site, while querying url with the same value would search for user@site (if we were to employ the pattern tokeniser, as before).
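A small sketch of the mismatch I'm describing, under loud assumptions: approx_standard_terms is a crude approximation of the standard analyser (lowercase, then split on anything that is not a word character), which is not how the real tokenizer treats every input (dots inside hostnames, for instance, are handled differently). Both function names are hypothetical; I only use them on the query string user@site.

```python
import re


def approx_standard_terms(text):
    """Crude stand-in for the standard analyser: lowercase, split on non-word chars."""
    return re.findall(r"\w+", text.lower())


def slash_terms(text):
    """Stand-in for a pattern tokenizer splitting on "/", dropping empty pieces."""
    return [piece for piece in text.split("/") if piece]


query = "user@site"

# Terms searched when the query goes to _all (approximated standard analysis)
all_query_terms = approx_standard_terms(query)

# Terms searched when the query goes to url (slash-splitting tokenizer)
url_query_terms = slash_terms(query)

print(all_query_terms)
print(url_query_terms)
```

If that approximation holds, the _all query looks for two separate terms while the url query looks for a single term containing the @, so the two searches would not necessarily return the same documents.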
I just want to make sure that I haven't misunderstood anything and my surmise is correct.
Thanks for your help!