I have a field that stores a URL. I want to be able to search the different parts of the URL (delimited by /). Currently, none of the built-in analysers split the URL up the way I want, so I'm thinking about using a pattern tokeniser to analyse URLs.
eg: https://www.domain.com/part1/user@site/part2/part3.jpg gets split into terms:
https, www.domain.com, part1, user, site, part2, part3
I want it to get split into:
https, www.domain.com, part1, user@site, part2, part3.jpg
So I'm thinking that a pattern tokenizer splitting on / should do it.
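To make the idea concrete, here is a rough sketch of what I have in mind. The names url_path_tokenizer and url_path_analyzer are just placeholders I made up, and the Python function only imitates the splitting with a plain regex split (dropping empty tokens, which I'm assuming is how the pattern tokeniser treats the empty piece between the two slashes of ://). It is not output from a real cluster.

```python
import re

# Hypothetical index settings: a custom analyser built on a pattern
# tokeniser that splits on "/". The two names are placeholders.
index_settings = {
    "analysis": {
        "tokenizer": {
            "url_path_tokenizer": {"type": "pattern", "pattern": "/"}
        },
        "analyzer": {
            "url_path_analyzer": {
                "type": "custom",
                "tokenizer": "url_path_tokenizer",
            }
        },
    }
}


def pattern_tokenize(text, pattern="/"):
    """Imitate a pattern tokeniser: split on the regex, drop empty tokens."""
    return [token for token in re.split(pattern, text) if token]


url = "https://www.domain.com/part1/user@site/part2/part3.jpg"
print(pattern_tokenize(url))
# ['https:', 'www.domain.com', 'part1', 'user@site', 'part2', 'part3.jpg']
```

One thing this shows is that a plain split on / leaves the colon attached to the scheme (https: rather than https), so the pattern might need to cover : as well.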
However, we also let our users search the _all field (correctly or incorrectly calling it "Everything" - you get the picture).
I know that the _all field has its own analyser (in our case, I'm guessing it's the standard one as we haven't defined one). So, if we search the url field with a given value (eg: user@site) and the _all field with the same value, the results won't be the same because the analysers behave differently.
Is this correct? As I understand it, the _all field is a space-delimited aggregation of all the field values which is then analysed, rather than a list of terms generated by analysing each field using its own analyser. So querying _all with user@site will actually search terms user and site, and querying url with the same would be querying user@site (if we were to employ the pattern tokeniser, as before).
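Here is a small sketch of the mismatch I'm describing, assuming my understanding is right. Both functions are stand-ins I wrote for illustration: url_field_terms imitates a split on /, and all_field_terms is only a crude approximation of how I believe the standard analyser breaks up text (lowercasing and splitting on characters such as @ and /). It is not the real analyser.

```python
import re


def url_field_terms(text):
    """Stand-in for the url field: split on "/" and drop empty tokens."""
    return [token for token in text.split("/") if token]


def all_field_terms(text):
    """Crude stand-in for the analyser on _all: lowercase, then keep runs
    of letters and digits (optionally joined by dots) as terms."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())


query = "user@site"
print(url_field_terms(query))  # ['user@site']
print(all_field_terms(query))  # ['user', 'site']
```

So the same query string would look up a single term user@site in the url field, but the two terms user and site in _all, which is why I expect the result sets to differ.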
I just want to make sure that I haven't misunderstood anything and my surmise is correct.
Thanks for your help!