Field Analyser vs _all Analyser and Query String Analyser

I have a field that stores a URL. I want to be able to search the different parts of the URL (delimited by /). Currently, none of the built-in analysers split the URL up the way I want, so I'm thinking about using a pattern tokeniser to analyse URLs.

For example, https://www.domain.com/part1/user@site/part2/part3.jpg currently gets split into these terms:

  • https
  • www.domain.com
  • part1
  • user
  • site
  • part2
  • part3

I want it to get split into:

  • https
  • www.domain.com
  • part1
  • user@site
  • part2
  • part3.jpg

So I'm thinking that a pattern tokeniser splitting on / should do it.
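A minimal sketch of that idea, written in Python as a dict of index settings plus a local simulation of the split. The names url_path_tokenizer and url_analyzer are hypothetical, and the pattern [:/]+ is an assumption: splitting on / alone would leave the first token as https: (with the colon), so : is treated as a separator too. That choice also splits host:port pairs. The url field mapping would then point at the analyser with "analyzer": "url_analyzer".

```python
import json
import re

# Assumed separator pattern: runs of ":" and "/" (not just "/"),
# so that "https://" yields "https" rather than "https:".
URL_PATTERN = r"[:/]+"

# Hypothetical index settings: a "pattern" tokenizer wrapped in a custom analyser.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "url_path_tokenizer": {"type": "pattern", "pattern": URL_PATTERN}
            },
            "analyzer": {
                "url_analyzer": {
                    "type": "custom",
                    "tokenizer": "url_path_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}


def simulate_url_analyzer(text):
    """Approximate url_analyzer locally: split on the pattern,
    drop any empty pieces, then lowercase each token."""
    return [piece.lower() for piece in re.split(URL_PATTERN, text) if piece]


if __name__ == "__main__":
    print(json.dumps(INDEX_SETTINGS, indent=2))
    print(simulate_url_analyzer(
        "https://www.domain.com/part1/user@site/part2/part3.jpg"))
```

The lowercase filter is optional; leave it out if path segments should be matched case-sensitively.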

However, we also let our users search the _all field (correctly or incorrectly calling it "Everything" - you get the picture).

I know that the _all field has its own analyser (in our case, I'm guessing it's the standard one as we haven't defined one). So, if we search the url field with a given value (eg: user@site) and the _all field with the same value, the results won't be the same because the analysers behave differently.
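One way to check this directly is the _analyze API, which can run text either through a named analyser or through the analyser configured for a specific field. The sketch below only builds the two requests; the cluster address http://localhost:9200 and the index name my_index are placeholders, and the url field is assumed to already use the custom pattern analyser.

```python
import json
import urllib.request

# Placeholder cluster address and index name; adjust to your setup.
ES_URL = "http://localhost:9200"
INDEX = "my_index"


def analyze_request(body):
    """Build (but do not send) a request to the _analyze API of INDEX."""
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Analyse with whatever analyser the url field is mapped to.
FIELD_REQUEST = analyze_request({"field": "url", "text": "user@site"})
# Analyse with the standard analyser, assumed here to be what _all uses.
STANDARD_REQUEST = analyze_request({"analyzer": "standard", "text": "user@site"})

# Against a running cluster, urllib.request.urlopen(FIELD_REQUEST) would
# return JSON whose "tokens" list shows the terms produced.
```

Comparing the "tokens" in the two responses shows whether the same query text turns into different terms for the two fields.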

Is this correct? As I understand it, the _all field is a space-delimited concatenation of all the field values, which is then analysed as a whole, rather than a list of terms generated by analysing each field with its own analyser. So querying _all with user@site will actually search for the terms user and site, whereas querying url with the same value would search for the single term user@site (if we were to use the pattern tokeniser described above).
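The difference described above can be modelled in a few lines of Python. This is a simplified simulation, not Elasticsearch itself: rough_standard_analyzer is a crude stand-in for the standard analyser (the real one follows Unicode word-break rules), the document is made up, and matches mimics a match query using its default OR operator.

```python
import re

# Made-up document with two fields.
DOC = {
    "title": "Holiday photo",
    "url": "https://www.domain.com/part1/user@site/part2/part3.jpg",
}


def url_analyzer(text):
    # Stand-in for a pattern-tokeniser analyser on the url field.
    return [t.lower() for t in re.split(r"[:/]+", text) if t]


def rough_standard_analyzer(text):
    # Crude stand-in for the standard analyser: NOT the real Unicode
    # word-break rules, just enough to show "@", "/" and ":" acting as
    # separators while letters, digits and "." stay together.
    return [t.lower() for t in re.split(r"[^A-Za-z0-9.]+", text) if t]


def index_terms(doc):
    # Per-field: the url value is analysed with its own analyser.
    url_terms = set(url_analyzer(doc["url"]))
    # _all: all values joined with spaces, then analysed once
    # with the _all analyser (assumed to be the standard one).
    all_terms = set(rough_standard_analyzer(" ".join(doc.values())))
    return url_terms, all_terms


def matches(query_text, analyzer, field_terms):
    # The query text goes through the field's analyser; with OR
    # semantics, any query term found in the field is a hit.
    return [t for t in analyzer(query_text) if t in field_terms]


url_terms, all_terms = index_terms(DOC)
print(matches("user@site", url_analyzer, url_terms))             # ['user@site']
print(matches("user@site", rough_standard_analyzer, all_terms))  # ['user', 'site']
```

Under these assumptions, the _all query would also match any document that merely contains user or site somewhere, while the url query needs the whole user@site segment.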

I just want to make sure that I haven't misunderstood anything and my surmise is correct.

Thanks for your help!
