Reading analyzed fields as not-analyzed, and excluding parts of not-analyzed fields


#1

I have a field whose mapping defaulted to analyzed that I would like to search over (for terms) as if it were not-analyzed. Is there a way to do that without re-indexing? (The problem I am encountering is discussed here: https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html )

In a separate but related problem, I have a field that contains URLs that can end with a variety of variables, some of which I may not want to factor into a result set for a unique terms aggregation. An example would be something like “user/profile?=1234” or “user/profile/1234”, where I would want to count both of these as a hit for the term “user/profile” and only as a hit for that term.

There are times when I would want "profile?=1234" or "/1234" and times when I would not want them. Is there also a way to include or exclude certain things like this from a search? And if I cannot exclude or include things like this, what might be the best solution to this issue when adding my data to Elasticsearch?


(Mark Walkom) #2

Once it's been analysed you cannot do anything other than reindex to un-analyse it. You should use a multi-field mapping to create the analysed field and then an additional un-analysed (i.e. .raw) sub-field.
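As a rough sketch of what such a multi-field mapping could look like, here are the request bodies built as plain Python dicts. The field name "url" and the aggregation name "unique_values" are made up for illustration. The syntax is the older string / not_analyzed form that this thread is about (newer Elasticsearch versions use "text" with a "keyword" sub-field instead), and the mapping only affects documents indexed after it is in place.

```python
import json


def multi_field_mapping(field="url"):
    """Build a mapping body with an analysed main field and a raw sub-field.

    "url" is a hypothetical field name. The "raw" sub-field is stored
    not_analyzed, so each value is kept as one single term.
    """
    return {
        "properties": {
            field: {
                "type": "string",  # analysed by default in this older syntax
                "fields": {
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }


def terms_aggregation(field="url", size=10):
    """Build a search body that aggregates on the not-analysed sub-field."""
    return {
        "size": 0,  # return no hits, only the aggregation buckets
        "aggs": {
            "unique_values": {
                "terms": {"field": field + ".raw", "size": size}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(multi_field_mapping(), indent=2))
    print(json.dumps(terms_aggregation(), indent=2))
```

Full-text searches would keep using "url", while aggregations that need whole values would point at "url.raw".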

Your second one also sounds like it could be solved by using multifields.


#3

Thank you. I looked up multifields and can see the application to my first issue, but I was a bit unsure of their application to my second issue (counting "user/profile?=1234" or "user/profile/1234" as a hit for just "user/profile"). Could you elaborate on the second problem for me? If I now copy all URLs to a not-analyzed "raw" field, how might I run an aggregation in which "user/profile" is a unique term, and any extraneous text following "user/profile" (like ?=1234) just counts as a hit for that term?


(Mark Walkom) #4

Yeah perhaps that isn't the best use. Are you using Logstash? Because you could split that out into its own field there.


#5

I am using Logstash. Can you elaborate a bit more on what you are proposing? Are you saying to have 3 URL fields and to have Logstash take the incoming URL data and copy it to a new field (after removing the extraneous text)? Sorry, I am not following so well.


(Mark Walkom) #6

I was thinking that you could break the user/profile?=1234 or user/profile/1234 bits out into their own field and then drop everything after the profile part.
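To make the idea concrete, here is a small Python sketch of the kind of rule that could produce that extra field. It is not Logstash configuration, only an illustration of the normalisation. The function name url_base and the rule itself (drop the query string, then drop trailing all-digit path segments) are assumptions based on the two example URLs in this thread, and real data may need a different rule.

```python
import re
from collections import Counter


def url_base(url):
    """Return the URL with its variable tail removed.

    Assumed rule: cut everything from the first "?" onward, then strip
    any trailing path segments that consist only of digits.
    """
    path = url.split("?", 1)[0]             # drop the query string
    path = re.sub(r"(/\d+)+/?$", "", path)  # drop trailing numeric segments
    return path.rstrip("/")


def count_bases(urls):
    """Count how often each normalised base occurs, like a terms aggregation."""
    return Counter(url_base(u) for u in urls)


if __name__ == "__main__":
    sample = ["user/profile?=1234", "user/profile/1234", "user/settings"]
    for base, hits in count_bases(sample).items():
        print(base, hits)
```

Indexing the original URL and the stripped version as two separate fields would keep both options open: aggregate on the stripped field when the trailing part should be ignored, and on the original when it should count.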

