Reading analyzed fields as not-analyzed, and excluding parts of not-analyzed fields

LansK · September 5, 2015, 10:18pm

I have a field whose mapping defaulted to analyzed that I would like to search over (for terms) as if it were not-analyzed. Is there a way to do that without re-indexing? (The problem I am encountering is discussed here: https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html )

In a separate but related problem, I have a field that contains URLs that can end with a variety of variables, some of which I may not want to factor into a result set for a unique terms aggregation. An example would be something like “user/profile?=1234” or “user/profile/1234”, where I would want to count both of these as a hit for the term “user/profile” and only as a hit for that term.

There are times when I would want "profile?=1234" or "/1234" and times when I would not want them. Is there also a way to include or exclude certain things like this from a search? And if I cannot exclude or include things like this, what might be the best solution to this issue when adding my data to Elasticsearch?

warkolm · September 5, 2015, 11:27pm

Once it's been analysed you cannot do anything other than reindex to un-analyse it. You should use a multi-field mapping to create the analysed and then an additional un-analysed (ie .raw) field.

Your second one also sounds like it could be solved by using multifields.
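
For illustration, a multi-field mapping of that kind might look roughly like the sketch below, written as Python dicts that would be sent as JSON request bodies. The field name `url` and the aggregation name `unique_urls` are assumptions made up for the example, and the `string` / `not_analyzed` syntax is the one used by Elasticsearch versions current at the time of this thread.

```python
import json

# Hypothetical field name "url"; adjust to the real field in your index.
# "string" with "index": "not_analyzed" is the pre-5.x mapping syntax.
url_mapping = {
    "properties": {
        "url": {
            "type": "string",
            "index": "analyzed",  # the main field stays analyzed for full-text search
            "fields": {
                # Sub-field addressed as "url.raw"; stored as one exact term
                "raw": {"type": "string", "index": "not_analyzed"}
            },
        }
    }
}

# Terms aggregation over the exact values, using the not-analyzed sub-field.
# "unique_urls" is an arbitrary name chosen for this sketch.
terms_agg = {
    "size": 0,
    "aggs": {
        "unique_urls": {"terms": {"field": "url.raw"}}
    },
}

print(json.dumps(url_mapping, indent=2))
print(json.dumps(terms_agg, indent=2))
```

Existing documents would still need to be reindexed before `url.raw` is populated for them.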

LansK · September 6, 2015, 5:56pm

Thank you. I looked up multifields and can see the application to my first issue, but I was a bit unsure of their application to my second issue (counting "user/profile?=1234" or "user/profile/1234" as a hit for just "user/profile"). Could you elaborate on the second problem for me? If I now copy all URLs to a not-analyzed "raw" field, how might I perform a search that would equate the first two examples as a hit for just "user/profile" in an aggregation where "user/profile" would be a unique term and any extraneous text following "user/profile" (like ?=1234) would just count as a hit for the unique term "user/profile"?

warkolm · September 6, 2015, 11:54pm

Yeah perhaps that isn't the best use. Are you using Logstash? Because you could split that out into its own field there.

LansK · September 7, 2015, 6:49am

I am using Logstash. Can you elaborate a bit more on what you are proposing? Are you saying to have 3 URL fields and to have Logstash take the incoming URL data and copy it to a new field (after removing the extraneous text)? Sorry, I am not following very well.

warkolm · September 7, 2015, 7:10am

I was thinking that you could break the user/profile?=1234 or user/profile/1234 bits out into their own field and then drop everything after the profile part.
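
As a rough sketch of that normalisation step, here it is in Python rather than as an actual Logstash filter. The function name `normalize_url` and the rule "drop the query string and a trailing numeric segment" are assumptions based only on the two example URLs above; real data may need different rules.

```python
import re

# Matches a final path segment made only of digits, e.g. "/1234".
_TRAILING_ID = re.compile(r"/\d+$")


def normalize_url(url: str) -> str:
    """Return the URL with its query string and trailing numeric id removed.

    Hypothetical helper for illustration; the rules are inferred from the
    examples in this thread, not from any Logstash or Elasticsearch API.
    """
    base = url.split("?", 1)[0]      # drop everything from the first "?"
    base = base.rstrip("/")          # ignore a trailing slash, if any
    return _TRAILING_ID.sub("", base)  # drop a trailing "/<digits>" segment


for example in ("user/profile?=1234", "user/profile/1234"):
    print(example, "->", normalize_url(example))
```

The normalised value would go into its own not-analyzed field at index time, so a terms aggregation on that field counts both forms under "user/profile", while the original URL stays available in the existing field for the cases where the suffix matters.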
