To understand the analysis process

terrasacer · July 30, 2015, 12:09pm

Hi all.

I'm trying to understand the analysis process, especially at query time.

For example, we have a field configured as not_analyzed, and its value is Albert Einstein.

If I search for "Albert", the match query does not return the document, but the match_phrase_prefix query does.

Why?

PatrickKik · August 3, 2015, 4:57am

What I understand from https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_match_phrase_prefix is that your query acts as a prefix.

You could test that by querying for "Einstein". Then both the match query and the match_phrase_prefix query would return nothing.
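To make that test concrete, here is a rough sketch of the request bodies, written as Python dicts so they can be printed and pasted into whatever client you use. The type name `person` and the field name `name` are made up for illustration, and the mapping uses the `string` / `not_analyzed` syntax of the Elasticsearch versions current in this thread. Nothing is sent to a cluster here.

```python
import json

# Hypothetical mapping: one unanalyzed string field called "name".
mapping = {
    "mappings": {
        "person": {
            "properties": {
                "name": {"type": "string", "index": "not_analyzed"}
            }
        }
    }
}

def build_queries(text, field="name"):
    """Return the two search bodies to compare for the same query text."""
    return {
        "match": {"query": {"match": {field: text}}},
        "match_phrase_prefix": {"query": {"match_phrase_prefix": {field: text}}},
    }

print(json.dumps(mapping))
for text in ("Albert", "Einstein"):
    for kind, body in build_queries(text).items():
        print(kind, json.dumps(body))
```

Running each body for "Albert" and then for "Einstein" against a document whose `name` is Albert Einstein should show the difference between the two query types.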

Could you post your findings please?

terrasacer · September 12, 2015, 8:21pm

Hi @PatrickKik,

First, sorry for the late reply.

I'm trying to understand this: does not_analyzed mean that the text is stored without being transformed or converted?

If so, how does the match_phrase_prefix query return results?

Is the text analyzed again at query time?

terrasacer · September 26, 2015, 9:51am

Does anyone have an idea how the match_phrase_prefix query returns results in the above example?

softwaredoug · September 26, 2015, 11:58am

According to the documentation, match_phrase_prefix does a prefix query on the last term in the query. With not_analyzed, the whole string is taken as a term. Therefore the "last term" is [Albert Einstein]. Albert is a prefix of this term, therefore it matches.

Most other search queries are exact term matches: after analysis is run, a query term needs to match a document's term exactly.
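The difference between an exact term match and a prefix on the last term can be sketched with a toy model in Python. This is only an illustration, not how Elasticsearch or Lucene is implemented: `keyword_analyze`, `simple_analyze`, `term_match` and `phrase_prefix_match` are made-up helpers, and `simple_analyze` (lowercase, split on whitespace) is a crude stand-in for a real analyzer.

```python
def keyword_analyze(text):
    # not_analyzed: the whole value becomes one term, unchanged.
    return [text]

def simple_analyze(text):
    # Crude stand-in for an analyzed field: lowercase, split on whitespace.
    return text.lower().split()

def term_match(query_terms, doc_terms):
    # Exact matching: some query term must equal a document term.
    return any(q in doc_terms for q in query_terms)

def phrase_prefix_match(query_terms, doc_terms):
    # All terms but the last must match exactly and in order;
    # the last query term only needs to be a prefix of the next document term.
    n = len(query_terms)
    if n == 0:
        return False
    for start in range(len(doc_terms) - n + 1):
        window = doc_terms[start:start + n]
        if window[:-1] == query_terms[:-1] and window[-1].startswith(query_terms[-1]):
            return True
    return False

value = "Albert Einstein"
for name, analyze in (("not_analyzed", keyword_analyze), ("analyzed", simple_analyze)):
    doc_terms = analyze(value)
    for query in ("Albert", "Einstein"):
        q_terms = analyze(query)
        print(name, repr(query),
              "match:", term_match(q_terms, doc_terms),
              "phrase_prefix:", phrase_prefix_match(q_terms, doc_terms))
```

In this model the unanalyzed field holds the single term [Albert Einstein], so the exact match on "Albert" fails while the prefix check succeeds, and "Einstein" fails both ways.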

This blog post might be a good primer for you.

terrasacer · September 26, 2015, 1:16pm

Hi @softwaredoug,

Thank you for this informative answer and blog post.

Therefore the "last term" is [Albert Einstein]. Albert is a prefix of this term, therefore it matches.

My understanding is that the string [Albert Einstein] is split into two substrings, ["Albert", "Einstein"], by the match_phrase_prefix query at search time.

Is that correct?

softwaredoug · September 26, 2015, 2:37pm

The search string is not_analyzed as well. The search engine takes the whole token [Albert Einstein] and treats it as a word. There's no extra "substring" involved. [Albert] is a prefix of the larger term [Albert Einstein].
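As a rough illustration of the query side, here is a small Python sketch that keeps the query text as one token and checks it as a prefix of the stored term. The helper names and the sample queries are invented for this example, and it only models the behaviour described in this post.

```python
STORED_TERM = "Albert Einstein"  # the one term indexed for the not_analyzed field

def query_tokens_not_analyzed(query_text):
    # No tokenizing and no lowercasing: the query text stays a single token.
    return [query_text]

def prefix_hits(query_text, stored_term=STORED_TERM):
    # Prefix check of the last (here: only) query token against the stored term.
    last_token = query_tokens_not_analyzed(query_text)[-1]
    return stored_term.startswith(last_token)

results = {q: prefix_hits(q) for q in ["Albert", "Albert Ein", "Einstein", "albert"]}
for query_text, hit in results.items():
    print(repr(query_text), "->", hit)
```

"Albert" and "Albert Ein" are prefixes of the whole term, so they hit. "Einstein" is not a prefix of [Albert Einstein], and lowercase "albert" misses because nothing lowercases either side.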