Exact match in a not_analyzed field

Imran_Azad · December 14, 2015, 12:23pm

How would I do an exact match on a not_analyzed field that contains a large amount of text? For example take the following paragraph:

Kefir grains are a combination of lactic acid bacteria and yeasts in a matrix of proteins, lipids, and sugars, and this symbiotic matrix, (or SCOBY) forms "grains" that resemble cauliflower. For this reason, a complex and highly variable community of lactic acid bacteria and yeasts can be found in these grains although some predominate; Lactobacillus species are always present.[3] Even successive batches of kefir may differ due to factors such as the kefir grains rising out of the milk while fermenting, or curds forming around the grains, as well as room temperature.[8]

How can I get a match on an exact phrase search for "lactic acid bacteria and yeasts can be found in these grains" using only query_string?

polyfractal · December 14, 2015, 5:41pm

So, it is technically possible...but I want to discourage you from doing this. It will lead to poor performance in the long run. It's better to restructure your data now and leverage properly tokenized fields.

To make your query work, you need to query for the exact phrase (wrapped in quotes) with wildcards on either side:

GET /test/_search
{
    "query": {
        "query_string": {
           "default_field": "foo",
           "query": "*\"lactic acid bacteria and yeasts can be found in these grains\"*"
        }
    }
}

This is terrible for performance because a not_analyzed string is stored as a single token inside the index. To find the phrase, Lucene needs to look through every value of the field, then do a linear scan across the characters in that value to see if there is a match. This is very slow because the index cannot be used to narrow down the candidate documents first.

In contrast, if this field were an analyzed field, it would be tokenized, and the individual tokens would be stored in the index. A phrase search can then find all documents with the required tokens via the index, then execute a second phase to see if those documents have the terms in the correct ordering. This is much faster.
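To make the difference concrete, here is a toy sketch in Python. It is not how Lucene is implemented; it only mimics the two access patterns described above. The function names, the regex "analyzer", and the sample documents are all made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a rough stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def scan_whole_values(docs, phrase):
    """not_analyzed style: each value is one opaque term, so every value is scanned."""
    return [doc_id for doc_id, text in docs.items() if phrase in text]


def build_positional_index(docs):
    """analyzed style: map token -> {doc_id: [positions of that token]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def candidate_docs(index, tokens):
    """Phase 1: documents that contain every token, found via the index alone."""
    postings = [index.get(t, {}) for t in tokens]
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)
    return candidates


def phrase_search(index, phrase):
    """Phase 2: keep only candidates where the tokens appear in consecutive positions."""
    tokens = tokenize(phrase)
    if not tokens:
        return []
    hits = []
    for doc_id in sorted(candidate_docs(index, tokens)):
        starts = index[tokens[0]][doc_id]
        if any(
            all(start + i in index[tokens[i]][doc_id] for i in range(1, len(tokens)))
            for start in starts
        ):
            hits.append(doc_id)
    return hits


# Hypothetical sample data: doc 2 has all the words, but in a different order.
docs = {
    1: "A complex community of lactic acid bacteria and yeasts can be found in these grains.",
    2: "Yeasts and bacteria: these grains can be found in lactic acid.",
    3: "Kefir is a fermented milk drink.",
}
phrase = "lactic acid bacteria and yeasts can be found in these grains"
index = build_positional_index(docs)

print(scan_whole_values(docs, phrase))  # touches every stored value
print(phrase_search(index, phrase))     # touches only the postings for the phrase tokens
```

In this sketch doc 2 survives the first phase because it contains every token, and is only rejected by the position check. Doc 3 is never looked at by the indexed search, whereas the scan has to read it in full.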

Sooo...I'd ask why this field is a not_analyzed field, and if you have the ability to analyze it instead? It would be much better to do this operation with a match_phrase for example.
