Avoiding analysis of query strings

magnusbaeck · May 25, 2015, 1:02pm

I’m using a path hierarchy analyzer to analyze fields containing Java class names:

"analysis": {
  "analyzer": {
    "java_classname_analyzer": {
      "tokenizer": "java_classname_tokenizer",
      "type": "custom"
    }
  },
  "tokenizer": {
    "java_classname_tokenizer": {
      "type": "PathHierarchy",
      "delimiter": ".",
      "reverse": false
    }
  }
}

This tokenizes input strings correctly, i.e. java.io.File becomes (java, java.io, java.io.File), so I expected to be able to search for java.io and get back e.g. java.io.File, java.io.Reader, and java.io.Writer. However, I’ve realized that when a java_classname_analyzer field is used in a query string, e.g. with the query

class:java.io

in Kibana, I get many more hits than I asked for. The search term itself is tokenized into (java, java.io), so I’m actually getting hits for everything matching java.*.
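To make the over-matching concrete, here is a minimal Python sketch of the behaviour described above. It is a simplified model, not Elasticsearch code: `path_hierarchy` mimics the tokenizer with `reverse: false`, and `matches` assumes the query text goes through the same analyzer and that any shared token is enough for a hit, which is what the java.* results suggest. The document list is made up for illustration.

```python
def path_hierarchy(text, delimiter="."):
    """Mimic a path hierarchy tokenizer with reverse=false:
    every prefix that ends at a delimiter, plus the full string."""
    parts = text.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


def matches(indexed_value, query_text):
    """Simplified model: the query text is analyzed like the field, and any
    shared token is enough for a hit (OR semantics)."""
    index_tokens = set(path_hierarchy(indexed_value))
    return any(tok in index_tokens for tok in path_hierarchy(query_text))


# Hypothetical documents, for illustration only.
docs = ["java.io.File", "java.io.Reader", "java.util.List", "org.example.Foo"]

print(path_hierarchy("java.io.File"))  # ['java', 'java.io', 'java.io.File']
print([d for d in docs if matches(d, "java.io")])
# ['java.io.File', 'java.io.Reader', 'java.util.List']
```

java.util.List is returned only because both it and the query produce the token "java".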

Is there a way to avoid this? With the query DSL I guess I could use a term query rather than e.g. a match query, but since the use case is Kibana it needs to be a query string.

jpountz · May 25, 2015, 2:10pm

You could configure your string field to use the keyword analyzer as its search_analyzer.
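A minimal Python sketch of why this suggestion helps, under the same simplified matching model as before (any shared token is a hit; not actual Elasticsearch code). Documents are still indexed with the path hierarchy tokens; only the search-time analyzer changes. The keyword analyzer emits its whole input as one token, so the query java.io can only match documents that produced the token java.io at index time.

```python
def path_hierarchy(text, delimiter="."):
    # Index-time analyzer: all dot-delimited prefixes (reverse=false).
    parts = text.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


def keyword(text):
    # The keyword analyzer emits the entire input as a single token.
    return [text]


def search(docs, query_text, search_analyzer):
    """Return docs sharing at least one token with the analyzed query.
    Documents are always indexed with path_hierarchy; only the
    search-time analyzer varies."""
    hits = []
    for doc in docs:
        index_tokens = set(path_hierarchy(doc))
        if any(tok in index_tokens for tok in search_analyzer(query_text)):
            hits.append(doc)
    return hits


# Hypothetical documents, for illustration only.
docs = ["java.io.File", "java.io.Reader", "java.util.List", "org.example.Foo"]
print(search(docs, "java.io", path_hierarchy))  # ['java.io.File', 'java.io.Reader', 'java.util.List']
print(search(docs, "java.io", keyword))         # ['java.io.File', 'java.io.Reader']
```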

magnusbaeck · May 27, 2015, 8:56am

Thanks @jpountz, that looks exactly like what I'm looking for (could've sworn I looked at that documentation page the other day). However, it doesn't give me the expected result. I updated my index template so that the logstash-2015.05.27 index uses the keyword analyzer for a number of fields and verified that the actual mapping of the index looks okay:

$ curl --silent hostname:9200/logstash-2015.05.27/_mapping/dotnet | \
    jq '."logstash-2015.05.27".mappings.dotnet.properties.class'
{
  "type": "string",
  "analyzer": "java_classname_analyzer",
  "fields": {
    "raw": {
      "type": "string",
      "index": "not_analyzed",
      "ignore_above": 256
    }
  },
  "search_quote_analyzer": "keyword"
}

Well, I expected "search_analyzer" rather than "search_quote_analyzer" but I suppose that's okay.
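If you need to check many indices or fields, the jq check can be scripted. Below is a minimal Python sketch. The response layout (index, then mappings, then type, then properties, then field) is assumed from the jq path above, and the field values are copied from the output in this post. It flags that the stored mapping has no search_analyzer key at all.

```python
import json

# Mapping response shaped after the jq path used above; the "class" values
# are copied from the output in this post.
response = json.loads("""
{
  "logstash-2015.05.27": {
    "mappings": {
      "dotnet": {
        "properties": {
          "class": {
            "type": "string",
            "analyzer": "java_classname_analyzer",
            "fields": {
              "raw": {"type": "string", "index": "not_analyzed", "ignore_above": 256}
            },
            "search_quote_analyzer": "keyword"
          }
        }
      }
    }
  }
}
""")


def field_mapping(resp, index, doc_type, field):
    # Python equivalent of the jq filter ."<index>".mappings.<type>.properties.<field>
    return resp[index]["mappings"][doc_type]["properties"][field]


cls = field_mapping(response, "logstash-2015.05.27", "dotnet", "class")
print(cls.get("search_analyzer"))        # None: the key is missing
print(cls.get("search_quote_analyzer"))  # keyword
```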

However, a Kibana query for type:dotnet AND class:TestApp.MainForm.Foo still returns the following:

{
  "_index": "logstash-2015.05.27",
  "_type": "dotnet",
  "_id": "AU2Uf1duWK9xLBfC4G3m",
  "_score": null,
  "_source": {
    "@timestamp": "2015-05-27T10:31:22.126+02:00",
    "message": "another test message",
    "type": "dotnet",
    "class": "TestApp.MainForm",
    "@version": "1"
  },
  "sort": [
    1432715482126,
    1432715482126
  ]
}

Is this by any chance because ES doesn't analyze the query for each index being searched but in this case uses the analyzer specified in the mappings of the dozens of other indexes that don't use the keyword analyzer for that field?

jpountz · May 27, 2015, 11:41am

Hmm, this looks like a bug! What does your mapping template look like? Did you actually set the search_analyzer and not the search_quote_analyzer?

Regarding your other question, Elasticsearch analyzes the query string per shard, using each index's own mapping, so a change applied to new indices should take effect on those new indices.
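A minimal Python sketch of the per-index behaviour described here, using the same simplified matching model as earlier (any shared token is a hit). The index name logstash-2015.05.26 and its setup are hypothetical: it stands in for the older indices that still analyze queries with the path hierarchy analyzer, while logstash-2015.05.27 uses keyword at search time. Both hold the same document.

```python
def path_hierarchy(text, delimiter="."):
    parts = text.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


def keyword(text):
    return [text]


# Hypothetical setup: an old index still using the path hierarchy analyzer at
# search time, and a new one whose mapping uses keyword at search time.
indices = {
    "logstash-2015.05.26": {"search_analyzer": path_hierarchy, "docs": ["TestApp.MainForm"]},
    "logstash-2015.05.27": {"search_analyzer": keyword, "docs": ["TestApp.MainForm"]},
}


def search_all(indices, query_text):
    """Analyze the query separately for each index, using that index's own
    search analyzer, and collect (index, doc) hits."""
    hits = []
    for name, idx in indices.items():
        query_tokens = set(idx["search_analyzer"](query_text))
        for doc in idx["docs"]:
            if query_tokens & set(path_hierarchy(doc)):
                hits.append((name, doc))
    return hits


print(search_all(indices, "TestApp.MainForm.Foo"))
# [('logstash-2015.05.26', 'TestApp.MainForm')]
```

Under this model, the TestApp.MainForm hit from logstash-2015.05.27 shown above should not occur, which is consistent with suspecting a bug rather than cross-index analysis.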

magnusbaeck · May 27, 2015, 2:53pm

> Hmm, this looks like a bug! How does your mapping template look like, did you actually modify the search_analyzer and not the search_quote_analyzer?

Yes, this is what I ended up with:

    "class": {
      "type": "string",
      "analyzer": "java_classname_analyzer",
      "search_analyzer": "keyword",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed",
          "ignore_above": 256
        }
      }
    },

jpountz · May 27, 2015, 3:09pm

What version of Elasticsearch are you running?

magnusbaeck · May 27, 2015, 7:58pm

We're running ES 1.5.2.

plebedev · November 30, 2015, 10:38pm

We have the same problem with ES 1.7.0. We specify search_analyzer in the mapping template, but the resulting mapping contains search_quote_analyzer instead, and searches do not behave as expected.

adrianocrestani · December 16, 2015, 4:46pm

I just saw the same problem in 2.1. In the process of migrating from 1.5.2 to 2.1, I used the same mapping, and in 2.1 my search_analyzer gets applied as search_quote_analyzer.

adrianocrestani · December 16, 2015, 8:14pm

Actually, I think it is a 1.5.2 bug, probably fixed in a later version.

As part of the migration, I changed all my mappings to stop using "index_analyzer" and use only "analyzer", so all my fields would have "search_analyzer" and "analyzer". However, when I do that on 1.5.2, my index mapping ends up like this:

"search_quote_analyzer": "ngram_search_analyzer",
"analyzer": "ngram_index_analyzer",

As you can see, there is no search_analyzer, and when you run a search against that field, the search analyzer seems to fall back to "analyzer".
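A minimal Python sketch of why a missing search_analyzer produces this symptom. It models analyzer defaulting as described in the Elasticsearch mapping documentation for later versions: search_analyzer defaults to analyzer, and search_quote_analyzer (used for phrases) defaults to search_analyzer. That 1.5.2 follows the same rules is an assumption, as is modelling index_analyzer as a 1.x override and "standard" as the fallback. The stored dict is copied from the mapping above.

```python
def resolve_analyzers(field):
    """Simplified model of analyzer defaulting for a string field:
    - "analyzer" sets the default for both index and search time,
    - "index_analyzer" / "search_analyzer" override one side,
    - "search_quote_analyzer" (phrases) falls back to the search analyzer."""
    default = field.get("analyzer", "standard")
    search = field.get("search_analyzer", default)
    return {
        "index": field.get("index_analyzer", default),
        "search": search,
        "search_quote": field.get("search_quote_analyzer", search),
    }


# What the template asked for vs. what 1.5.2 stored (copied from above).
intended = {"analyzer": "ngram_index_analyzer", "search_analyzer": "ngram_search_analyzer"}
stored = {"analyzer": "ngram_index_analyzer", "search_quote_analyzer": "ngram_search_analyzer"}

print(resolve_analyzers(intended)["search"])  # ngram_search_analyzer
print(resolve_analyzers(stored)["search"])    # ngram_index_analyzer
```

With the stored mapping, ordinary (non-phrase) queries resolve to the index-time analyzer, matching the fallback observed here.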
