Avoiding analysis of query strings


(Magnus Bäck) #1

I’m using a path hierarchy analyzer to analyze fields containing Java class names:

"analysis": {
  "analyzer": {
    "java_classname_analyzer": {
      "tokenizer": "java_classname_tokenizer",
      "type": "custom"
    }
  },
  "tokenizer": {
    "java_classname_tokenizer": {
      "type": "PathHierarchy",
      "delimiter": ".",
      "reverse": false
    }
  }
}

This results in a correct tokenization of input strings, i.e. java.io.File gets tokenized into (java, java.io, java.io.File) and I expected to be able to search for java.io and get back e.g. java.io.File, java.io.Reader, and java.io.Writer. However, I’ve realized that when including a java_classname_analyzer field in a query string, e.g. using the query

class:java.io

in Kibana I get many more hits than I asked for since the search term itself is tokenized into (java, java.io) and I’m actually getting hits for everything matching java.*.

Is there a way to avoid this? With the query DSL I guess I can use a term query rather than e.g. a match query, but since the use case is Kibana it needs to be a query string.
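To make the over-matching concrete, here is a minimal Python sketch that emulates the tokenization described above. It does not call Elasticsearch; `path_hierarchy_tokens` and `matches` are hypothetical helpers, and the sketch assumes the query string's tokens are combined with OR semantics.

```python
def path_hierarchy_tokens(text, delimiter="."):
    """Emulate a forward path hierarchy tokenizer: emit every prefix
    that ends at a delimiter boundary, plus the full string."""
    parts = text.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


def matches(query, indexed_value):
    """Return True if any query token equals any indexed token.

    Assumption: the same analyzer is applied at index time and at search
    time, and the resulting query tokens are ORed together.
    """
    indexed = set(path_hierarchy_tokens(indexed_value))
    return any(token in indexed for token in path_hierarchy_tokens(query))


print(path_hierarchy_tokens("java.io.File"))
# The query "java.io" also yields the token "java", so an unrelated
# class under java.* matches as well.
print(matches("java.io", "java.util.List"))
```

Because the query is itself split into `java` and `java.io`, the `java` token alone is enough to match any class whose name starts with `java.`.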


(Adrien Grand) #2

You could configure your string field to have a keyword analyzer as a search_analyzer.
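As a rough illustration of why this helps, the following Python sketch emulates the two analyzers rather than calling Elasticsearch. The helper names (`index_tokens`, `keyword_tokens`, `search`) are hypothetical; the only assumption is that a keyword analyzer emits the whole input as a single token.

```python
def index_tokens(value, delimiter="."):
    """Index-time emulation of the path hierarchy tokenizer:
    every prefix ending at a delimiter boundary."""
    parts = value.split(delimiter)
    return {delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)}


def keyword_tokens(query):
    """Search-time emulation of a keyword analyzer:
    the whole query string becomes one token."""
    return {query}


def search(query, documents):
    """Return the documents whose index-time tokens contain the query token."""
    query_tokens = keyword_tokens(query)
    return [doc for doc in documents if query_tokens & index_tokens(doc)]


docs = ["java.io.File", "java.io.Reader", "java.util.List"]
print(search("java.io", docs))
```

With the query left as the single token `java.io`, only documents indexed with that exact prefix token match, so `java.util.List` is no longer returned.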


(Magnus Bäck) #3

Thanks @jpountz, that looks exactly like what I'm looking for (could've sworn I looked at that documentation page the other day). However, I don't get the result I'm looking for. I updated my index template so that the logstash-2015.05.27 index uses the keyword analyzer for a number of fields and verified that the actual mapping of the index looks okay:

$ curl --silent hostname:9200/logstash-2015.05.27/_mapping/dotnet | \
    jq '."logstash-2015.05.27".mappings.dotnet.properties.class'
{
  "type": "string",
  "analyzer": "java_classname_analyzer",
  "fields": {
    "raw": {
      "type": "string",
      "index": "not_analyzed",
      "ignore_above": 256
    }
  },
  "search_quote_analyzer": "keyword"
}

Well, I expected "search_analyzer" rather than "search_quote_analyzer" but I suppose that's okay.

However, a Kibana query for type:dotnet AND class:TestApp.MainForm.Foo still returns the following:

{
  "_index": "logstash-2015.05.27",
  "_type": "dotnet",
  "_id": "AU2Uf1duWK9xLBfC4G3m",
  "_score": null,
  "_source": {
    "@timestamp": "2015-05-27T10:31:22.126+02:00",
    "message": "another test message",
    "type": "dotnet",
    "class": "TestApp.MainForm",
    "@version": "1"
  },
  "sort": [
    1432715482126,
    1432715482126
  ]
}

Is this by any chance because ES doesn't analyze the query for each index being searched but in this case uses the analyzer specified in the mappings of the dozens of other indexes that don't use the keyword analyzer for that field?


(Adrien Grand) #4

Hmm, this looks like a bug! What does your mapping template look like? Did you actually modify the search_analyzer and not the search_quote_analyzer?

Regarding your other question, Elasticsearch actually analyzes the query string per shard, so your change should take effect on the new indices.


(Magnus Bäck) #5

Adrien Grand wrote: "Hmm, this looks like a bug! How does your mapping template look like, did you actually modify the search_analyzer and not the search_quote_analyzer?"

Yes, this is what I ended up with:

    "class": {
      "type": "string",
      "analyzer": "java_classname_analyzer",
      "search_analyzer": "keyword",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed",
          "ignore_above": 256
        }
      }
    },

(Adrien Grand) #6

What version of Elasticsearch are you running?


(Magnus Bäck) #7

We're running ES 1.5.2.


(Peter L) #8

We have the same problem with ES 1.7.0. We specify search_analyzer in the mapping template but the actual mapping has search_quote_analyzer and it does not work as expected.


(Adrianocrestani) #9

I just saw the same problem in 2.1. In the process of migrating from 1.5.2 to 2.1, I used the same mapping, and in 2.1 my search_analyzer gets applied as search_quote_analyzer.


(Adrianocrestani) #10

Actually, I think it is a 1.5.2 bug, probably fixed in a later version.

So, in the process of migrating, I changed all my mappings to no longer use "index_analyzer" and just use "analyzer". So, all my fields would have "search_analyzer" and "analyzer". However, when I do that on 1.5.2, my index mapping ends up like this:

"search_quote_analyzer": "ngram_search_analyzer",
"analyzer": "ngram_index_analyzer",

As you can see, there is no search_analyzer, and when you run a search against that field, it seems to default the "search_analyzer" to "analyzer".
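One way to spot affected fields is to scan the mapping that the index actually reports. The Python sketch below works on a plain dictionary shaped like the `properties` section shown earlier in this thread; `fields_missing_search_analyzer` is a hypothetical helper and the sample data is only illustrative, so fetching the real mapping from the cluster is left out.

```python
def fields_missing_search_analyzer(properties):
    """Return the names of fields that carry a search_quote_analyzer
    but no search_analyzer, which is the symptom described in this thread."""
    return sorted(
        name
        for name, spec in properties.items()
        if "search_quote_analyzer" in spec and "search_analyzer" not in spec
    )


# Illustrative mapping fragment; the field settings mirror the ones posted above.
properties = {
    "class": {
        "type": "string",
        "analyzer": "java_classname_analyzer",
        "search_quote_analyzer": "keyword",
    },
    "message": {"type": "string"},
}

print(fields_missing_search_analyzer(properties))
```

Any field listed by this check would, per the behavior reported here, be searched with its index-time analyzer instead of the intended search analyzer.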

