Query against _all matches for a field not contained in _all


(Hrachya Yeghishyan) #1

I have used the following index template for my indices:

{
   "template": "crashlytics*",
   "settings": {
    "number_of_shards": "5",
    "number_of_replicas": "1",
    "analysis": {
      "filter": {
            "nGram_filter": {
               "type": "nGram",
               "min_gram": 2,
               "max_gram": 20,
               "token_chars": [
                  "letter",
                  "digit",
                  "punctuation",
                  "symbol"
               ]
            }
         },
      "analyzer": {
        "nGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "nGram_filter"
               ]
            },
            "whitespace_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            }
        }
    }
  },
    "mappings" : {
      "crashlytics" : {
        "_all" : {
          "analyzer": "nGram_analyzer",
          "search_analyzer": "whitespace_analyzer"
        },
        "properties" : {
          "@timestamp" : {
            "type" : "date",
            "include_in_all": false
          },
          "@version" : {
            "type" : "keyword",
            "include_in_all": false
          },
          "exception": {
            "type": "keyword",
            "include_in_all": false
          },
          "is_handled": {
            "type": "boolean",
            "include_in_all": false
          },
          "jailbroken": {
            "type": "boolean",
            "include_in_all": false
          },
          "language_code": {
            "type": "keyword",
            "include_in_all": false
          },
          "message": {
            "type": "keyword",
            "include_in_all": false
          },
          "phone_manufacturer": {
            "type": "keyword",
            "include_in_all": false
          },
          "phone_model": {
            "type": "keyword",
            "include_in_all": false
          },
          "platform": {
            "type": "keyword",
            "include_in_all": false
          },
          "stacktrace": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "include_in_all": false
              }
            }
          },
          "crash_case": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "include_in_all": false
              }
            }
          },
          "total_disk_space": {
            "type": "long",
            "include_in_all": false
          },
          "total_memory": {
            "type": "long",
            "include_in_all": false
          },
          "user_id": {
            "type": "keyword",
            "include_in_all": false
          },
          "uuid": {
            "type": "keyword",
            "include_in_all": false
          },
          "last_events": {
            "properties": {
              "event_type": {"type": "text"},
              "timestamp": {"type": "long"}
            },
            "include_in_all": false
          },
          "screenshot": {
            "type": "keyword",
            "include_in_all": false
          },
          "since_startup": {
            "type": "long",
            "include_in_all": false
          }
        }
      }
    }
}

Please note that none of the fields is included in _all except "stacktrace" and "crash_case" (in both cases only the "text" variant, not the "keyword" sub-field). The mapping shown here is not complete, but none of the omitted fields is included in "_all" either.

I am running the following query against my index:

{
  "query" : {
    "bool" : {
      "must" : [
        {
          "range" : {
            "@timestamp" : {
              "from" : 1516702800000,
              "to" : 1516704000000,
              "include_lower" : true,
              "include_upper" : true,
              "format" : "epoch_millis",
              "boost" : 1.0
            }
          }
        }
      ],
      "filter" : [
        {
          "match" : {
            "_all" : {
              "query" : "shop mi",
              "operator" : "and",
              "boost" : 1.0
            }
          }
        }
      ],
      "boost" : 1.0
    }
  }
}

This query should retrieve only those documents that contain both "shop" and "mi" in the "stacktrace" or "crash_case" field.

However, one of the matched documents is the following:

{         
          "message" : "An error occurred while executing doInBackground()",
          "@version" : "1",
          "@timestamp" : "2018-01-23T10:27:17.479Z",
          "type" : "crashlytics",
          "exception" : "java.lang.RuntimeException",
          "phone_manufacturer" : "Xiaomi",
          "phone_model" : "Mi A1",
          "platform" : "android",
          "proc_info" : 8,
          "recommended_mgpx" : 5,
          "sd_card_available" : true,
          "stacktrace" : "java.lang.RuntimeException: An error occurred while executing doInBackground()\n\tat Caused by: java.lang.IllegalArgumentException: cannot parse package /storage/emulated/0/.downloads/.shop/package_splash_of_color\n\tat \n\t... 3 more\n",
          "crash_case" : "at com.studio.util.ae.a(ProGuard:43)"
        }

Here the field "stacktrace" contains "shop" (as a directory name) in its value, but neither "stacktrace" nor "crash_case" contains "mi". The field "phone_manufacturer" (whose value is "Xiaomi") does contain "mi", but it is not included in "_all", so this document should not have matched the query above.
What might be the reason for this result? Is there a problem with the template?
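
To make the expectation concrete, here is a rough Python model of the two analyzers from the template. It is only an approximation for reasoning about the terms, not Elasticsearch's actual analysis chain, and the helper names (fold, index_terms, matches_all_terms) are made up for this sketch. It assumes _all holds only the field values passed in.

```python
import unicodedata

MIN_GRAM, MAX_GRAM = 2, 20  # values taken from nGram_filter in the template


def fold(token):
    """Lowercase and strip diacritics: a rough stand-in for lowercase + asciifolding."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_terms(text):
    """Approximate whitespace_analyzer: split on whitespace, then fold each token."""
    return [fold(tok) for tok in text.split()]


def index_terms(text):
    """Approximate nGram_analyzer: all 2..20 character substrings of each token."""
    grams = set()
    for tok in search_terms(text):
        for size in range(MIN_GRAM, MAX_GRAM + 1):
            for start in range(len(tok) - size + 1):
                grams.add(tok[start:start + size])
    return grams


def matches_all_terms(query, field_values):
    """Emulate a match query with operator "and" over the combined field values."""
    indexed = set()
    for value in field_values:
        indexed |= index_terms(value)
    return all(term in indexed for term in search_terms(query))


stacktrace = (
    "java.lang.RuntimeException: An error occurred while executing doInBackground()\n"
    "\tat Caused by: java.lang.IllegalArgumentException: cannot parse package "
    "/storage/emulated/0/.downloads/.shop/package_splash_of_color\n\tat \n\t... 3 more\n"
)
crash_case = "at com.studio.util.ae.a(ProGuard:43)"

# Only the two fields that are supposed to be in _all.
print(matches_all_terms("shop mi", [stacktrace, crash_case]))
# Same fields plus phone_manufacturer, as if it had been indexed into _all too.
print(matches_all_terms("shop mi", [stacktrace, crash_case, "Xiaomi"]))
```

Under this model the first call prints False and the second prints True, which is why the hit looks as if more fields than "stacktrace" and "crash_case" were being searched.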


(Alan Woodward) #2

Hi @Hrachya_Yeghishyan

I can't see anything obviously wrong with the mappings. Does adding { "explain" : true } to the query show anything useful?
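
As a sketch of what that request body could look like, here is the query from the first post with the flag added, built as a Python dict and printed as JSON. Only the top-level "explain" key is new; the rest is copied from the question (without the default boosts).

```python
import json

search_body = {
    "explain": True,  # ask for an _explanation section on every hit
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "@timestamp": {
                            "from": 1516702800000,
                            "to": 1516704000000,
                            "include_lower": True,
                            "include_upper": True,
                            "format": "epoch_millis",
                        }
                    }
                }
            ],
            "filter": [
                {"match": {"_all": {"query": "shop mi", "operator": "and"}}}
            ],
        }
    },
}

print(json.dumps(search_body, indent=2))
```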

One way to avoid having to exclude most of the fields would be to add an opt-in copy_to field for stacktrace and crash_case and to disable _all (_all is going away in 6.x anyway).
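
A minimal sketch of such a mapping, again as a Python dict printed as JSON. The target field name "search_text" is hypothetical, the analyzers are the ones defined in the template above, and only a few of the fields are shown.

```python
import json

# "search_text" is a made-up name for the opt-in catch-all field.
text_with_copy = {
    "type": "text",
    "copy_to": "search_text",
    "fields": {"keyword": {"type": "keyword"}},
}

mappings = {
    "crashlytics": {
        "_all": {"enabled": False},
        "properties": {
            "search_text": {
                "type": "text",
                "analyzer": "nGram_analyzer",
                "search_analyzer": "whitespace_analyzer",
            },
            "stacktrace": dict(text_with_copy),
            "crash_case": dict(text_with_copy),
            # Fields without copy_to never reach search_text.
            "phone_manufacturer": {"type": "keyword"},
        },
    }
}

print(json.dumps({"mappings": mappings}, indent=2))
```

With a mapping like this, the match clause would target "search_text" instead of "_all", and the per-field "include_in_all": false settings would no longer be needed.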


(Hrachya Yeghishyan) #3

Hi @AlanWoodward
Thanks for the note about the opt-in copy_to field.
I am currently using ES 5.x. I have looked thoroughly through the mappings and the indices, and it turns out that the _all field was disabled in the latest indices created after this template was applied. When I applied the template, I did not specify the following in the mapping:

  "_all" : {
       "enabled" : "true"
     }

Instead, I only specified the index and search analyzers, and the "enabled" option was, most probably, set to "false" by default.
I have now changed the template and set the "enabled" option to "true". I guess this will work for newly created indices.
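
For reference, the _all section with the flag spelled out might look like the sketch below, written as a Python dict and printed as JSON. It assumes the analyzer names from the template in the first post; whether leaving out "enabled" really defaults to false depends on the Elasticsearch version and is not verified here.

```python
import json

# _all block with "enabled" set explicitly instead of relying on a default.
all_field = {
    "enabled": True,
    "analyzer": "nGram_analyzer",
    "search_analyzer": "whitespace_analyzer",
}

print(json.dumps({"_all": all_field}, indent=2))
```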

