[RESOLVED] Basic search to return all _types is returning incomplete data?


(Chris Neal) #1

Hello all,

I've just noticed that ES search queries are not returning complete data anymore. It could be related to my upgrade from 1.5.2 to 1.6.0, but maybe not. I'm not sure what has changed that would cause this.

I have an index with 3 _types that I validate by querying the mapping API:

{
  "myindex-20150706" : {
    "mappings" : {
      "hotel_avail_dwarf4bookingcomplugin" : { mapping snip.... },
      "http_access_dwarf4bookingcomplugin" : { mapping snip.... },
      "perf_dwarf4bookingcomplugin" : { mapping snip..... }
    }
  }
}

When I send this basic query to ES:

curl -XGET 'http://myhost:9200/myindex/_search?pretty' -d '{
  "facets": {
    "terms": {
      "terms": {
        "field": "_type",
        "size": 100,
        "order": "count",
        "exclude": []
      },
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "bool": {
                  "should": [
                    {
                      "query_string": {
                        "query": "*"
                      }
                    }
                  ]
                }
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "@timestamp": {
                          "from": 1436192919014,
                          "to": 1436196519014
                        }
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}'

I only get one type returned:

{
  "took" : 5379,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "hits" : {
    "total" : 361249580,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "facets" : {
    "terms" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 29362,
      "other" : 0,
      "terms" : [ {
        "term" : "perf_dwarf4bookingcomplugin",
        "count" : 29362
      } ]
    }
  }
}

When I query the index status API, it tells me that I have 369183728 documents in the index, so I know there is more data than what is coming back from my query, presumably in the _types that are not being returned.

curl -XGET 'http://myhost:9200/myindex-20150706/_status?pretty' | less
{
  "_shards" : {
    "total" : 12,
    "successful" : 12,
    "failed" : 0
  },
  "indices" : {
    "myindex-20150706" : {
      "index" : {
        "primary_size_in_bytes" : 51574050317,
        "size_in_bytes" : 103209014450
      },
      "translog" : {
        "operations" : 3068301
      },
      "docs" : {
        "num_docs" : 369183728,
        "max_doc" : 369183728,
        "deleted_docs" : 0
      }, (snip....)

This is very strange, and I'm wondering if it might be a bug? Or maybe something funky in the mapping? I honestly don't think any mapping has changed though, and this used to work as expected.

Is there a reason this should not work?
Thanks again for the time.
Chris


(Chris Neal) #2

Well, I've also confirmed that this is happening in my Development cluster as well. I'm considering rolling back to 1.5.x to see if that fixes it, but still curious if anyone might have a suggestion as to what might be going on.

Another oddity is that the searches seem to always return all the types that begin with the name "perf", and not any of the other ones. How weird is that? Something else I will investigate now. :smile:

Again, thanks for any suggestions!
Chris


(Chris Neal) #3

Ok, a bit more information that hopefully will be useful.

I have a particular index that has 18 types total. 6 begin with perf, 12 begin with hotel. When I execute a search with a query parameter as below, I get all 12 types returned that begin with hotel, but no perf types.

"query_string": {
  "query": "event_detail:*"
}

Then, changing the search to have just a wildcard:

"query_string": {
  "query": "*"
}

Now I only get the types that begin with perf, and no types that begin with hotel. How strange is that?

I do have one default mapping that applies to all indexes. The "_all" field is disabled, I have updated the index.query.default_field property to use the "message" field, and my mappings for most fields look like this:

"event": { "type": "string", "index" : "not_analyzed", "doc_values": true, "norms": { "enabled": false } },

Again, all of this seems quite standard. Could there possibly be a bug in the _search API in 1.6.0? I really would rather not roll back to 1.5.x, but that might be the next thing I try.

Thanks for reading.
Chris


(Chris Neal) #4

Well, after 2 days of looking and testing, I solved this one.

Turns out that the root cause was the updating of the index.query.default_field property to use the "message" field.

As fate would have it, only our "perf" logs have that field; none of the others do. So a bare wildcard query ("*") runs against the default field, "message", and matches only the perf documents. A query that names a field explicitly, such as event_detail:*, bypasses the default field and matches whichever types contain that field, which is why it returned the hotel types instead.
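
One way to check which types actually carry a given field is a _type facet restricted by an exists filter. This is only a minimal sketch, assuming the ES 1.x facet syntax used earlier in this thread; the function name types_with_field_body, the facet name types_with_field, and the ES_URL variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints a request body that facets on _type,
# counting only documents in which the given field exists.
types_with_field_body() {
  field="$1"
  cat <<EOF
{
  "size": 0,
  "facets": {
    "types_with_field": {
      "terms": { "field": "_type", "size": 100 },
      "facet_filter": { "exists": { "field": "$field" } }
    }
  }
}
EOF
}

# ES_URL is an assumption, e.g. ES_URL=http://myhost:9200/myindex
# Without it, the body is only printed so it can be inspected first.
if [ -n "${ES_URL:-}" ]; then
  curl -s -XGET "$ES_URL/_search?pretty" -d "$(types_with_field_body message)"
else
  types_with_field_body message
fi
```

If the diagnosis above is right, running this for "message" should list only the perf types, and running it for "event_detail" should list the hotel types.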

I've validated under 1.5.2 that going back to the default index.query.default_field gives the expected results. I'm now upgrading to 1.6.0 again to confirm the same thing there.
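
For anyone who needs to keep a custom default field, a facet request built on match_all rather than query_string "*" should not depend on index.query.default_field at all. The following is a sketch under that assumption, again using 1.x facet syntax and keeping the @timestamp range from the original query; all_types_body, the facet name types, and ES_URL are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: prints a _type facet request that uses match_all,
# so no query_string default field is involved in selecting documents.
all_types_body() {
  cat <<'EOF'
{
  "size": 0,
  "query": { "match_all": {} },
  "facets": {
    "types": {
      "terms": { "field": "_type", "size": 100 },
      "facet_filter": {
        "range": {
          "@timestamp": { "from": 1436192919014, "to": 1436196519014 }
        }
      }
    }
  }
}
EOF
}

# ES_URL is an assumption, e.g. ES_URL=http://myhost:9200/myindex
# Without it, the body is only printed so it can be inspected first.
if [ -n "${ES_URL:-}" ]; then
  curl -s -XGET "$ES_URL/_search?pretty" -d "$(all_types_body)"
else
  all_types_body
fi
```

The range filter stays in facet_filter, as in the original request, so the facet counts are limited to the time window while the selection itself no longer goes through query_string.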

Sheesh.

