Bug? A specific search query doesn't return documents that do exist!


(Nir Reuveny) #1

Hi,

We have an ES 1.7 cluster with daily logstash indexes.
We've noticed a major issue that we can't explain: when we search (or run a terms aggregation) on one specific field, no documents are returned, although that index holds many documents with that value (or any other value)!
This doesn't happen on all of the daily indexes, only on some.

The examples below show the problem:

http://kibana:9200/logstash-2015.08.29/_search
{
  "size": 200,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "_type:\"record\" AND d_id_pre_1:\"c\"",
          "analyze_wildcard": true
        }
      }
    }
  },
  "fields": [
    "d_id_pre_1"
  ]
}

Results:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Running the search on the same index without filtering on the value of d_id_pre_1 shows that there are documents with "c" as the value of the d_id_pre_1 field!

{
  "size": 5,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "_type:\"record\"",
          "analyze_wildcard": true
        }
      }
    }
  },
  "fields": [
    "d_id_pre_1"
  ]
}

Result:

{
  "took": 52,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "hits": {
    "total": 4619845,
    "max_score": 1,
    "hits": [
      {
        "_index": "logstash-2015.08.29",
        "_type": "record",
        "_id": "b1ad227e8a8b3e3044c6822c55e1399b",
        "_score": 1,
        "fields": {
          "d_id_pre_1": [
            "2"
          ]
        }
      },
      {
        "_index": "logstash-2015.08.29",
        "_type": "record",
        "_id": "e0dcf0c4ea01b2d0b98a247955e1399b",
        "_score": 1,
        "fields": {
          "d_id_pre_1": [
            "2"
          ]
        }
      },
      {
        "_index": "logstash-2015.08.29",
        "_type": "record",
        "_id": "e371b878878935a4ec7d7c3355e1399c",
        "_score": 1,
        "fields": {
          "d_id_pre_1": [
            "c"
          ]
        }
      },
      {
        "_index": "logstash-2015.08.29",
        "_type": "record",
        "_id": "c8ac1a8e23c73175e0660fc655e1399c",
        "_score": 1,
        "fields": {
          "d_id_pre_1": [
            "1"
          ]
        }
      },
      {
        "_index": "logstash-2015.08.29",
        "_type": "record",
        "_id": "6479d993fe922d39dba61c9f55e1399c",
        "_score": 1,
        "fields": {
          "d_id_pre_1": [
            "7"
          ]
        }
      }
    ]
  }
}

This problem happens on only some of the daily indexes, not all of them, which makes the issue even odder.
The following aggregation query and its results show this:

http://kibana:9200/logstash-2015.08.31,logstash-2015.08.30,logstash-2015.08.29,logstash-2015.08.28/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "_type:\"record\" AND d_id_pre_1:\"c\"",
          "analyze_wildcard": true
        }
      }
    }
  },
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d",
        "min_doc_count": 0
      }
    }
  }
}

Result:

{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 8,
    "successful": 8,
    "failed": 0
  },
  "hits": {
    "total": 269432,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "2": {
      "buckets": [
        {
          "key_as_string": "2015-08-28T00:00:00.000Z",
          "key": 1440720000000,
          "doc_count": 140945
        },
        {
          "key_as_string": "2015-08-29T00:00:00.000Z",
          "key": 1440806400000,
          "doc_count": 0
        },
        {
          "key_as_string": "2015-08-30T00:00:00.000Z",
          "key": 1440892800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2015-08-31T00:00:00.000Z",
          "key": 1440979200000,
          "doc_count": 128487
        }
      ]
    }
  }
}
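
To pick out the affected days without reading the buckets by eye, a small helper over a response like the one above could list the empty ones. This is only a sketch: `empty_days` is a hypothetical name, and it assumes the response has the same shape as the result above, with the date_histogram aggregation named "2".

```python
def empty_days(response, agg_name="2"):
    """Return key_as_string for every date_histogram bucket with no documents."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [b["key_as_string"] for b in buckets if b["doc_count"] == 0]


# Buckets copied from the result above.
response = {
    "aggregations": {
        "2": {
            "buckets": [
                {"key_as_string": "2015-08-28T00:00:00.000Z", "key": 1440720000000, "doc_count": 140945},
                {"key_as_string": "2015-08-29T00:00:00.000Z", "key": 1440806400000, "doc_count": 0},
                {"key_as_string": "2015-08-30T00:00:00.000Z", "key": 1440892800000, "doc_count": 0},
                {"key_as_string": "2015-08-31T00:00:00.000Z", "key": 1440979200000, "doc_count": 128487},
            ]
        }
    }
}

print(empty_days(response))
```

For the result above, this reports the buckets for 2015-08-29 and 2015-08-30, which are the daily indexes to inspect further.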

Any ideas? This looks like a serious bug at this point, as I can't find any good reason for this behavior.

Thanks!

Nir.


(Nir Reuveny) #2

Can anyone help, or does anyone have any ideas about this problem?


(Michael McCandless) #3

How is the d_id_pre_1 field indexed in the problematic daily index? Is it analyzed (which analyzer)?

Can you try removing the double-quotes around the query? This tells the query parser to make a phrase query, but (at least in this example) you have only one token (c) that you are trying to match ...


(Nir Reuveny) #4

Hi Mike,

The field is set to 'not_analyzed' in all daily indexes (they all come from the same template).
So, AFAIK, you need to search for the full text, which I did.

"d_id_pre_1" : {
  "index" : "not_analyzed",
  "type" : "string"
}
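
One way to confirm that the field really is mapped identically everywhere is to compare the mapping output for each index and each type mechanically. The sketch below is an assumption-laden illustration: `field_mappings` and `group_by_mapping` are hypothetical helper names, the input is assumed to have the shape that the get-mapping API returns in ES 1.x (index name, then "mappings", then type name, then "properties"), and the sample data is made up.

```python
import json
from collections import defaultdict


def field_mappings(mapping_response, field):
    """Collect the mapping of `field` for every (index, type) pair that defines it."""
    found = {}
    for index, body in mapping_response.items():
        for doc_type, type_mapping in body.get("mappings", {}).items():
            properties = type_mapping.get("properties", {})
            if field in properties:
                found[(index, doc_type)] = properties[field]
    return found


def group_by_mapping(found):
    """Group (index, type) pairs by their mapping, serialized with sorted keys."""
    groups = defaultdict(list)
    for key, mapping in sorted(found.items()):
        groups[json.dumps(mapping, sort_keys=True)].append(key)
    return dict(groups)


# Made-up sample for illustration only; these are NOT the real mappings of this cluster.
sample = {
    "logstash-2015.08.28": {"mappings": {
        "record": {"properties": {"d_id_pre_1": {"type": "string", "index": "not_analyzed"}}},
    }},
    "logstash-2015.08.29": {"mappings": {
        "record": {"properties": {"d_id_pre_1": {"type": "string", "index": "not_analyzed"}}},
        "other": {"properties": {"d_id_pre_1": {"type": "long"}}},
    }},
}

groups = group_by_mapping(field_mappings(sample, "d_id_pre_1"))
for mapping, where in groups.items():
    print(mapping, where)
```

More than one group in the output means the field is not mapped the same way everywhere. Walking every type, not only 'record', matters here because the mapping response lists each type in an index separately.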


(Michael McCandless) #5

OK good, yes you must search for the full text.

Did you try the query without double quotes around c?


(Nir Reuveny) #6

Just did. It's the same result. Again, the mappings are exactly the same on all indexes, but the search just doesn't 'work' on some of them.
Seems like a bug, no?

{
  "size": 200,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "_type:\"record\" AND d_id_pre_1:c",
          "analyze_wildcard": true
        }
      }
    }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}


(Michael McCandless) #7

Yeah maybe a bug ... can you simplify it down to a small case?

E.g. remove the _type:"record" part, remove analyze_wildcard, and use a plain query (not filtered)?
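
A reduced request along those lines might look like the sketch below, which builds the search body as a plain dictionary. It swaps in a `term` query in place of `query_string`, so the value is matched exactly against the not_analyzed field without going through the query parser; that substitution, and the helper name `minimal_term_query`, are assumptions for illustration rather than something already tried in this thread.

```python
import json


def minimal_term_query(field, value, size=5):
    """Build the smallest search body that matches `value` exactly in `field`."""
    return {
        "size": size,
        # A term query looks up the exact indexed term; no parsing, no analysis.
        "query": {"term": {field: value}},
    }


body = minimal_term_query("d_id_pre_1", "c")

# POST this body to /logstash-2015.08.29/_search, or to
# /logstash-2015.08.29/record/_search to restrict the search to the 'record' type.
print(json.dumps(body, indent=2))
```

If this body finds documents on a 'good' index but not on a 'problematic' one, the query parser and the filtered wrapper can be ruled out as the cause.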


(Nir Reuveny) #8

Mike,

I've tried that. If I remove the 'type' filter, the 'problematic' indexes do return results, but only for other document types and not for the main type we use in 90% of our logs ('record').
I've also tried running a terms aggregation, with the same behavior: it shows really small numbers on the problematic indexes, as it doesn't find (or somehow ignores) most of the documents (the ones with the 'record' type).
On the 'good' indexes, however, it shows very high numbers for each bucket in the aggregation.

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*"
        }
      }
    }
  },
  "aggs": {
    "3": {
      "terms": {
        "field": "d_id_pre_1",
        "size": 20
      }
    }
  }
}

    {
      "key": "0",
      "doc_count": 30
    },
    {
      "key": "1",
      "doc_count": 30
    },
    {
      "key": "6",
      "doc_count": 28
    },
    {
      "key": "c",
      "doc_count": 26
    },
    {
      "key": "f",
      "doc_count": 24
    }

(Nir Reuveny) #9

Bumping this problem... anyone?

