Getting false hits on queries (boolean, match, term, and filters)

I use Elastic to host an index used for managing list filters, which returns IDs matching a query, that I then use in Seek call on a SQL database to retrieve large sets of data with decent efficiency. Lately I've seen a really strange issue in Elastic where my queries are returning documents that do not match the query even though I'm not using any fuzzy settings or text analyzers.

We use the PHP client so my code below is formatted in that manner. I'm trying to retrieve only document IDs where the field "status = 0", but very often I get back documents where the status is some other arbitrary integer. The point is that the query should be returning an exact match. I've tried several different approaches including a "Term" query inside a Boolean, a match query outside a Boolean, and a filter inside a Boolean. All for "status=0" and still I get back the exact same documents that don't match, mixed in with documents that meet the query params.

$params = [
    'size' => 100,
    'index' => $index,
    'body' => [
        'query' => [
            'bool' => [
                'filter' => [
                    'term' => [
                        'status' => 0
                    ]
                ],
                'must' => [
                    [
                        'terms' => [
                            'job_number' => $job_numbers
                        ],
                    ],
                    [
                        'range' => [
                            $range_column => [
                                'gte' => $start_date,
                                'lte' => $end_date
                            ]
                        ]
                    ]
                ],
                'must_not' => [
                    [
                        'exists' => [
                            'field' => 'deleted'
                        ]
                    ]
                ]
            ],
        ],
        'sort' => [
            $sort_column => [
                'order' => $direction,
                'mode' => 'avg',
            ],
        ],
    ]
];

Any ideas on why this could be happening? This is occurring on even a really basic setup with 1 node, 1 shard, and pretty much default configs.
The "status" field is mapped to be a "number" so matching should be exact, but it's obviously not.

I've tried deleting and rebuilding the index from scratch, and I've tried delete_by_query to drop all data in the index and rebuild. In Kibana, the docs do not return if I use KQL to find "status=0", but for some reason through the Elastic API they continue to return as false matches.

Additionally, to add to the weirdness, I get back 88 documents even though the batch size is 100 and a count aggregation shows that there are over 1500 matching docs.

Any help, thoughts, tips, questions to help me help you help me? This has been a thorn in my side for about a week now.

can you share a minimal example using a reproduction (including mapping, index creation) with kibana dev tools/console, that would make it easier to look it.

Just looking at the query without even showing results is not helpful here.

Two minor things. The parts from the must part can also be moved into the filter part, so that they are not scored and can potentially get filtered. Can you also explain the sorting out of curiosity?

Hi @spinscale! Thanks very much for the reply here. Strangely, as I was working on getting this data together for you, suddenly the error isn't happening in the environment anymore. It's still occurring in another testing environment we run, but it's vanished from the initial place I noticed the error. I'll showcase it for you in the testing environment where I can still recreate it, but perhaps the magic vanishing act helps narrow down the cause? Does Elastic cache data somewhere even if you are running 1 node? I did notice that someone configured this index to have 2 replicas, so did they somehow get unsync'd? I don't even know if you can answer these kind of questions, just explicating.

Okay so my mapping for the index with issues is:

{
  "list-billing" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "billing_due_date" : {
          "type" : "date"
        },
        "customer_id" : {
          "type" : "integer"
        },
        "customer_name" : {
          "type" : "text"
        },
        "export_date" : {
          "type" : "date"
        },
        "exported_by" : {
          "type" : "integer"
        },
        "id" : {
          "type" : "integer"
        },
        "job_description" : {
          "type" : "text"
        },
        "job_id" : {
          "type" : "integer"
        },
        "job_number" : {
          "type" : "text"
        },
        "project_number" : {
          "type" : "text"
        },
        "project_title" : {
          "type" : "text"
        },
        "query_table" : {
          "type" : "text"
        },
        "quote_number" : {
          "type" : "text"
        },
        "service_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "status" : {
          "type" : "integer"
        },
        "workticket_id" : {
          "type" : "long"
        },
        "workticket_start_date" : {
          "type" : "date"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1627413134792",
        "number_of_shards" : "1",
        "number_of_replicas" : "2",
        "uuid" : "i removed it",
        "version" : {
          "created" : "7080199"
        },
        "provided_name" : "list-billing"
      }
    }
  }
}

and the Kibana query version of the above query is here:

GET /list-billing/_search?size=100
{
  "sort": [
    {
      "workticket_start_date": {
        "order": "desc",
        "mode": "avg"
      }
    }
  ],
  "query": {
    "bool": {
      "must": [
        {"terms": {
          "job_number": [
            "71008"
          ]
        }},
        {"range": {
          "workticket_start_date": {
            "gte": "2021-01-01",
            "lte": "2021-12-31"
          }
        }}
      ],
      "must_not": [
        {"exists": {
          "field": "deleted"
        }}
      ],
      "filter": [
        {"term": {
          "status": 0
        }}
      ]
    }
  }
}

which in the Kibana console returns great data, no mismatches and everything looks great. Here's a small sampling of the response, no false matches in the 13 docs returned.

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 13,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "list-billing",
        "_type" : "_doc",
        "_id" : "AR-14",
        "_score" : null,
        "_source" : {
          "id" :,
          "quote_number" : "",
          "project_number" : "",
          "project_title" : "71008  - ST. PETERSBURG 525 Auto Scrub Burnish",
          "customer_id" :,
          "customer_name" : "",
          "job_id" : null,
          "job_number" : 71008,
          "job_description" : "ST. PETERSBURG 525",
          "status" : 0,
          "export_date" : null,
          "billing_due_date" : null,
          "workticket_start_date" : "2021-09-30T04:00:00.000000Z",
        },
        "sort" : [
          1632974400000
        ]
      },
    ]
  }
}

However, if I make the exact same query using the PHP client, I get 2 mismatched hits in with all the good stuff (some data stripped)

    {
        "_index": "list-billing",
        "_type": "_doc",
        "_id": "AR-14072",
        "_score": null,
        "_source": {
            "id":,
            "quote_number": "",
            "project_number": "",
            "project_title": "71008  - ST. PETERSBURG 525 Auto Scrub Burnish",
            "customer_id":,
            "customer_name": "",
            "job_id": null,
            "job_number": 71008,
            "job_description": "ST. PETERSBURG 525",
            "status": 4,
            "export_date": "2021-07-13T04:00:00.000000Z",
            "billing_due_date": "2021-07-08T04:00:00.000000Z",
            "workticket_start_date": "2021-09-30T04:00:00.000000Z",
        },
        "sort": [
            1632974400000
        ]
    },
    {
        "_index": "list-billing",
        "_type": "_doc",
        "_id": "AR-14073",
        "_score": null,
        "_source": {
            "id":,
            "quote_number": "",
            "project_number": "",
            "project_title": "71008  - ST. PETERSBURG 525 Auto Scrub Burnish",
            "customer_id": ,
            "customer_name": "",
            "job_id": null,
            "job_number": 71008,
            "job_description": "ST. PETERSBURG 525",
            "status": 4,
            "export_date": "2021-07-20T04:00:00.000000Z",
            "billing_due_date": "2021-07-15T04:00:00.000000Z",
            "workticket_start_date": "2021-09-30T04:00:00.000000Z",
        },
        "sort": [
            1632974400000
        ]
    },

So I guess possibly it's happening somewhere in the PHP client or something.

@spinscale didn't notice your other questions before. I'll look at moving the must to filter, thanks for the tip!

The sorting is a functionality that is only required in a few parts of our application. As I mentioned, I use Elastic to hold indexes of IDs with sorting/filtering fields also stored in the index. This allows me to use better pagination than the SQL structure for our DB allows. We use these indexes mostly to build data-siloed lists for the user. Our user-base is mobile focused so we implement infinite scroll, but have to allow for certain sorting and ordering on the front-end. This extra parameter in the call allows me to pass supported sort columns and values through to the index so that I can get infinite scroll batches bi-directionally even with non-unique filter or sort IDs.

I call it the Jeef Method. Tongue-in-cheek of course. :upside_down_face:

if it works in kibana, there must be a difference with the PHP client. You can add 'explain: true' to both queries and maybe spot a difference.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.