Index randomly returns wrong results and wrong numeric type

Hi,

I have run into a situation I have not seen before. In my test index I have just 25 records.
I am noticing the following inconsistent behaviour with a simple search request against the alias (which points to this single index):

GET /private_95787e92-ea54-412f-b2aa-84f136597e13_companys_active/_search
{
  "_source": "score"
}

When all works as expected, I get a total of 25 hits and the integer field is returned correctly as an integer.

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 25,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "8834d971-2275-46ea-85b7-665a17532473",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "99322c97-d41c-411a-ae4e-1ac7f91ed1da",
        "_score" : 1.0,
        "_source" : {
          "score" : 522
        }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "45690436-2a6c-4edf-a79b-87c934565427",
        "_score" : 1.0,
        "_source" : {
          "score" : 100
        }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "ce562ba5-e472-4713-ad35-88eac62de2f1",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "a2405408-f02b-4780-a7b6-7d1559577ff8",
        "_score" : 1.0,
        "_source" : {
          "score" : 0
        }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "607afd54-099b-4330-a871-542676236cff",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "af5c053a-f344-418f-8650-cc2c60f5525b",
        "_score" : 1.0,
        "_source" : {
          "score" : 522
        }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "576e513b-4b1c-4eac-bbfe-6a94fea2cea4",
        "_score" : 1.0,
        "_source" : {
          "score" : 522
        }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "36bb9722-b891-4452-b2eb-904ee78e72ea",
        "_score" : 1.0,
        "_source" : { }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "058ba128-2c6f-4969-be29-946deab0165b",
        "_score" : 1.0,
        "_source" : { }
      }
    ]
  }
}
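
For completeness: only 10 hits are listed above because of the default page size of 10, but the total is still 25. Asking for an explicit size returns them all, for example:

GET /private_95787e92-ea54-412f-b2aa-84f136597e13_companys_active/_search
{
  "_source": "score",
  "size": 25
}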

However, at random, I get a total of just 7 hits and the integer field is returned as a double. Note that in this bad response only 2 of the 5 shards report as successful.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "91e446fb-10ad-48bc-a1e9-7fb11867758c",
        "_score" : 1.0,
        "_source" : {
          "score" : 380.0
        }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "0cb40954-606f-42fd-90b8-de296ee7a2ba",
        "_score" : 1.0,
        "_source" : {
          "score" : 110.0
        }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "a348a3ce-21f5-41a5-9154-100bfa38e8c9",
        "_score" : 1.0,
        "_source" : {
          "score" : 100.0
        }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "bd20a90a-d530-4827-9f1d-109e79251dcc",
        "_score" : 1.0,
        "_source" : {
          "score" : 750.0
        }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "625d7d75-25de-4252-b215-7da324d69cad",
        "_score" : 1.0,
        "_source" : {
          "score" : 550.0
        }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "652be0af-3307-4403-9e81-810faf1bbc82",
        "_score" : 1.0,
        "_source" : {
          "score" : 770.0
        }
      },
      {
        "_index" : "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001",
        "_type" : "_doc",
        "_id" : "addabf74-430b-423c-be53-02b4de6ec3c9",
        "_score" : 1.0,
        "_source" : {
          "score" : 750.0
        }
      }
    ]
  }
}
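
To see where the shards of this index are allocated, a check along these lines could help (a sketch, using the concrete index name from above):

GET _cat/shards/private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001?v&h=index,shard,prirep,state,docs,node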

Partial mapping:

{
  "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001" : {
    "mappings" : {
      // left out for clarity
      "properties" : {
        "score" : {
          "type" : "integer"
        }
      }
      // left out for clarity
    }
  }
}
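
To double-check what the live index reports for just that field, the field mapping API can be queried, for example:

GET /private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001/_mapping/field/score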

Some more info: at times the request fails with:
{"statusCode":502,"error":"Bad Gateway","message":"Client request timeout"}

I also know that one of the nodes is above the low disk watermark.
(Note that the index in the allocation explanation below is a different one.)

GET _cluster/allocation/explain

{
  "index" : "private_123eafdc-b1d0-4a38-8766-e9689d037efd_companys_new-000001",
  "shard" : 2,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2021-01-10T12:36:54.115Z",
    "details" : "node_left [TZb0mpk6Rk-o_ZGl1Ej1KQ]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "PMqJCXm6T-uZT0DtzFcqHg",
      "node_name" : "elasticsearch-2",
      "transport_address" : "<xxx>:9300",
      "node_attributes" : {
        "ml.machine_memory" : "2147483648",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [13.357897780780078%]"
        }
      ]
    },
    {
      "node_id" : "ZzV7yKD-ToSOqoApPn3-9A",
      "node_name" : "elasticsearch-1",
      "transport_address" : "<xxx>:9300",
      "node_attributes" : {
        "ml.machine_memory" : "2147483648",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "matching_size_in_bytes" : 5139037626
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[private_123eafdc-b1d0-4a38-8766-e9689d037efd_companys_new-000001][2], node[ZzV7yKD-ToSOqoApPn3-9A], [P], s[STARTED], a[id=p3Fb2ewtSHaqwFpHjEQ44w]]"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "NO",
          "explanation" : "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=85%], using more disk space than the maximum allowed [85.0%], actual free: [8.081981702764564%]"
        }
      ]
    }
  ]
}
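
For reference, per-node disk usage (relative to those watermarks) can be checked with the cat allocation API, for example:

GET _cat/allocation?v&h=node,disk.percent,disk.used,disk.avail,disk.total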

Can you please help me understand what is causing this and how to solve it?

Some updates:
Comparing the results from when I received the 25 records with integer scores (good) against the 7 records with float scores (bad) showed that none of the 7 records exist among the 25. In other words, those 7 records are coming from somewhere else. The only possibility I can think of is that they belong to an index with the same name that I had deleted because the type of the score field had changed from float to int. I don't know whether this is true, but it is the only assumption I have at this point.
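
To rule out the alias resolving to (or also including) some leftover index, these are the kinds of checks I would run (a sketch, using the alias and index name prefix from above):

// which concrete indices does the alias point to?
GET _alias/private_95787e92-ea54-412f-b2aa-84f136597e13_companys_active

// which indices with this prefix still exist?
GET _cat/indices/private_95787e92-ea54-412f-b2aa-84f136597e13_companys_*?v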

Some of the steps I tried when troubleshooting:

  1. Tried to reindex (see the sketch right after this list), hoping that recreating the index would make those 7 float records go away - it did not help.
  2. Cleared a lot of space from other indices.
  3. Restarted 2 of the 3 pods, which seemed broken/nonresponsive.
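
For reference, the reindex in step 1 was essentially of this shape (a sketch; the destination index name is only illustrative):

POST _reindex
{
  "source": {
    "index": "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000001"
  },
  // the -000002 destination name below is illustrative
  "dest": {
    "index": "private_95787e92-ea54-412f-b2aa-84f136597e13_companys_new-000002"
  }
}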

Only after steps 2 and 3 did the index stop showing those 7 records. Hopefully the problem is gone.

But if I was indeed seeing results from the deleted index, I think that is a data integrity issue in Elasticsearch. I assume the deleted index could not actually be removed on one or two of the pods because of the lack of disk space, but I would never expect its documents to show up as results of the new index (if that is indeed what I was seeing).

Thoughts?

"Then the cluster would not have returned an ok response to the delete request."

Unfortunately I don't remember what DELETE returned, though I don't recall any errors.

"Seems like there is something else happening here that would be worth digging into."

What do you mean by this?

There are 3 pods. Observing their logs, I saw that one pod did not show any new log output, which seemed to me as if it were nonresponsive. Another pod kept restarting, logging 'No space left' exceptions. In between restarts I had to manually delete some other large indices to clear space. Only then did that second pod manage to restart properly without complaining about disk space. After this the index stopped showing me entries with a 'float' score (which might have been entries from the deleted index).
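
For context, this is roughly how the largest indices can be listed before manually deleting them (a sketch sorting by store size):

GET _cat/indices?v&h=index,docs.count,store.size&s=store.size:desc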
