Getting unusual data during aggregation and curation in Elasticsearch

Hi,
I am using Elasticsearch 7.4 and have noticed unexpected results during data aggregation and data curation.
Let me explain the scenario. I have 100 user_id values, among them AA, AB, AC, and AD. When I aggregate on just those four user_id values, I also get extra buckets, such as BA and BB.

Please note that the user_id values start with either A or B, and they all live in Cluster 1. The issue appeared when I started migrating all data whose user_id starts with A to Cluster 2.

The query I was using:

GET sturdent_data-2022*/_search
{
  "size": 0, 
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "user_id": "AA"
          }
        },
        {
          "match": {
            "user_id": "AB"
          }
        },
        {
          "match": {
            "user_id": "AC"
          }
        },
        {
          "match": {
            "user_id": "AD"
          }
        }
      ]
    }
  },
  "aggs": {
    "NAME": {
      "terms": {
        "field": "user_id.keyword",
        "size": 100
      }
    }
  }
}

The output of the query is:

  "aggregations" : {
    "NAME" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "AA",
          "doc_count" : 98
        },
        {
          "key" : "AB",
          "doc_count" : 74
        },
        {
          "key" : "BA",
          "doc_count" : 68
        },
        {
          "key" : "AD",
          "doc_count" : 54
        },
        {
          "key" : "AC",
          "doc_count" : 35
        },
        {
          "key" : "BB",
          "doc_count" : 12
        }
      ]
    }
  }

In the output, user_id BA and BB appear as unexpected extra buckets.
The same thing happens during curation with _delete_by_query.
I found a fix for the aggregation case: match against the keyword subfield when searching.

        {
          "match": {
            "user_id.keyword": "AD"
          }
        }

But this does not work for curation: it removes all the data.

I need a way to keep this unexpected data out of both the search results and the curation step.
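
One possible approach, offered as a minimal sketch rather than a confirmed fix: build a single exact-match clause and reuse it for both the search and the _delete_by_query request. It assumes that user_id.keyword (the subfield already used in the aggregation above) holds the unanalysed IDs, and it uses a terms query in filter context in place of the four match clauses. The helper names below are hypothetical; the script only builds and prints the request bodies, it does not contact a cluster.

```python
import json


def build_user_query(user_ids, field="user_id.keyword"):
    """Return a bool query that matches only documents whose `field`
    is exactly one of `user_ids` (assumes `field` is a keyword field)."""
    return {"bool": {"filter": [{"terms": {field: list(user_ids)}}]}}


def build_search_body(user_ids, agg_size=100):
    """Body for GET <index>/_search: no hits, just one bucket per user_id."""
    return {
        "size": 0,
        "query": build_user_query(user_ids),
        "aggs": {
            "NAME": {"terms": {"field": "user_id.keyword", "size": agg_size}}
        },
    }


def build_delete_body(user_ids):
    """Body for POST <index>/_delete_by_query: the same query, nothing else."""
    return {"query": build_user_query(user_ids)}


if __name__ == "__main__":
    ids = ["AA", "AB", "AC", "AD"]
    print(json.dumps(build_search_body(ids), indent=2))
    print(json.dumps(build_delete_body(ids), indent=2))
```

Because both bodies share the same query, sending the delete body to _count first shows how many documents would be removed before anything is actually deleted.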

Elasticsearch 7.4 is EOL and no longer supported. Please upgrade ASAP.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )
