Getting unusual data during aggregation and curation in Elasticsearch

Anika_Sultana · July 28, 2022, 3:41pm

Hi,
I am using Elasticsearch 7.4 and I am seeing unexpected results during data aggregation and data curation.
Here is the scenario. I have 100 user_id values, among them AA, AB, AC, and AD. When I query for those four user_id values and aggregate the result, I also get extra ones, such as BA and BB.

Please note that the user_id values starting with A and B all live in Cluster 1. The issue appeared when I started migrating all data starting with A to Cluster 2.

The query I was using

GET sturdent_data-2022*/_search
{
  "size": 0, 
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "user_id": "AA"
          }
        },
        {
          "match": {
            "user_id": "AB"
          }
        },
        {
          "match": {
            "user_id": "AC"
          }
        },
        {
          "match": {
            "user_id": "AD"
          }
        }
      ]
    }
  },
  "aggs": {
    "NAME": {
      "terms": {
        "field": "user_id.keyword",
        "size": 100
      }
    }
  }
}

The output of the query is

  "aggregations" : {
    "NAME" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "AA",
          "doc_count" : 98
        },
        {
          "key" : "AB",
          "doc_count" : 74
        },
        {
          "key" : "BA",
          "doc_count" : 68
        },
        {
          "key" : "AD",
          "doc_count" : 54
        },
        {
          "key" : "AC",
          "doc_count" : 35
        },
        {
          "key" : "BB",
          "doc_count" : 12
        }
      ]
    }
  }

In the output, user_id BA and BB appear as unexpected extra buckets.
The same thing happens during curation with _delete_by_query.
For the aggregation I found a workaround: query the keyword sub-field instead of the plain field.

        {
          "match": {
            "user_id.keyword": "AD"
          }
        }

But this does not work during curation: there the _delete_by_query removes all the data.

I need a way to exclude this unexpected data during both searching and data curation.
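
One thing that may be worth trying is a single `terms` query on the keyword sub-field, placed in the `filter` part of the `bool` query, and reusing exactly the same query for both `_search` and `_delete_by_query`. A `filter` clause always has to match, whereas `should` clauses become optional (default `minimum_should_match` of 0) as soon as the same `bool` also contains a `must` or `filter` clause; whether that is what happened in the delete request is only a guess, since that request is not shown. The sketch below just builds the two request bodies in Python and prints them as JSON. The helper name `exact_user_filter` is made up for illustration, and it assumes `user_id.keyword` is a keyword sub-field, as the aggregation above suggests.

```python
import json


def exact_user_filter(user_ids):
    """Hypothetical helper: query matching only documents whose
    user_id.keyword is exactly one of the given values."""
    return {
        "query": {
            "bool": {
                # filter clauses must match; no scoring, no text analysis
                "filter": [
                    {"terms": {"user_id.keyword": sorted(user_ids)}}
                ]
            }
        }
    }


wanted = ["AA", "AB", "AC", "AD"]

# Body for GET <index>/_search: the filter plus the terms aggregation
# from the original query.
search_body = {
    "size": 0,
    **exact_user_filter(wanted),
    "aggs": {"NAME": {"terms": {"field": "user_id.keyword", "size": 100}}},
}

# Body for POST <index>/_delete_by_query: the same filter on its own.
delete_body = exact_user_filter(wanted)

print(json.dumps(search_body, indent=2))
print(json.dumps(delete_body, indent=2))
```

Running the search body first and checking that only the expected buckets come back is a cheap way to confirm the filter before sending the same query to `_delete_by_query`.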

system · July 28, 2022, 3:41pm

Elasticsearch 7.4 is EOL and no longer supported. Please upgrade ASAP.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns.)

system · August 25, 2022, 3:41pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
