Inconsistencies between platforms

Hello! I need help with a problem I'm having between two versions of ELK. These versions correspond to two different platforms that consume data from the same source.

The first image corresponds to an ELK 7.9 stack, where we are seeing the sum of the values of a specific field, day by day.

In the second image we see the same example, but on an ELK 7.17 stack, and the values do not agree with the previous graph.

In both infrastructures the processes are the same: same data source, same processing, and same ingestion from Logstash. We are not applying any filters to the visualizations, and both data views (index patterns) are the same.

For some reason that I have not been able to discover, the values differ between the two platforms, when they should be identical.

I have reviewed the configurations of both clusters and of both Logstash instances, and the same ones are in place in both cases.

Any advice on where else I could check?

Thank you so much!!

Can you provide more context here?

Do you have one Logstash that sends the data to both Elasticsearch clusters, or did you replicate everything?

Is this one index or multiple indices? Is the template with the mapping exactly the same?

Can you share the result of GET your-index/_mapping in both clusters?
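
It would also be worth diffing the templates themselves; a minimal sketch, assuming a hypothetical template name:

# Legacy index template, if that is what you use:
GET _template/your-template

# Composable index template (available since 7.8):
GET _index_template/your-template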

Hello!!

Can you share the result of GET your-index/_mapping in both clusters?

Yes, of course, but the mappings are extensive and exceed the character limit supported by the platform. Is there any option to send them?

Thank you!

You can use a gist for the mappings.

I really want to make sure that the mappings for that field are the same.
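
If the full mappings remain too large to share, the field mapping API can narrow the comparison down to just the field being summed; a sketch with placeholder index and field names:

GET your-index/_mapping/field/your-field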

The first thing I would do is run a count aggregation for each of those days and make sure you have the exact same number of documents with that field available.

I would also run min and max aggregations and compare the results.
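
A minimal sketch of such a query, with placeholder index and field names; running it on both clusters and diffing the per-day buckets should show where they diverge:

POST your-index/_search?size=0
{
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "your-date-field",
        "calendar_interval": "day"
      },
      "aggs": {
        "count": { "value_count": { "field": "your-sum-field" } },
        "min": { "min": { "field": "your-sum-field" } },
        "max": { "max": { "field": "your-sum-field" } }
      }
    }
  }
}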

Are you also sure the time zones being displayed in Kibana are the same?

Excellent, here is the link to both mappings.
If you have problems viewing them, please let me know.

The first thing I would do is run a count aggregation for each of those days and make sure you have the exact same number of documents with that field available.

I would also run min and max aggregations and compare the results.

I did the relevant verifications, but I am going to redo everything so I can upload the results here and we can evaluate them as a whole.

Are you also sure the time zones being displayed in Kibana are the same?

Yes, in both cases it uses the GMT time zone.

Thank you so much!

Which field is the sum on?

Also, I would write a simple DSL query with the count, sum, min, and max aggregations, run it on both, and share the results.

I just created two indices, es7-17 and es7-9, with your mappings to better compare them, and detected a couple of mapping conflicts.

Some of those conflicts, depending on the value of the field in the document, would lead Elasticsearch to reject the document.

These are the conflicts:

Technical_Staging
- 7.9: boolean
- 7.17: text

tags/Technical.drp
- 7.9: text
- 7.17: boolean

tags/Technical.staging
- 7.9: boolean
- 7.17: text

tags/X-Container-Meta-Osb-CreateTime
- 7.9: text
- 7.17: float

tags/X-Container-Meta-Osb-JoinTime
- 7.9: text
- 7.17: float

tags/anual
- 7.9: text
- 7.17: float

tags/mensual
- 7.9: text
- 7.17: float

tags/useStoreCentral
- 7.9: text
- 7.17: boolean

I don't think this is really the issue, but it is a hint: not everything is equal. The mappings are different, and this kind of mapping difference can lead to documents being dropped.

Can you check in your data whether one of those conflicting fields has a value that could lead to documents being dropped?

Also, do you use custom ids? If yes, can you track some document that should be in both clusters but is missing in one of them?

If you do not use custom ids, can you find a smaller time window, like a 5 or 10 minute range, where you also see a difference in the number of documents, and export the data from both clusters to find the documents that are missing? That would help track down the issue.
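
One way to locate such a window is to bucket the counts into 5-minute intervals on both clusters and diff the doc_count per bucket; a sketch with placeholder names and an arbitrary one-day range:

POST your-index/_search?size=0
{
  "query": {
    "range": {
      "your-date-field": {
        "gte": "2023-12-24T00:00:00.000Z",
        "lt": "2023-12-25T00:00:00.000Z"
      }
    }
  },
  "aggs": {
    "per_5m": {
      "date_histogram": {
        "field": "your-date-field",
        "fixed_interval": "5m"
      }
    }
  }
}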

Hello! Thanks for the reply.
I executed the following query on both ELK stacks, taking the range from December 24 to 26, 2023:

POST /index/_search?size=0
{
   "query": {
     "bool": {
       "must": [
         {
           "range": {
             "Date": {
               "gte": "2023-12-24T00:00:00.000Z",
               "lte": "2023-12-26T23:59:00.000Z"
             }
           }
         },
         {
           "term": {
             "SubscriptionId": "7132887"
           }
         }
       ]
     }
   },
   "aggs": {
     "count": {
       "value_count": {
         "field": "Date"
       }
     },
     "sum": {
       "sum": {
         "field": "Cost_Local_Regular"
       }
     },
     "min": {
       "min": {
         "field": "Cost_Local_Regular"
       }
     },
     "max": {
       "max": {
         "field": "Cost_Local_Regular"
       }
     }
   }
}

And the results were the following:

ELK 7.9:
{
   "took" : 2520,
   "timed_out" : false,
   "_shards" : {
     "total" : 35,
     "successful" : 35,
     "skipped" : 0.
     "failed" : 0
   },
   "hits" : {
     "total" : {
       "value" : 10000,
       "relation" : "gte"
     },
     "max_score" : null,
     "hits": [ ]
   },
   "aggregations" : {
     "min" : {
       "value" : -0.8250763416290283
     },
     "max" : {
       "value" : 20039.84765625
     },
     "count" : {
       "value" : 2444188
     },
     "sum" : {
       "value" : 8106174.651495968
     }
   }
}
ELK 7.17:
{
   "took" : 736,
   "timed_out" : false,
   "_shards" : {
     "total" : 2,
     "successful" : 2,
     "skipped" : 0.
     "failed" : 0
   },
   "hits" : {
     "total" : {
       "value" : 10000,
       "relation" : "gte"
     },
     "max_score" : null,
     "hits": [ ]
   },
   "aggregations" : {
     "min" : {
       "value" : -0.8250763416290283
     },
     "max" : {
       "value" : 20039.84765625
     },
     "count" : {
       "value" : 2370761
     },
     "sum" : {
       "value" : 7895332.466686378
     }
   }
}

I find the difference in count and sum very curious, since the min and max values are the same in both cases.

That just means that the two documents containing the min and max values are not among the ones missing from the 7.17 index. Given the large number of documents, that is basically expected.

Narrow down the time interval so you get a more manageable result set, allowing you to identify some documents that are missing. When you have them, look at the fields where the mappings differ between the indices and try to identify what would cause these documents to fail.
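
For instance, reusing the fields from your earlier query, something like the following (the 10-minute window is only an example) would pull the actual documents so they can be diffed between the two clusters:

POST /index/_search
{
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "Date": {
              "gte": "2023-12-24T10:00:00.000Z",
              "lt": "2023-12-24T10:10:00.000Z"
            }
          }
        },
        {
          "term": {
            "SubscriptionId": "7132887"
          }
        }
      ]
    }
  },
  "sort": [ { "Date": "asc" } ],
  "_source": [ "Date", "tags/anual", "tags/mensual", "tags/useStoreCentral", "tags/Technical.drp" ]
}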

Most fields can be stored as text, so look at the fields where the mapping in 7.17 is more restrictive (see the sketch after this list):

tags/Technical.drp
- 7.9: text
- 7.17: boolean

tags/X-Container-Meta-Osb-CreateTime
- 7.9: text
- 7.17: float

tags/X-Container-Meta-Osb-JoinTime
- 7.9: text
- 7.17: float

tags/anual
- 7.9: text
- 7.17: float

tags/mensual
- 7.9: text
- 7.17: float

tags/useStoreCentral
- 7.9: text
- 7.17: boolean
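
On the 7.9 cluster, where these fields are mapped as text, a quick probe for the problematic literal could look like this, taking tags/anual as an example (the other fields work the same way):

POST /index/_search
{
  "size": 5,
  "query": {
    "match": {
      "tags/anual": "null"
    }
  }
}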

Hello, yes, here is the requested information. Thank you so much!

Can you check in your data whether one of those conflicting fields has a value that could lead to documents being dropped?

ELK-7.9
{
  "took" : 1980,
  "timed_out" : false,
  "_shards" : {
    "total" : 35,
    "successful" : 35,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 2.0,
    "hits" : [
      {
        "_index" : "index-7.9",
        "_type" : "_doc",
        "_id" : "6WDpnYwBVE7Krp-OrjkE",
        "_score" : 2.0,
        "_source" : {
          "Technical_Staging" : "false",
          "tags/mensual" : "null",
          "tags/anual" : "null",
          "tags/X-Container-Meta-Osb-CreateTime" : "null",
          "tags/X-Container-Meta-Osb-JoinTime" : "null",
          "tags/useStoreCentral" : "null",
          "tags/Technical.drp" : "null"
        }
      }
    ]
  }
}
ELK-7.17
{
  "took" : 531,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 2.0,
    "hits" : [
      {
        "_index" : "index-7.17",
        "_type" : "_doc",
        "_id" : "RzI8vYwBfzjXarh6aSlC",
        "_score" : 2.0,
        "_source" : {
          "Technical_Staging" : "true",
          "tags/Technical.drp" : false
        }
      }
    ]
  }
}

In the case of version 7.17, we filter out beforehand, with Logstash, the fields whose value is 'null', so we do not get the same number of fields in the query.
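
If it helps to confirm that this filtering accounts for the difference, one option might be comparing, on each cluster, how many documents still carry one of the conflicting fields; a sketch using tags/anual:

POST /index/_search?size=0
{
  "query": {
    "exists": {
      "field": "tags/anual"
    }
  }
}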

Also, do you use custom ids? If yes, can you track some document that should be in both clusters but is missing in one of them?

Unfortunately, the documents do not contain fields that allow us to identify or differentiate them, since they are all grouped under a Subscription Id, Tenant Id, and Container Id, with two values for each, depending on the origin of the data.

It is not clear what you mean by that; can you provide a little more context?

Your mappings are different on 7.17 and 7.9, and some of those mapping differences can lead to documents being dropped on 7.17.

For example, all documents where those fields have these values would be rejected on 7.17:

          "tags/mensual" : "null",
          "tags/anual" : "null",
          "tags/X-Container-Meta-Osb-CreateTime" : "null",
          "tags/X-Container-Meta-Osb-JoinTime" : "null",
          "tags/useStoreCentral" : "null",
          "tags/Technical.drp" : "null"

There is a difference between null and "null": the first one is really a null value, while the second one is a string with the value "null".

If your documents have null between double quotes, then you have a string, and the document will be rejected if the mapping is float or boolean. Since you are using Logstash, this will also be logged in the Logstash log.

You can test it yourself. First, create an index with a float field and a boolean field:

PUT my-test-index

PUT /my-test-index/_mapping
{
  "properties": {
    "fieldA": {
      "type": "float"
    },
    "fieldB": {
      "type": "boolean"
    }
  }
}

Then try to add the following documents:

POST my-test-index/_doc/
{
  "fieldA": null
}

POST my-test-index/_doc/
{
  "fieldA": "null"
}

POST my-test-index/_doc/
{
  "fieldB": null
}

POST my-test-index/_doc/
{
  "fieldB": "null"
}

You will see that the documents with "null" as the value will be rejected with a mapping error.

Also, a null value is shown without double quotes when you search for it in Elasticsearch:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my-test-index",
        "_id": "Xi_1z4wBIqOb1VE3lZvb",
        "_score": 1,
        "_source": {
          "fieldA": null,
          "fieldB": null
        }
      }
    ]
  }
}
