My_index/_stats/indexing (high number of "index_failed" operations)

I'm supporting a cluster with an index that is seeing a high number of "index_failed" operations, as shown by the indexing stats. I'm trying to understand what types of scenarios would cause this counter to increment. What exactly is considered an "index_failed" operation, and what might the causes be?

Logstash is indexing into Elasticsearch 5.6.4, and we're seeing an excessive amount of I/O wait time and poor performance overall.

GET my_index/_stats/indexing 
{
  "_shards" : {
    "total" : 360,
    "successful" : 360,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "indexing" : {
        "index_total" : 39546274,
        "index_time_in_millis" : 32693876,
        "index_current" : 0,
        "index_failed" : 17762193,
        "delete_total" : 1059084,
        "delete_time_in_millis" : 216423,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0
      }
    },
    "total" : {
      "indexing" : {
        "index_total" : 108377320,
        "index_time_in_millis" : 117903886,
        "index_current" : 6,
        "index_failed" : 17762193,
        "delete_total" : 3072604,
        "delete_time_in_millis" : 965006,
        "delete_current" : 0,
        "noop_update_total" : 0,
        "is_throttled" : false,
        "throttle_time_in_millis" : 0
      }
    }
  },
  "indices" : {
    "my_index" : {
      "primaries" : {
        "indexing" : {
          "index_total" : 39546274,
          "index_time_in_millis" : 32693876,
          "index_current" : 0,
          "index_failed" : 17762193,
          "delete_total" : 1059084,
          "delete_time_in_millis" : 216423,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        }
      },
      "total" : {
        "indexing" : {
          "index_total" : 108377320,
          "index_time_in_millis" : 117903886,
          "index_current" : 6,
          "index_failed" : 17762193,
          "delete_total" : 3072604,
          "delete_time_in_millis" : 965006,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        }
      }
    }
  }
}
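
For reference, the counter in question can be pulled out on its own with response filtering (a sketch using filter_path, which 5.x supports; index name as above):

GET my_index/_stats/indexing?filter_path=indices.my_index.primaries.indexing.index_failed
{
  "indices" : {
    "my_index" : {
      "primaries" : {
        "indexing" : {
          "index_failed" : 17762193
        }
      }
    }
  }
}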

Welcome to our community! :smiley:

5.6 has been EOL for 2 years now; you really need to upgrade as a matter of urgency.

Why do you have so many shards?

I couldn't agree with you more that the version is old and that this is a lot of shards. We're actively working on upgrading to the latest version, but we need to solve this problem in order to clear a path to get there. The index has a 937GB pri.store.size with 60 primary shards (and 5 replicas at the moment, for testing).
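
For reference, this is roughly how I'm checking that layout (a sketch using the _cat/indices API with its standard column headers):

GET _cat/indices/my_index?v&h=index,pri,rep,pri.store.size,docs.count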

But what I'm trying to understand is exactly what causes the "index_failed" count to be incremented and how concerned I should be that the index_failed count is nearly 50% of the index_total count.

The exact meaning of the counter isn't documented well anywhere that I can find. It's not documented in 5.4, and the 7.x docs only say that it's the "Number of failed indexing operations," which doesn't help me understand what causes that to happen. I don't see any errors in Logstash, and we don't seem to be missing any documents.

They could be failures that Logstash has automatically retried for you.
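
The bulk request itself can come back 200 while individual items fail; a per-item failure looks roughly like the sketch below (the 429 / es_rejected_execution_exception values are illustrative, not taken from this cluster) and is the kind of error the Logstash elasticsearch output retries silently:

{
  "took" : 3,
  "errors" : true,
  "items" : [
    {
      "index" : {
        "_index" : "my_index",
        "_type" : "doc",
        "_id" : "1",
        "status" : 429,
        "error" : {
          "type" : "es_rejected_execution_exception",
          "reason" : "..."
        }
      }
    }
  ]
}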

Update: I was able to confirm what was causing the "index_failed" counts. Logstash was configured to use external version numbers, but a flaw in our config was causing it to reuse the same version number multiple times. This caused updates in Elasticsearch to return a 409 error due to a version conflict, which increments the index_failed counter in the index stats.
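
For anyone who hits this later, the behaviour is easy to reproduce by hand (a sketch; my_index/doc/1 and the message field are placeholders): with version_type=external, Elasticsearch only accepts a write whose version is strictly greater than the one already stored, so resending the same version is rejected.

PUT my_index/doc/1?version=5&version_type=external
{ "message" : "first write is accepted" }

PUT my_index/doc/1?version=5&version_type=external
{ "message" : "second write with the same external version returns 409 version_conflict_engine_exception" }

Every such 409 bumps index_failed, which matches the behaviour described above.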


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.