Errors after installing X-Pack

I upgraded to ES5 and installed X-Pack, but I seem to be having some issues with it. I have 2 clients, 3 masters, and 5 data nodes. On all 10 servers the ES log now shows the error below multiple times per minute. Monitoring also has gaps in the data and reports incorrect information, e.g. showing no shards on a data node and then 300 shards a few minutes later. I'm assuming it's due to these errors, but I'm not sure where to look. I installed the X-Pack plugin on all nodes (client, master, data) and on Kibana.

[2016-10-27T17:09:35,697][ERROR][o.e.x.m.AgentService ] [esc2-client] exception when exporting documents
org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulks
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:148) ~[x-pack-5.0.0.jar:5.0.0]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.close(ExportBulk.java:77) ~[x-pack-5.0.0.jar:5.0.0]
at org.elasticsearch.xpack.monitoring.exporter.Exporters.export(Exporters.java:194) ~[x-pack-5.0.0.jar:5.0.0]
at org.elasticsearch.xpack.monitoring.AgentService$ExportingWorker.run(AgentService.java:208) [x-pack-5.0.0.jar:5.0.0]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_65]
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulk [default_local]
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.doFlush(LocalBulk.java:114) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.flush(ExportBulk.java:62) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:145) ~[?:?]
... 4 more
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: bulk [default_local] reports failures when exporting documents
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:121) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.doFlush(LocalBulk.java:111) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.flush(ExportBulk.java:62) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:145) ~[?:?]
... 4 more

By default, X-Pack Monitoring for Elasticsearch indexes its metrics into the local cluster itself. But it looks like, in your cluster, the bulk queue is sometimes full and the monitoring agent can't index the data.
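If you want to confirm that, the bulk thread pool stats should show queued and rejected operations piling up. Something like this should do it (column names assume the default 5.x thread pools):

curl -XGET 'localhost:9200/_cat/thread_pool/bulk?v&h=node_name,active,queue,rejected'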

I am assuming you don't have a custom Monitoring configuration on your Elasticsearch nodes, and that the monitoring data is being indexed into your production cluster (look for .monitoring-* indices). I would recommend setting up a dedicated monitoring cluster. You are better off not keeping the monitoring data on your production cluster, because if that cluster goes down, you will also have lost all the metrics that could help you understand the issue.

https://www.elastic.co/guide/en/x-pack/current/monitoring-cluster.html
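If you do set up a dedicated monitoring cluster, the production nodes ship their metrics to it with an http exporter, roughly like this (the host name here is just a placeholder for your monitoring cluster):

xpack.monitoring.exporters.my_monitoring_cluster:
  type: http
  host: ["http://monitoring-cluster.example.com:9200"]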

@x10Corey There should be more information in the log files. Can you please double-check and copy/paste the whole stack trace here (or in a gist)? Thanks

For us this error appeared after installing Elasticsearch 5.1.1 together with X-Pack.
But the error also mentioned that there was no ingest node in the cluster.
Putting node.ingest: true in elasticsearch.yml made the error disappear.
It looks like Monitoring uses the ingest feature?

I'm facing the same issue. Adding node.ingest: true is not helping in my case.

Have you solved this issue?

For those that are getting the error because they have disabled ingest nodes, there are two options:

  1. Enable ingest node.
  2. Disable the use of ingest on the exporter that you are using.

If you are not defining an exporter, then you are using the default local exporter. You can override the default by setting:

xpack.monitoring.exporters.my_local:
  type: local
  use_ingest: false
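To check which nodes currently have the ingest role (for option 1 above), something like this should work:

curl -XGET 'localhost:9200/_cat/nodes?v&h=name,node.role'

Nodes with an i in the node.role column have ingest enabled.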

@nanshan

I find that users having issues with the default exporter usually do so because they are using a template that interferes with the .monitoring-* templates. This usually comes from some sort of global template (where "template": "*") that changes the settings or mappings applied to the Monitoring indices in an incompatible way.
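To check for that, you can list the installed templates and look for one whose template pattern is "*" (or anything else that also matches .monitoring-*):

curl -XGET 'localhost:9200/_template?pretty'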

Can you show the index definition for .monitoring-data-2?

curl -XGET localhost:9200/.monitoring-data-2?pretty
{
  ".monitoring-data-2" : {
    "aliases" : { },
    "mappings" : {
      "kibana" : {
        "enabled" : false
      },
      "cluster_info" : {
        "enabled" : false,
        "_meta" : {
          "xpack.version" : "5.1.1"
        }
      },
      "node" : {
        "enabled" : false
      }
    },
    "settings" : {
      "index" : {
        "codec" : "best_compression",
        "number_of_shards" : "1",
        "provided_name" : ".monitoring-data-2",
        "mapper" : {
          "dynamic" : "false"
        },
        "creation_date" : "1484680886202",
        "number_of_replicas" : "1",
        "uuid" : "_3zErolzQ5qKx7DZ56slkQ",
        "version" : {
          "created" : "5010199"
        }
      }
    }
  }
}

@nanshan

So it's not that one. Let's try .monitoring-es-2-2017.01.17 instead of .monitoring-data-2.
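Same request as before, just against that index:

curl -XGET localhost:9200/.monitoring-es-2-2017.01.17?pretty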

Also, can you share the full error message you're getting?

I could not paste all of the logs since they are too long.

 {
 ".monitoring-es-2-2017.01.17" : {
    "aliases" : { },
    "mappings" : {
      "node" : {
        "_all" : {
          "enabled" : false
        },
        "date_detection" : false,
        "properties" : {
          "cluster_uuid" : {
            "type" : "keyword"
          },
          "node" : {
            "properties" : {
              "id" : {
                "type" : "keyword"
              }
            }
          },
          "source_node" : {
            "properties" : {
              "attributes" : {
                "dynamic" : "true",
                "properties" : {
                  "client" : {
                    "type" : "boolean"
                  },
                  "data" : {
                    "type" : "boolean"
                  },
                  "master" : {
                    "type" : "boolean"
                  }
                }
              },
              "host" : {
                "type" : "keyword"
              },
              "ip" : {
                "type" : "keyword"
              },
              "name" : {
                "type" : "keyword"
              },
              "transport_address" : {
                "type" : "keyword"
              },
              "uuid" : {
                "type" : "keyword"
              }
            }
          },
          "state_uuid" : {
            "type" : "keyword"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "date_time"
          }
        }
      },
      "cluster_state" : {
        "_all" : {
          "enabled" : false
        },
        "date_detection" : false,
        "properties" : {
          "cluster_state" : {
            "properties" : {
              "master_node" : {
                "type" : "keyword"
              },
              "nodes" : {
                "type" : "object",
                "enabled" : false
              },
              "shards" : {
                "type" : "object"
              },
              "state_uuid" : {
                "type" : "keyword"
              },
              "status" : {
                "type" : "keyword"
              },
              "version" : {
                "type" : "long"
              }
            }
          },
          "cluster_uuid" : {
            "type" : "keyword"
          },
          "source_node" : {
            "properties" : {
              "attributes" : {
                "dynamic" : "true",
                "properties" : {
                  "client" : {
                    "type" : "boolean"
                  },
                  "data" : {
                    "type" : "boolean"
                  },
                  "master" : {
                    "type" : "boolean"
                  }
                }
              },
              "host" : {
                "type" : "keyword"
              },
              "ip" : {
                "type" : "keyword"
              },
              "name" : {
                "type" : "keyword"
              },
              "transport_address" : {
                "type" : "keyword"
              },
              "uuid" : {
                "type" : "keyword"
              }
            }
          },
          "timestamp" : {
            "type" : "date",
            "format" : "date_time"
          }
        }
      },

From what I can see, that looks proper. Which leaves two things:

  1. Copy the full stacktrace of the error.
  2. Let's see that index's settings:

curl -XGET localhost:9200/.monitoring-es-2-2017.01.17/_settings?pretty


Thanks
 {
  ".monitoring-es-2-2017.01.17" : {
    "settings" : {
      "index" : {
        "codec" : "best_compression",
        "number_of_shards" : "1",
        "provided_name" : ".monitoring-es-2-2017.01.17",
        "mapper" : {
          "dynamic" : "false"
        },
        "creation_date" : "1484680875998",
        "number_of_replicas" : "1",
        "uuid" : "kVzR3shcQgCBIMBACpDFSg",
        "version" : {
          "created" : "5010199"
        }
      }
    }
  }
}

Not that. That looks normal. Let's see that juicy error now. :slight_smile:

I am trying to fix the error: All shards failed for phase: [query_fetch]

Sounds like I need to delete the translog to fix the above error.

I will come back for this error later. thanks for all the help.

That's not the same error. The error that this issue is dealing with is, at a high level, the failure of the Monitoring code to bulk index documents into the Monitoring cluster.

You are failing to query them, which is the other side. That's a worthwhile issue, but when you do come back to it, please create a new Discuss topic for that (so it's easier for others to find) and include the error message as well as the versions of the stack that are installed.

I will. Thanks for all the input.


Please tag me in it as well ("@pickypg").

I cannot reproduce the issue now; here is my elasticsearch.yml:

network.host: 0.0.0.0
node.master: true
node.data: false
node.ingest: true
node.name: dev-ore-elasticsearch-master-i-abcdedf
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.unicast.hosts: ["10.0.2.9", "10.0.0.20", "10.0.1.1"]
xpack.security.enabled: false
xpack.graph.enabled: false
xpack.watcher.enabled: false

Based on the docs (https://www.elastic.co/guide/en/elasticsearch/reference/5.1/ingest.html), it looks like node.ingest is enabled by default (node.ingest: true), so we should not need to add it here.

elasticsearch: 5.1.1
x-pack: 5.1.1
kibana: 5.1.1

And another question here:
I am using the 3 private IPs of the dedicated Elasticsearch masters as the value of discovery.zen.ping.unicast.hosts.

When I try to use an ELB as the value instead, it throws an error: [SERVICE_UNAVAILABLE/2/no master]

@pickypg


Fortunately, this is a simple configuration issue. SERVICE_UNAVAILABLE/2/no master indicates that you did not have an elected master node in charge of your cluster when you sent your request. The issue appears to be that you only have 3 master-eligible nodes (["10.0.2.9", "10.0.0.20", "10.0.1.1"]), yet your discovery.zen.minimum_master_nodes setting is set to 3, which is too strict.

This should be set to (M / 2) + 1, always rounded down. Therefore it should be set to 2, since you only have 3 master-eligible nodes. If you set it to 3, then any hiccup (or a rolling restart, for that matter) means that no master node can be elected.
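For example, with your 3 master-eligible nodes that works out to (3 / 2) + 1 = 2.5, rounded down to 2:

discovery.zen.minimum_master_nodes: 2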