I upgraded to ES 5 and installed X-Pack, but I seem to be having some issues with it. I have 2 client nodes, 3 master nodes, and 5 data nodes. All 10 servers are now logging the error below multiple times per minute. Monitoring also has gaps in its data, including incorrect information such as showing no shards on a data node and then, a few minutes later, 300 shards. I assume this is caused by these errors, but I'm not sure where to look. I installed the X-Pack plugin on all nodes (client, master, data) and on Kibana.
[2016-10-27T17:09:35,697][ERROR][o.e.x.m.AgentService ] [esc2-client] exception when exporting documents
org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulks
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:148) ~[x-pack-5.0.0.jar:5.0.0]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.close(ExportBulk.java:77) ~[x-pack-5.0.0.jar:5.0.0]
at org.elasticsearch.xpack.monitoring.exporter.Exporters.export(Exporters.java:194) ~[x-pack-5.0.0.jar:5.0.0]
at org.elasticsearch.xpack.monitoring.AgentService$ExportingWorker.run(AgentService.java:208) [x-pack-5.0.0.jar:5.0.0]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_65]
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulk [default_local]
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.doFlush(LocalBulk.java:114) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.flush(ExportBulk.java:62) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:145) ~[?:?]
... 4 more
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: bulk [default_local] reports failures when exporting documents
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:121) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.doFlush(LocalBulk.java:111) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk.flush(ExportBulk.java:62) ~[?:?]
at org.elasticsearch.xpack.monitoring.exporter.ExportBulk$Compound.doFlush(ExportBulk.java:145) ~[?:?]
... 4 more
By default, the Monitoring feature of X-Pack for Elasticsearch indexes its metrics into the local cluster itself. It looks like the bulk queue in your cluster is sometimes full, so the monitoring agent can't index the data.
I am assuming you don't have a custom Monitoring configuration on your Elasticsearch nodes, and that the monitoring data is being indexed into your production cluster (look for .monitoring-* indices). I would recommend setting up a dedicated monitoring cluster: you are better off not keeping the monitoring data on your production cluster, because if that cluster goes down, you lose the very metrics that could help you understand the issue.
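As a sketch of what that would look like: with a dedicated monitoring cluster, each production node ships its metrics over HTTP via an exporter configured in elasticsearch.yml. The host name below is hypothetical; substitute your own monitoring cluster's address.

```yaml
# elasticsearch.yml on each production node (hypothetical host name)
xpack.monitoring.exporters:
  my_monitoring_cluster:
    type: http
    host: ["http://monitoring-cluster.example.com:9200"]
```

With this in place, the .monitoring-* indices are created on the dedicated cluster instead of the production cluster, so they remain queryable even if production is down.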
@x10Corey There should be more information in the log files, can you please double-check and copy/paste the whole stack trace here (or in a gist)? Thanks
For us, this error appeared after installing Elasticsearch 5.1.1 together with X-Pack.
But the error also mentioned that there was no ingest node in the cluster.
Setting node.ingest: true in elasticsearch.yml made the error disappear.
It looks like Monitoring uses the ingest feature?
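For anyone hitting the same symptom, the change described above is a one-line setting in elasticsearch.yml (at least one node in the cluster needs it; whether Monitoring strictly requires an ingest node may depend on your version):

```yaml
# elasticsearch.yml -- make this node ingest-capable
node.ingest: true
```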
I find that users having issues with the default exporter usually do so because they are using a template that interferes with the .monitoring-* templates. This is usually from some sort of global template (where "template": "*"), which functionally changes the index pattern for Monitoring indices in an incompatible way.
Can you show the index definition for .monitoring-data-2?
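To gather that, something like the following in Kibana's Console (or the equivalent curl calls) should work; the second request is a way to spot a global template with a `"template": "*"` pattern:

```
GET /.monitoring-data-2

GET /_template?filter_path=*.template
```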
That's not the same error. The error this issue deals with is, at a high level, the Monitoring code failing to bulk-index documents into the Monitoring cluster.
You are failing to query them, which is the other side. That's a worthwhile issue, but when you do come back to that error, please create a new Discuss topic (for better discoverability for others) and include the error message as well as the versions of the stack you have installed.
Fortunately, this is a simple configuration issue. SERVICE_UNAVAILABLE/2/no master indicates that you did not have an elected master node in charge of your cluster when you sent your request. The issue appears to be that you only have 3 master-eligible nodes (["10.0.2.9", "10.0.0.20", "10.0.1.1"]), but your discovery.zen.minimum_master_nodes setting is strict and set to 3.
This should be set to (M / 2) + 1, with the division rounded down. Therefore it should be 2 if you only have 3 master-eligible nodes. If you set it to 3, then any hiccup (or a rolling restart, for that matter) means that no master node can be elected.
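The quorum arithmetic, as a quick sketch (the function name is mine; the formula is the one from the post, with the corresponding elasticsearch.yml line being e.g. `discovery.zen.minimum_master_nodes: 2`):

```python
def minimum_master_nodes(eligible_masters: int) -> int:
    # (M / 2) + 1 with the division rounded down, i.e. a strict majority.
    return eligible_masters // 2 + 1

print(minimum_master_nodes(3))  # → 2: one master node can fail and a quorum survives
print(minimum_master_nodes(4))  # → 3
print(minimum_master_nodes(5))  # → 3: two can fail
```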