Today I upgraded our ELK stack from 8.0 to 8.1.
Since then, Metricbeat, running on the Logstash node, has started logging something that I wasn't able to figure out by myself with my "Google-fu".
Aside from the fact that everything seems to work as expected, I'd like to find the root cause of this error.
Logged error:
Mar 14 17:44:33 hostname metricbeat[433]: {
"log.level":"error",
"@timestamp":"2022-03-14T17:44:33.403+0100",
"log.origin":{"file.name":"module/wrapper.go","file.line":254},
"message":"Error fetching data for metricset logstash.node: Could not find field 'id' in Logstash API response",
"service.name":"metricbeat",
"ecs.version":"1.6.0"
}
Looking at the source at the following site, it appears that the error is raised by something called ReportingMetricSetV2Error, but I don't know yet what that is (and I'm not sure whether looking at this site is even the right approach).
Hey,
I have the same error with Metricbeat, and I don't see the Logstash metrics in Kibana.
However, when I remove monitoring.cluster_uuid from logstash.yml, Metricbeat still logs the error, but the metrics show up again in Kibana.
If you don't see any connection-refused errors like the one below, your Metricbeat-to-Logstash connection is fine.
{"log.level":"error","@timestamp":"2022-03-16T14:39:43.159-0700","log.origin":{"file.name":"module/wrapper.go","file.line":254},"message":"Error fetching data for metricset logstash.node: error making http request: Get \"http://localhost:9600/\": dial tcp 127.0.0.1:9600: connect: connection refused","service.name":"metricbeat","ecs.version":"1.6.0"}
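As a quick connectivity check, you can probe the API directly (a sketch; assumes the Logstash monitoring API is on the default localhost:9600):

```shell
# Probe the Logstash monitoring API; -f makes curl fail on HTTP errors,
# so the exit code tells us whether the endpoint is reachable.
if curl -sf --max-time 5 'http://localhost:9600/' >/dev/null; then
  echo "Logstash API reachable"
else
  echo "Logstash API unreachable - check that Logstash is running and its API settings"
fi
```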
Some investigation tips if you need them:
Make sure the id field exists in the Logstash API responses by running these commands:
## Node info
curl -XGET 'localhost:9600/?pretty'
## Plugins info
curl -XGET 'localhost:9600/_node/plugins?pretty'
## Stats
curl -XGET 'localhost:9600/_node/stats?pretty'
## Hot threads
curl -XGET 'localhost:9600/_node/hot_threads?pretty'
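To check specifically for the id field the error message complains about, a grep-based sketch (the field name comes straight from the error above):

```shell
# The logstash.node metricset needs a top-level "id" field in the response
# from the root endpoint; if it's absent, you get the
# "Could not find field 'id'" error.
resp=$(curl -s -XGET 'localhost:9600/')
if printf '%s' "$resp" | grep -q '"id"'; then
  echo "id field present"
else
  echo "id field missing - this would trigger the metricbeat error"
fi
```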
Make sure you use the proper cluster UUID if you have to set monitoring.cluster_uuid. It is usually the same as your Elasticsearch cluster's UUID.
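To find the value to use, query Elasticsearch itself (a sketch; assumes Elasticsearch on localhost:9200, add credentials/TLS flags as needed):

```shell
# The cluster_uuid reported by Elasticsearch's root endpoint is the value
# that monitoring.cluster_uuid in logstash.yml should match.
curl -s -XGET 'localhost:9200/' | grep -o '"cluster_uuid" *: *"[^"]*"'
```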
In Elasticsearch Dev Tools, search the metric logs. The index name usually starts with .ds-.monitoring-logstash
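The same check can be done from the command line (a sketch; the index pattern is assumed from the data stream naming above):

```shell
# Fetch the most recent metricbeat-collected Logstash monitoring document.
curl -s -XGET 'localhost:9200/.ds-.monitoring-logstash*/_search?size=1&sort=@timestamp:desc&pretty'
```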
I also upgraded, from 7.17.1 to 8.1.2. I'm seeing errors in the Metricbeat log file (whose name changed) containing
[logstash.node.stats.pipelines.queue.capacity.queue_size_in_bytes] cannot be changed from type [long] to [float]\"}, dropping event!","service.name":"metricbeat","ecs.version":"1.6.0"}
So, I think metricbeat is collecting logstash statistics but the events can't be indexed due to a mapping error.
It's a little surprising to me that something called queue_size_in_bytes would be returning a float. If it is we should probably file an issue on the logstash repo to have it appropriately adjusted to a whole byte value.
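One way to see what Logstash actually returns for that field (a grep-based sketch; a float would show up with a decimal point in the extracted value):

```shell
# Extract every queue_size_in_bytes value from the node stats response;
# a value containing a '.' would confirm a float is coming back.
curl -s 'localhost:9600/_node/stats' | grep -o '"queue_size_in_bytes" *: *[0-9.]*'
```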
Thanks @rugenl ! Are you able to share any more of the log around the error? Is it from Metricbeat querying the same Logstash you ran the curl commands against? Also, is Metricbeat also on 8.1.2?
Is .monitoring-logstash-8-mb a plain index on your cluster? It should be a datastream on 8 but maybe something in the upgrade order caused it to get created incorrectly.
If you can delete .monitoring-logstash-8-mb (possibly snapshot or reindex first if you want to keep the data), then it should get created using the embedded data stream template on the next incoming document. That may allow the data to get indexed properly.
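A sketch of that check-and-delete sequence (index name taken from above; adjust host and auth as needed):

```shell
IDX='.monitoring-logstash-8-mb'
# If this returns a data stream definition, the setup is already correct;
# a 404 means the name exists only as a plain index (or not at all).
curl -s -XGET "localhost:9200/_data_stream/${IDX}?pretty"
# Snapshot or reindex first if you want to keep the data, then delete the
# plain index so the next document recreates it from the built-in template.
curl -s -XDELETE "localhost:9200/${IDX}"
```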
Isn't the timestamp what would be used? If it parses fields in order, the error field node.stats.pipelines.queue.capacity.queue_size_in_bytes should come after the timestamp.
Yeah, that's a fair point. Metricbeat might just inject the @timestamp from beat.Event. It probably does, or you'd be seeing the same timestamp error.
I still can't think what would be causing mapper [logstash.node.stats.pipelines.queue.capacity.queue_size_in_bytes] cannot be changed from type [long] to [float]
Your example doc seems to only contain null or integers for that field:
Is there anything else that might be modifying the document to present a float? Like a pipeline configured in the metricbeat output? Maybe a proxy between metricbeat and Elasticsearch?
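One way to check for that (a sketch; /etc/metricbeat/metricbeat.yml is the default path on deb/rpm installs, adjust for your setup):

```shell
# Anything matching here could be rewriting documents before indexing:
# an ingest pipeline set on the Elasticsearch output, or Beats processors.
grep -nE 'pipeline|processors' /etc/metricbeat/metricbeat.yml
```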
Well, just when this couldn't get any weirder, the error has gone away. I wasn't making any intentional changes to anything that could be related.
I was working on migrating from elasticsearch-certgen to elasticsearch-certutil. We had used ssl verification_mode full, but the install doc didn't generate .p12 files for each node, so I had dropped back to verification_mode certificate to get it working. I updated the certs and restarted all Elasticsearch nodes.
Logstash was also restarted. I'll have to check the config changes; I was in a different git branch than the one I had used to initially build the test stack, since I was working on a different issue.
This CA wouldn't have been valid, but it wasn't being verified. This was the active config on 7.17 and didn't cause a problem. I wonder if it's related to the first listed breaking change?