Metricbeat Cluster Monitoring: logstash.node: Could not find field 'id' in Logstash API response

Hello Community,

Today I've upgraded our ELK Stack from 8.0 to 8.1.
Since then, Metricbeat running on the Logstash node has been logging an error that I haven't been able to figure out on my own, even with my "Google-fu".

Aside from the fact that everything seems to work as expected, I'd like to find the root cause of this error.

Logged error:

Mar 14 17:44:33 hostname metricbeat[433]: {
"log.level":"error",
"@timestamp":"2022-03-14T17:44:33.403+0100",
"log.origin":{"file.name":"module/wrapper.go","file.line":254},
"message":"Error fetching data for metricset logstash.node: Could not find field 'id' in Logstash API response",
"service.name":"metricbeat",
"ecs.version":"1.6.0"
}

Current logstash-xpack config:

- module: logstash
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:9600"]
  username: "${monitoring_user_name}"
  password: "${monitoring_user_password}"

Best regards,
M.


I am facing the same problem but have not reached a solution.

"log.origin":{"file.name":"module/wrapper.go","file.line":254},

Looking at the source behind that log origin, it appears that something called ReportingMetricSetV2Error is the direct cause, but I don't know yet what triggers it (and I'm not sure whether looking at the source is even the right approach).

Hey,
I have the same error with Metricbeat and I don't see the Logstash metrics in Kibana.
However, when I remove monitoring.cluster_uuid from logstash.yml, Metricbeat still logs the error, but the metrics show up again in Kibana.
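
For reference, a quick way to compare the configured UUID with what Elasticsearch actually reports (just a sketch; the ports and the logstash.yml path assume a default package install, and the Elasticsearch call may need credentials if security is enabled):

## the UUID Elasticsearch reports for the cluster
$ curl -s -XGET 'http://localhost:9200/?pretty' | grep cluster_uuid
## the UUID configured in logstash.yml, if any
$ grep monitoring.cluster_uuid /etc/logstash/logstash.yml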

If you don't see any connection-refused errors like the one below, then the connection to Logstash itself is fine.

{"log.level":"error","@timestamp":"2022-03-16T14:39:43.159-0700","log.origin":{"file.name":"module/wrapper.go","file.line":254},"message":"Error fetching data for metricset logstash.node: error making http request: Get \"http://localhost:9600/\": dial tcp 127.0.0.1:9600: connect: connection refused","service.name":"metricbeat","ecs.version":"1.6.0"}

Some investigation tips if you need:

  • make sure the id field exists in the responses of the Logstash API by running these commands (see the sketch after this list):
## Node info
$ curl -XGET 'localhost:9600/?pretty'
## Plugins info
$ curl -XGET 'localhost:9600/_node/plugins?pretty'
## Node stats
$ curl -XGET 'localhost:9600/_node/stats?pretty'
## Hot threads
$ curl -XGET 'localhost:9600/_node/hot_threads?pretty'
  • make sure you use the proper cluster UUID if you have to set monitoring.cluster_uuid. Usually it is the same UUID that Elasticsearch reports.
  • In Elasticsearch Dev Tools, search the monitoring data. The backing indices usually start with .ds-.monitoring-logstash
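
For example (again a sketch; the ports are the defaults, and the Elasticsearch call may need credentials):

## just the node UUID Metricbeat is looking for
$ curl -s -XGET 'http://localhost:9600/?pretty' | grep '"id"'
## confirm monitoring documents are actually arriving
$ curl -s -XGET 'http://localhost:9200/_cat/indices/.monitoring-logstash*?v&expand_wildcards=all'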

Hi,

I checked my Logstash instance with those commands.

There is an 'id' set and the monitoring.cluster_uuid is also correctly set.

The connection seems fine, since events are arriving in Elasticsearch and are shown in "Stack Monitoring" as expected.

Regards,
M.

@marcus_lhisp
Am I correct in my understanding that metricbeat is working and connecting fine, but the error still keeps occurring?

I also upgraded, from 7.17.1 to 8.1.2. I'm seeing errors in the Metricbeat log file (whose name changed) containing:

[logstash.node.stats.pipelines.queue.capacity.queue_size_in_bytes] cannot be changed from type [long] to [float]\"}, dropping event!","service.name":"metricbeat","ecs.version":"1.6.0"}

So, I think metricbeat is collecting logstash statistics but the events can't be indexed due to a mapping error.

The field is mapped as long in elasticsearch/monitoring-logstash-mb.json at master · elastic/elasticsearch · GitHub. As @mashhurs mentioned, it'd be good to see what values you're getting back from the Logstash stats endpoints.

It's a little surprising to me that something called queue_size_in_bytes would be returning a float. If it is, we should probably file an issue on the Logstash repo to have it adjusted to a whole byte value.
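
One way to confirm what the live mapping in your cluster actually says for that field is the get field mapping API (host, credentials, and the data stream name below are assumed defaults, so adjust as needed):

## current mapping of the problematic field in the monitoring data stream
$ curl -s -XGET 'http://localhost:9200/.monitoring-logstash-8-mb/_mapping/field/logstash.node.stats.pipelines.queue.capacity.queue_size_in_bytes?pretty'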

I think the Logstash stats values look OK, if I did this correctly (some trimming done for readability):

curl -XGET 'localhost:9600/_node/stats?pretty' | grep \"queue_size_in_bytes\"
          "queue_size_in_bytes" : 62995188,
        "queue_size_in_bytes" : 62995188,
         "queue_size_in_bytes" : 1,
        "queue_size_in_bytes" : 1,
       "queue_size_in_bytes" : 0,
        "queue_size_in_bytes" : 0,
          "queue_size_in_bytes" : 1,
        "queue_size_in_bytes" : 1,
        "queue_size_in_bytes" : 0,
          "queue_size_in_bytes" : 1,
        "queue_size_in_bytes" : 1,
        "queue_size_in_bytes" : 0,
        "queue_size_in_bytes" : 0,

Thanks @rugenl! Are you able to share any more of the log around the error? Is it from Metricbeat querying the same Logstash instance you ran the curl against? Also, is Metricbeat on 8.1.2 as well?

All components in the stack are 8.1.2. I put the last 10 lines at https://pastebin.com/42vYvcgu

I also just noticed the ID error in my log, but I don't think it's the root cause.

Error fetching data for metricset logstash.node: Could not find field 'id' in Logstash API

Thanks @rugenl - if I try to post that to an 8.1.2 test instance I get this:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "data stream timestamp field [@timestamp] is missing"
    }
  },
  "status" : 400
}

Is .monitoring-logstash-8-mb a plain index on your cluster? It should be a data stream on 8.x, but maybe something in the upgrade order caused it to be created incorrectly.

If you can delete .monitoring-logstash-8-mb (possibly snapshot or reindex first if you want to keep the data), then it should get created using the embedded data stream template on the next incoming document. That may allow the data to get indexed properly.
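
In case it helps, both the check and the deletion can be done with the data stream APIs (a sketch; host and credentials are assumptions, and the DELETE discards the stored monitoring data):

## confirm whether it is a data stream or a plain index
$ curl -s -XGET 'http://localhost:9200/_data_stream/.monitoring-logstash-8-mb?pretty'
## if you decide to drop it and let it be recreated from the built-in template
$ curl -s -XDELETE 'http://localhost:9200/_data_stream/.monitoring-logstash-8-mb'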

It was a data stream. Deleting it and letting it recreate didn't change anything.

Log message 9 contains:

"Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2022, time.April, 7, 9, 6, 51, 277098961, time.Local),

Isn't that Timestamp what would be used? If fields are parsed in order, the failing field node.stats.pipelines.queue.capacity.queue_size_in_bytes comes after the timestamp.

Yeah, that's a fair point. Metricbeat might just inject the @timestamp from beat.Event. It probably does, or you'd be seeing the same timestamp error.

I still can't think of what would cause "mapper [logstash.node.stats.pipelines.queue.capacity.queue_size_in_bytes] cannot be changed from type [long] to [float]".

Your example doc seems to only contain null or integers for that field:

❯ jq '.logstash.node.stats.pipelines | .[].queue.capacity.queue_size_in_bytes' doc.json
1
null
1
65298372
null
1
null
null
null

Is there anything else that might be modifying the document to present a float? Like a pipeline configured in the metricbeat output? Maybe a proxy between metricbeat and Elasticsearch?
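
Two quick places to look, if useful (a sketch; the metricbeat.yml path assumes a package install, and the Elasticsearch call may need credentials):

## is the Metricbeat Elasticsearch output sending through an ingest pipeline?
$ grep -n 'pipeline' /etc/metricbeat/metricbeat.yml
## are there ingest pipelines on the cluster that could rewrite the document?
$ curl -s -XGET 'http://localhost:9200/_ingest/pipeline?pretty'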

Well, just when this couldn't get any weirder, the error has gone away. I wasn't intentionally changing anything that could be related.

I was working on migrating from elasticsearch-certgen to elasticsearch-certutil. We had used ssl_verification full, but the install doc didn't generate .p12 files for each node, so I had dropped back to ssl_verification certificate to get it working. I updated the certs and restarted all Elasticsearch nodes.

Logstash was also restarted. I'll have to check the config changes; I was in a different git branch than the one I had used to initially build the test stack, since I was working on a different issue.

The change in Logstash was this, in the Elasticsearch output sections:

-    cacert => "/etc/logstash/certs/https_interm.cer"
+    cacert => "/etc/logstash/elasticsearch-ca.pem"
+    ssl_certificate_verification => false

This CA wouldn't have been valid, but it wasn't being verified. This was also the active config under 7.17 and didn't cause a problem there. I wonder if it's related to the first listed breaking change?

Well then! Happy it's working smoothly now, but kind of bitter-sweet since we didn't find the root cause.

If you see it pop up again feel free to reply here or open a new discussion with the elastic-stack-monitoring tag.

If anyone else on the thread is still seeing trouble, please reply with as much recent information (logs, curl responses) as you can share.

Thanks @rugenl for being so responsive!

I strongly suspect that it was caused by the invalid CA, so check there first if you are reading this 🙂
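
One quick way to check is to test whether the CA file actually validates the Elasticsearch certificate (a sketch; the URL is a placeholder for your Elasticsearch endpoint, and even a 401 response means the TLS verification itself succeeded):

## fails with a certificate error if the CA does not match the server certificate
$ curl --cacert /etc/logstash/elasticsearch-ca.pem 'https://your-elasticsearch-host:9200/'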


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.