[Rollup Job] NumberFormatException

Whenever a new rollup job is created, the first 1000 documents are found, but all subsequent calls throw the following error in the Elasticsearch logs.

 shard [[DUPkSOQVQAeFTiEJ9DYBgQ][metricbeat-7.0.0-2019-04-21][0]], reason [RemoteTransportException[[GREYLOG][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: NumberFormatException[For input string: "idc03-212"]; ], cause [java.lang.NumberFormatException: For input string: "idc03-212"
    at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2054)
    at java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.base/java.lang.Double.parseDouble(Double.java:549)
    at org.elasticsearch.search.DocValueFormat$1.parseLong(DocValueFormat.java:119)
    at org.elasticsearch.search.aggregations.bucket.composite.LongValuesSource.setAfter(LongValuesSource.java:154)
    at org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesCollectorQueue.<init>(CompositeValuesCollectorQueue.java:86)
    at org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregator.<init>(CompositeAggregator.java:94)
    at org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationFactory.createInternal(CompositeAggregationFactory.java:49)
    at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:217)

I recently upgraded my ELK installation to 7.0.0 and have a mix of 6.6.2, 6.7.0, 6.7.1 and 7.0.0 Metricbeat indices.

I am trying to work out why it is attempting to parse the beat hostname keyword to a number.

Can anyone help with this? I'm not sure if it is a bug or something I have set up wrong.

Below is the rollup job's JSON. I cut some of the metrics out due to the body length limit.

{
  "config": {
    "id": "MetricBeat 5 Minute 0",
    "index_pattern": "metricbeat-*",
    "rollup_index": "rollup_5m_metricbeat",
    "cron": "0 5 * * * ?",
    "groups": {
      "date_histogram": {
        "interval": "5m",
        "field": "@timestamp",
        "delay": "7d",
        "time_zone": "UTC"
      },
      "terms": {
        "fields": [
          "agent.hostname",
          "beat.hostname.keyword",
          "host.hostname",
          "metricset.module.keyword",
          "metricset.name.keyword",
          "system.process.name.keyword",
          "system.filesystem.mount_point.keyword"
        ]
      }
    },
    "metrics": [
      {
        "field": "system.cpu.cores",
        "metrics": [
          "max"
        ]
      },
      {
        "field": "system.cpu.idle.norm.pct",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.cpu.idle.pct",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.cpu.iowait.norm.pct",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.cpu.iowait.pct",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.cpu.irq.norm.pct",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.cpu.irq.pct",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.cpu.nice.norm.pct",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.cpu.nice.pct",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.cpu.softirq.norm.pct",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.cpu.softirq.pct",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.cpu.steal.norm.pct",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.cpu.steal.pct",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.cpu.system.norm.pct",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.cpu.system.pct",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.cpu.total.norm.pct",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.cpu.total.pct",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.memory.actual.free",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.memory.actual.used.bytes",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.memory.actual.used.pct",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.memory.swap.free",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.memory.swap.total",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.memory.swap.used.bytes",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.memory.swap.used.pct",
        "metrics": [
          "avg",
          "max",
          "min"
        ]
      },
      {
        "field": "system.memory.total",
        "metrics": [
          "min",
          "max",
          "avg"
        ]
      },
      {
        "field": "system.memory.used.bytes",
        "metrics": [
          "avg",
          "min",
          "max"
        ]
      }
    ],
    "timeout": "20s",
    "page_size": 1000
  },
  "status": {
    "job_state": "started",
    "current_position": {
      "@timestamp.date_histogram": 1554464400000,
      "agent.hostname.terms": null,
      "beat.hostname.keyword.terms": "fps-idc03-k8-212-k8db",
      "host.hostname.terms": null,
      "metricset.module.keyword.terms": "system",
      "metricset.name.keyword.terms": "process",
      "system.filesystem.mount_point.keyword.terms": null,
      "system.process.name.keyword.terms": "lpfc_worker_1"
    },
    "upgraded_doc_id": true
  },
  "stats": {
    "pages_processed": 1,
    "documents_processed": 17336,
    "rollups_indexed": 1000,
    "trigger_count": 1,
    "index_time_in_ms": 323,
    "index_total": 1,
    "index_failures": 0,
    "search_time_in_ms": 33386,
    "search_total": 2,
    "search_failures": 1
  }
}

I started hunting through the code but can't see the issue so far.

Hey there. This looks like a rollup issue, but it actually turns out to be a bug in the composite aggregation (which rollup uses). If the composite agg is iterating over a field in one index, then moves on to the next index in the search and finds that the field is unmapped in that new index, it mistakenly treats that field as numeric. This throws an exception when the original field is a keyword, as in your case, because the after key can't be parsed as a number.
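To make the failure mode concrete, here is a toy Python sketch. It is not the Elasticsearch code; the function name and the index names are made up for illustration. It only assumes what is described above: an index where the field is unmapped ends up parsing the keyword after key as a number.

```python
def parse_after_key(value, mapped_type):
    """Toy stand-in for interpreting a composite after key on one index.

    Assumption: mapped_type is None when the field is unmapped in that
    index, and the buggy behavior falls back to numeric parsing.
    """
    if mapped_type == "keyword":
        return value  # keyword after keys are used as-is
    # Unmapped (or numeric) field: the value is parsed as a number.
    return float(value)


# Hypothetical indices: the field is a keyword in one, unmapped in the other.
field_type_per_index = {"index-with-field": "keyword", "index-without-field": None}
after_key = "idc03-212"

for index, mapped_type in field_type_per_index.items():
    try:
        parse_after_key(after_key, mapped_type)
        print(index, "ok")
    except ValueError as err:
        # Python's counterpart of the NumberFormatException in the log.
        print(index, "failed:", err)
```

The first page works because no after key exists yet; the failure only shows up once a keyword after key is passed back in and reaches an index without the field.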

There's a bugfix to correct this behavior. It just merged and should be available in 7.1: https://github.com/elastic/elasticsearch/pull/41280

Since this is caused by unmapped fields, a temporary fix would be to go add the correct mapping to the indices missing the field(s), or just wait for 7.1 to land.
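If it helps to find which indices need the mapping, here is a rough Python sketch. It assumes you have already fetched the typeless mapping output (for example from `GET metricbeat-*/_mapping`) and loaded it into a dict. The helper names and the sample index names are hypothetical, and the sample mappings are trimmed down for illustration.

```python
def has_field(mapping, dotted_name):
    """Return True if dotted_name (e.g. 'beat.hostname.keyword') is mapped.

    Walks both object sub-fields ('properties') and multi-fields ('fields').
    """
    node = mapping
    for part in dotted_name.split("."):
        children = {}
        children.update(node.get("properties", {}))
        children.update(node.get("fields", {}))
        if part not in children:
            return False
        node = children[part]
    return True


def indices_missing(mappings_response, dotted_name):
    """List the indices whose mapping lacks dotted_name."""
    return sorted(
        index
        for index, body in mappings_response.items()
        if not has_field(body.get("mappings", {}), dotted_name)
    )


# Hypothetical, heavily trimmed mapping response with two indices.
sample = {
    "metricbeat-old": {"mappings": {"properties": {"beat": {"properties": {
        "hostname": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
    }}}}},
    "metricbeat-new": {"mappings": {"properties": {"agent": {"properties": {
        "hostname": {"type": "keyword"}
    }}}}},
}

for field in ("beat.hostname.keyword", "agent.hostname"):
    print(field, "missing in:", indices_missing(sample, field))
```

Each index it reports would then need the field added with the same type it has in the other indices, for every field listed under `terms` in the rollup job.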

Sorry for the hassle!


Hey,

Thank you very much.
