Metricbeat default index template causes "No matching token for number_type [BIG_INTEGER]"


#1

Hi there,
I've posted about this issue in the past and never received any suggestions. I finally got around to looking at it again and was able to figure out a little more this time, so I'm hoping someone will be able to help me understand what's causing the issue.

It seems to only occur with the system module and the process metricset. I'm using the default module config and the default template. I do have metricbeat shipping to logstash and then to elasticsearch, but logstash is doing no filtering. It is when logstash tries to send the doc to elasticsearch that I get the error:

    [2018-09-12T18:39:30,230][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch.
   { :status=>400, :action=>["index", 
     {:_id=>"<host>-system-process-18.25.53", :_index=>"<host>-metricbeat-2018.09.12", :_type=>"doc", :_routing=>nil},
     #<LogStash::Event:0x62061ee>], :response=>{"index"=>
         {"_index"=>"<host>-metricbeat-2018.09.12", "_type"=>"doc", "_id"=>"<host>-system-process-18.25.53", "status"=>400, "error"=>
              {"type"=>"mapper_parsing_exception", "reason"=>"failed to parse", "caused_by"=>
                       {"type"=>"illegal_state_exception", "reason"=>"No matching token for number_type [BIG_INTEGER]"}
              }
         }
      }
  }

I'll post a sample doc below.


#2

{
"process": {
"cwd": "/",
"memory": {
"size": 1452826624,
"share": 10043392,
"rss": {
"pct": 0.0435,
"bytes": 171515904
}
},
"cmdline": "python ",
"pgid": 28183,
"name": "python",
"cpu": {
"start_time": "2018-09-06T19:49:01.000Z",
"total": {
"pct": 0.018,
"value": 7302460,
"norm": {
"pct": 0.009
}
}
},
"pid": 28183,
"state": "sleeping",
"cgroup": {
"blkio": {
"path": "/docker/e657e814a24e3f4e0fd6a3b9d7aa472bb6ab34fe02eb1713395c12edf9c4ac76",
"total": {
"ios": 126,
"bytes": 4886528
},
"id": "e657e814a24e3f4e0fd6a3b9d7aa472bb6ab34fe02eb1713395c12edf9c4ac76"
},
"path": "/docker/e657e814a24e3f4e0fd6a3b9d7aa472bb6ab34fe02eb1713395c12edf9c4ac76",
"cpu": {
"path": "/docker/e657e814a24e3f4e0fd6a3b9d7aa472bb6ab34fe02eb1713395c12edf9c4ac76",
"cfs": {
"shares": 1024,
"period": {
"us": 100000
},
"quota": {
"us": 0
}
},
"rt": {
"runtime": {
"us": 0
},
"period": {
"us": 0
}
},
"id": "e657e814a24e3f4e0fd6a3b9d7aa472bb6ab34fe02eb1713395c12edf9c4ac76",
"stats": {
"periods": 0,
"throttled": {
"ns": 0,
"periods": 0
}
}
},
"id": "e657e814a24e3f4e0fd6a3b9d7aa472bb6ab34fe02eb1713395c12edf9c4ac76",
"memory": {
"path": "/docker/e657e814a24e3f4e0fd6a3b9d7aa472bb6ab34fe02eb1713395c12edf9c4ac76",
"mem": {
"limit": {
"bytes": 18446744073709552000
},
"failures": 0,
"usage": {
"max": {
"bytes": 1217224704
},
"bytes": 1069215744
}
},
"stats": {
"inactive_anon": {
"bytes": 40960
},
"cache": {
"bytes": 106225664
},
"rss_huge": {
"bytes": 448790528
},
"mapped_file": {
"bytes": 0
},
"swap": {
"bytes": 0
},
"unevictable": {
"bytes": 0
},
"pages_in": 20764330,
"active_anon": {
"bytes": 963096576
},
"hierarchical_memory_limit": {
"bytes": 18446744073709552000
},
"pages_out": 23720547,
"page_faults": 75011826,
"inactive_file": {
"bytes": 24784896
},
"hierarchical_memsw_limit": {
"bytes": 0
},
"rss": {
"bytes": 962990080
},
"major_page_faults": 79,
"active_file": {
"bytes": 81293312
}
},
"memsw": {
"limit": {
"bytes": 0
},
"failures": 0,
"usage": {
"max": {
"bytes": 0
},
"bytes": 0
}
},
"kmem_tcp": {
"limit": {
"bytes": 18446744073709552000
},
"failures": 0,
"usage": {
"max": {
"bytes": 0
},
"bytes": 0
}
},
"kmem": {
"limit": {
"bytes": 18446744073709552000
},
"failures": 0,
"usage": {
"max": {
"bytes": 0
},
"bytes": 0
}
},
"id": "e657e814a24e3f4e0fd6a3b9d7aa472bb6ab34fe02eb1713395c12edf9c4ac76"
},
"cpuacct": {
"percpu": {
"1": 15846032178493,
"2": 20804833010822
},
"path": "/docker/e657e814a24e3f4e0fd6a3b9d7aa472bb6ab34fe02eb1713395c12edf9c4ac76",
"total": {
"ns": 36650865189315
},
"id": "e657e814a24e3f4e0fd6a3b9d7aa472bb6ab34fe02eb1713395c12edf9c4ac76",
"stats": {
"user": {
"ns": 18788480000000
},
"system": {
"ns": 3041680000000
}
}
}
},
"fd": {
"limit": {
"hard": 1048576,
"soft": 524288
},
"open": 23
},
"username": "root",
"ppid": 28138
}
}


#3

I have been able to narrow it down further to five fields under system.process.cgroup.memory: kmem, kmem_tcp, mem, memsw, and stats. All of these track values in bytes, and some of the values (for example 18446744073709552000 in the sample doc) are too big for the long type, which tops out at 9223372036854775807. So I assume I need to change the template to use doubles for these fields, but it seems like this should not have been so difficult to discover, and IMO perhaps the default template should just use doubles for bytes.
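
For anyone trying to track down the offending fields in their own documents, here is a rough Python sketch of the check I ended up doing by hand. It walks a nested doc and reports every integer that does not fit in a signed 64-bit long. The function name `find_out_of_range` and the trimmed `sample` dict are my own illustration, not anything from Metricbeat or Elasticsearch.

```python
# Bounds of a signed 64-bit integer, which is what the "long" field type can hold.
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def find_out_of_range(doc, prefix=""):
    """Return dotted paths of integer values that do not fit in a signed 64-bit long."""
    hits = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            hits.extend(find_out_of_range(value, path))
        elif isinstance(value, int) and not isinstance(value, bool):
            if not LONG_MIN <= value <= LONG_MAX:
                hits.append(path)
    return hits


# Hypothetical, heavily trimmed version of the sample doc from post #2.
sample = {
    "memory": {
        "mem": {
            "limit": {"bytes": 18446744073709552000},
            "usage": {"bytes": 1069215744},
        },
        "memsw": {"limit": {"bytes": 0}},
        "kmem": {"limit": {"bytes": 18446744073709552000}},
    }
}

print(find_out_of_range(sample))
```

On the trimmed sample this flags only the two `limit.bytes` paths holding 18446744073709552000; the ordinary usage values are well within range.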


(Andrew Kroh) #4

Sounds like a similar problem to https://github.com/elastic/beats/issues/5854#issuecomment-359185591.


#5

It does seem very similar, although their error messages are a little more explicit.


(Andrew Kroh) #6

Yeah I noticed the errors were different. I wonder if something changed in Elasticsearch (like Jackson is now converting the value to a BigInteger) to cause a different error for the same problem.

But I think you can work around the issue with the same (or similar) processors to drop the invalid values on the Metricbeat side. Long term, I think this one value, 18446744073709552000, which means "no limit", needs to be handled by Metricbeat.
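
To illustrate the idea of dropping the invalid values (this is not Metricbeat processor syntax, just a minimal Python sketch of the same logic, with a hypothetical helper name `drop_unindexable`), something like the following removes any integer that is too large for a long before the document is indexed:

```python
# Largest value a signed 64-bit "long" field can store.
LONG_MAX = 2 ** 63 - 1


def drop_unindexable(doc):
    """Return a copy of doc without integer values larger than a signed 64-bit long.

    Nested objects that end up empty (or were already empty) are dropped as well.
    """
    cleaned = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            child = drop_unindexable(value)
            if child:
                cleaned[key] = child
        elif isinstance(value, int) and not isinstance(value, bool) and value > LONG_MAX:
            # Skip the oversized value, e.g. the "no limit" marker.
            continue
        else:
            cleaned[key] = value
    return cleaned


# Hypothetical fragment shaped like the cgroup memory section of the sample doc.
event = {
    "mem": {
        "limit": {"bytes": 18446744073709552000},
        "failures": 0,
        "usage": {"bytes": 1069215744},
    }
}

print(drop_unindexable(event))
```

The trade-off is that the "no limit" information is lost rather than translated, which is why handling that value in Metricbeat itself would be the better long-term fix.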

