Illegal Argument Exception

Opening my Kibana Dashboard on my standalone ELK implementation I get the referenced error; detailed output below:
{ "took": 906, "timed_out": false, "num_reduce_phases": 8, "_shards": { "total": 74, "successful": 73, "skipped": 44, "failed": 1, "failures": [ { "shard": 0, "index": "logstash-2020.08.17-000074", "node": "O7WSf1caR3Ghkd3Zdduotg", "reason": { "type": "illegal_argument_exception", "reason": "The length of [logmessage] field of [kSsQ_3MBD8KWFJ0iW3X0] doc of [logstash-2020.08.17-000074] index has exceeded [1000000] - maximum allowed to be analyzed for highlighting. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors is recommended!" } } ] }, "hits": { "total": 808479, "max_score": null, "hits": [] } }
I've seen the same error for the past two weeks; it appears the error is always with the latest shard (by date).

The log is telling you that the logmessage field is larger than 1,000,000 characters, which is the maximum the highlighter will analyze by default. This is explained in more detail in the docs.
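If you want to confirm the limit currently in effect on that index, a request along these lines should show it (include_defaults so the value appears even if it was never set explicitly); look for index.highlight.max_analyzed_offset in the response:

GET logstash-2020.08.17-000074/_settings?include_defaults=true&flat_settings=true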

Looks like you have some pretty large log messages being ingested? Can you see how big that field is?

How would I go about determining what message is causing it? I can't display the log messages due to the error.

Try something like GET logstash-2020.08.17-000074/doc/kSsQ_3MBD8KWFJ0iW3X0

#! Deprecation: [types removal] Specifying types in document get requests is deprecated, use the /{index}/_doc/{id} endpoint instead.
{
  "_index" : "logstash-2020.08.17-000074",
  "_type" : "doc",
  "_id" : "kSsQ_3MBD8KWFJ0iW3X0",
  "found" : false
}

Ok, did you try that?

Yes, that was the output I received.
Not sure I did it from the right place - Dev Tools Console?

Yes, but did you try the suggestion that the response gave you here?

I didn't because I didn't understand what it was suggesting, but after adding the leading '_' I get this:

{
  "_index" : "logstash-2020.08.17-000074",
  "_type" : "_doc",
  "_id" : "kSsQ_3MBD8KWFJ0iW3X0",
  "_version" : 1,
  "_seq_no" : 3654,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "applicationlog" : "ics-orders-to-salesforce-service",
    "eventtime" : "2020-08-17 20:55:17,512",
    "host" : {
      "hostname" : "emtmulc1p006",
      "containerized" : false,
      "id" : "76ba8ef5ccad46c387142217853051a1",
      "name" : "emtmulc1p006",
      "architecture" : "x86_64",
      "os" : {
        "name" : "SLES",
        "kernel" : "4.4.180-94.116-default",
        "version" : "12-SP3",
        "family" : "suse",
        "platform" : "sles"
      }
    },
    "agent" : {
      "type" : "filebeat",
      "id" : "9fba55f5-1a50-4c5a-b910-7659f1369467",
      "ephemeral_id" : "9b46f932-9b66-41a1-a5a7-383badb27ade",
      "version" : "7.6.2",
      "hostname" : "emtmulc1p006"
    },
    "input" : {
      "type" : "log"
    },
    "@timestamp" : "2020-08-18T00:55:17.512Z",
    "message" : """2020-08-17 20:55:17,512 INFO <<TRUNCATED, 4442 LINES of application specific data REDACTED>>""",
    "loglevel" : "INFO",
    "tags" : [
      "beats_input_codec_plain_applied"
    ]
  }
}

If you're the one who added the truncated/redacted info, and it really is 4422 lines, that's a pretty massive log file.

What sort of data is this?

That's ok, but you should let us know so we understand what you need help with :slight_smile:

Customer Order information. We usually see a couple of these (this size) a day.

Ok, so if you copy that field into a text editor, how many characters is it?
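If copying it out of Kibana is awkward, a script field can also report the length directly. A sketch, using the index and document ID from the error above, and assuming the big field is "message":

GET logstash-2020.08.17-000074/_search
{
  "query": { "ids": { "values": [ "kSsQ_3MBD8KWFJ0iW3X0" ] } },
  "_source": false,
  "script_fields": {
    "message_length": {
      "script": {
        "lang": "painless",
        "source": "params['_source']['message'].length()"
      }
    }
  }
}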

3,189,928

You could increase index.highlight.max_analyzed_offset to 4 million, as per the link in my first response. That'll remove the error, but it's likely to be pretty expensive, which will have larger impacts. As that doc suggests:

Plain highlighting for large texts may require substantial amount of time and memory
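If you do go that route, note that a template change only affects indices created afterwards (i.e. from the next rollover). The setting is dynamic, so for the index that is failing right now you could also apply it directly, something along these lines:

PUT logstash-2020.08.17-000074/_settings
{
  "index.highlight.max_analyzed_offset": 4000000
}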

I'm not 100% sure what approach to take here. Is the error on the dashboard causing any problems, or can you just accept it?

Most of the time it causes Kibana to not display messages on the dashboard. For my App team, it's a deal breaker. I've considered 'the postings list' approach, but I'm not sure what all I'd need to change to accommodate it.
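For reference, the postings list / term vectors route that the error message mentions is a mapping change rather than a settings change, so it would also go in the template. Roughly, the message_field entry under dynamic_templates would gain a term_vector (or index_options: offsets) line, something like this sketch; it only helps documents indexed after the change, and it makes the index noticeably larger:

"message_field" : {
  "path_match" : "message",
  "match_mapping_type" : "string",
  "mapping" : {
    "type" : "text",
    "norms" : false,
    "term_vector" : "with_positions_offsets"
  }
}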

Ok, so if I gather correctly, what I want to do is modify the template that is processing this data. Initially, I will open the flood gates, so to speak, and allow 5M vs. 1M. To do this I'm going to use the Dev Tools console and:
PUT _template/logstash
{
  "settings": {
    "index": {
      "highlight": {
        "max_analyzed_offset": "5000000"
      }
    }
  }
}
If this sounds and looks right to you, I'm going to give it a go.

That syntax looks ok.

It wasn't. Any suggestions?

NVM> I figured it out. I t wants the expanded json set, not just whayt I wanted changed:

PUT _template/logstash
{
  "order" : 0,
  "version" : 60001,
  "index_patterns" : [
    "logstash-*"
  ],
  "settings" : {
    "index" : {
      "highlight" : {
        "max_analyzed_offset" : "5000000"
      },
      "lifecycle" : {
        "name" : "logstash-policy",
        "rollover_alias" : "logstash"
      },
      "number_of_shards" : "1",
      "refresh_interval" : "5s"
    }
  },
  "mappings" : {
    "_meta" : { },
    "_source" : { },
    "dynamic_templates" : [
      {
        "message_field" : {
          "path_match" : "message",
          "mapping" : {
            "norms" : false,
            "type" : "text"
          },
          "match_mapping_type" : "string"
        }
      },
      {
        "string_fields" : {
          "mapping" : {
            "norms" : false,
            "type" : "text",
            "fields" : {
              "keyword" : {
                "ignore_above" : 256,
                "type" : "keyword"
              }
            }
          },
          "match_mapping_type" : "string",
          "match" : "*"
        }
      }
    ],
    "properties" : {
      "@timestamp" : {
        "type" : "date"
      },
      "geoip" : {
        "dynamic" : true,
        "type" : "object",
        "properties" : {
          "ip" : {
            "type" : "ip"
          },
          "latitude" : {
            "type" : "half_float"
          },
          "location" : {
            "type" : "geo_point"
          },
          "longitude" : {
            "type" : "half_float"
          }
        }
      },
      "@version" : {
        "type" : "keyword"
      }
    }
  },
  "aliases" : { }
}
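For what it's worth, the easiest way to get that expanded body is to pull the current template and edit it rather than rebuilding it by hand:

GET _template/logstash

One other thing to keep in mind: templates are only applied when a new index is created, so the higher limit will take effect at the next rollover rather than on the existing indices.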
