I need to define an additional pattern_definition for my pipeline. Our virtual host names contain an invalid character, "_", and they appear in the Apache log files.
The standard definition of the grok pattern HOSTNAME is:
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
With a small modification I get a pattern that matches my hostnames. I tested it with https://grokdebug.herokuapp.com/ and it seems OK.
HOSTNAME_BAD \b(?:[0-9A-Za-z][0-9A-Za-z_-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
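For reference, the modified pattern can also be checked offline. This is only a minimal sketch: it assumes Python's re module treats this particular expression the same way as the regex engine behind grok, and the host names are made-up examples, not values from my logs. It also shows that the pattern accepts "_" only in the first label of the host name.

```python
import re

# HOSTNAME_BAD as written above, with the dots escaped.
# Assumption: Python's `re` behaves like grok's engine for this expression.
HOSTNAME_BAD = re.compile(
    r"\b(?:[0-9A-Za-z][0-9A-Za-z_-]{0,62})"
    r"(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)"
)


def is_hostname_bad(value: str) -> bool:
    """Return True when the whole value matches HOSTNAME_BAD."""
    return HOSTNAME_BAD.fullmatch(value) is not None


# Hypothetical host names used only for illustration.
first_label_ok = is_hostname_bad("my_site.example.com")
later_label_ok = is_hostname_bad("www.my_site.example.com")
```

If underscores can appear in later labels too, the second character class would presumably need the same "_" added.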
When I try to save the new definition in the pipeline:
PUT _ingest/pipeline/apache-combined_01
{
"description": "grok_apache_combined_01",
"processors": [
{
"grok": {
"field": "message",
"patterns": ["%{COMBINEDAPACHELOG} %{HOSTNAME:virtual_host} %{NUMBER:response_time}"],
"pattern_definitions" : {
"HOSTNAME_BAD" : "\b(?:[0-9A-Za-z][0-9A-Za-z_-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)"
}
}
},
{
"date": {
"field": "timestamp",
"formats": [ "dd/MMM/YYYY:HH:mm:ss Z" ]
}
},
{
"script": {
"lang": "painless",
"inline": "ctx.response_time_segs = Float.parseFloat(ctx.response_time) / params.microstosecs",
"params": {
"microstosecs": 1000000
}
}
}
]
}
I get an error:
{
"error": {
"root_cause": [
{
"type": "parse_exception",
"reason": "Failed to parse content to map"
}
],
"type": "parse_exception",
"reason": "Failed to parse content to map",
"caused_by": {
"type": "i_o_exception",
"reason": "Unrecognized character escape '.' (code 46)\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@41e3e446; line: 9, column: 72]"
}
},
"status": 400
}
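The message points at the "\." inside the JSON string. A minimal sketch of what seems to be happening, using Python's json module as a stand-in for the parser Elasticsearch uses (an assumption): "\." is not a valid JSON escape, so the body is rejected, while serializing the regex with a JSON library doubles every backslash.

```python
import json

# The regex as the grok processor should receive it (single backslashes).
HOSTNAME_BAD = (
    r"\b(?:[0-9A-Za-z][0-9A-Za-z_-]{0,62})"
    r"(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)"
)

# Pasting the regex verbatim into a JSON string: "\." is not a JSON escape.
raw_body = '{"HOSTNAME_BAD": "' + HOSTNAME_BAD + '"}'
try:
    json.loads(raw_body)
    raw_is_valid = True
except json.JSONDecodeError:
    raw_is_valid = False

# Letting a JSON serializer do the escaping doubles each backslash.
escaped_body = json.dumps({"HOSTNAME_BAD": HOSTNAME_BAD})

# Parsing the escaped body gives back the original regex unchanged.
round_trip = json.loads(escaped_body)["HOSTNAME_BAD"]
```

Under that assumption, every backslash in the definition (including the ones in "\b") would have to be written as "\\" in the request body.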
I can get a "valid" (?) definition if I do not escape the ".":
"patterns": ["%{COMBINEDAPACHELOG} %{HOSTNAME_BAD:virtual_host} %{NUMBER:response_time}"],
"pattern_definitions" : {
"HOSTNAME_BAD" : "\b(?:[0-9A-Za-z][0-9A-Za-z_-]{0,62})(?:.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(.?|\b)"
}
With this change I can save the pipeline definition, but it does not work for parsing the Apache log.
The error in Filebeat is: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value:
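One possible explanation, sketched with Python's json module as a stand-in for the Elasticsearch parser (an assumption): in JSON, "\b" is a valid escape, but it means the backspace character, not a regex word boundary. The definition is therefore accepted, yet the stored pattern would begin and end with a literal backspace that never occurs in a log line.

```python
import json

# The definition exactly as saved in the request body, with single backslashes.
fragment = (
    r'{"HOSTNAME_BAD": "\b(?:[0-9A-Za-z][0-9A-Za-z_-]{0,62})'
    r'(?:.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(.?|\b)"}'
)

# The JSON parser accepts it, but decodes "\b" as U+0008 (backspace).
stored = json.loads(fragment)["HOSTNAME_BAD"]
starts_with_backspace = stored.startswith("\x08")
has_word_boundary = "\\b" in stored
```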
I have also tested with a double escape, without success:
"HOSTNAME_BAD" : "\b(?:[0-9A-Za-z][0-9A-Za-z_-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(.\.?|\b)"
How can I solve this?
Is it possible to see the predefined patterns in Elasticsearch?