Hi there,
We have a library that sends access logs to Logstash in a format "similar" to Apache's. We have written a grok pattern to parse it, but I have noticed that the library uses the default format, localised according to the OS default locale... and it is not configurable (yep, planning to open an issue against the library as well).
The problem is the month name: the pattern fails to match month names from other locales. Is there some way to get Logstash to parse it by specifying the locale it has to use to parse the date?
For example, this is the part of the pipeline that fails:
grok {
  match => { 'message' => '%{IPORHOST:client_ip} - %{DATA:userid} \[%{HTTPDATE:request.timestamp}\] "%{WORD:request.method} %{URIPATHPARAM:request.resource} HTTP/%{NUMBER:request.version}" %{NUMBER:response.status_code} %{NUMBER:response.size} %{NUMBER:service_time} "%{DATA:referer}" "%{DATA:user_agent}"' }
  remove_field => [ "message" ]
}
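One possible approach, sketched here and not tested against this library's output: loosen the month part of the timestamp in grok (the stock HTTPDATE pattern is built on MONTH, which only lists English plus a few German month names), then let the date filter's `locale` option interpret the Spanish name. `LOCALIZED_HTTPDATE` is a hypothetical pattern name. Depending on the JVM's locale data, Spanish abbreviations may be expected with a trailing period (for example "abr."), so it is worth checking the events for a `_dateparsefailure` tag before relying on this.

```
filter {
  grok {
    # LOCALIZED_HTTPDATE is a hypothetical name: same shape as the stock
    # HTTPDATE pattern, but the month is "anything up to the next slash"
    # instead of the English-only %{MONTH}.
    pattern_definitions => {
      "LOCALIZED_HTTPDATE" => "%{MONTHDAY}/[^/]+/%{YEAR}:%{TIME} %{INT}"
    }
    match => { 'message' => '%{IPORHOST:client_ip} - %{DATA:userid} \[%{LOCALIZED_HTTPDATE:request.timestamp}\] "%{WORD:request.method} %{URIPATHPARAM:request.resource} HTTP/%{NUMBER:request.version}" %{NUMBER:response.status_code} %{NUMBER:response.size} %{NUMBER:service_time} "%{DATA:referer}" "%{DATA:user_agent}"' }
    remove_field => [ "message" ]
  }
  date {
    # Parse the captured timestamp text using Spanish month names
    # (the result goes to @timestamp, the date filter's default target).
    match  => [ "request.timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    locale => "es"
  }
}
```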
And here is a sample line that the library produces (note "abr", from "abril" in Spanish):
x.x.x.x - - [19/abr/2023:10:43:19 +0000] "GET /whatever HTTP/1.1" 200 15 1 "-" "Agent"
whereas this one is parsed correctly (here "apr", from April):
x.x.x.x - - [19/apr/2023:10:43:19 +0000] "GET /whatever HTTP/1.1" 200 15 1 "-" "Agent"
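Alternatively, a sketch that sidesteps locales entirely (again an untested assumption, not a known fix): rewrite the Spanish abbreviations that differ from English into English ones before grok runs, so the original HTTPDATE-based pattern can match unchanged. It assumes the library emits three-letter abbreviations as in the sample above; "feb", "mar", "may", "jun", "jul", "sep", "oct" and "nov" should already be accepted by the stock MONTH pattern, so only the four below are mapped.

```
filter {
  # Hypothetical workaround, placed before the grok filter: translate the Spanish
  # month abbreviations that the stock MONTH pattern does not recognise into English.
  # The lookbehind/lookahead restrict each match to the "/<month>/<year>:" part of
  # the timestamp, so a path segment like "/abr/" in the request is normally left alone.
  mutate {
    gsub => [
      "message", "(?<=/)ene(?=/[0-9]{4}:)", "Jan",
      "message", "(?<=/)abr(?=/[0-9]{4}:)", "Apr",
      "message", "(?<=/)ago(?=/[0-9]{4}:)", "Aug",
      "message", "(?<=/)dic(?=/[0-9]{4}:)", "Dec"
    ]
  }
}
```

With this in place, the sample "abr" line above would reach grok as "19/Apr/2023:10:43:19 +0000".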
Thank you!
PS: If someone wants to pay a visit to whoever decided that a localisable format was a good idea for the default of a standard format (NCSA Common Log Format), I have a pitchfork I'd like to bring along.