Hi @stephenb
Sorry for the delay, I didn't find time to do the requested test yesterday.
Test
I did the migration again for the specific data stream (Logstash pipeline below: it fetches data from Elasticsearch 7.16 and loads it into 8.14.3):
Same result: event.dataset now contains a "." instead of a "-". Below is a sample document:
PS: a colleague of yours used a "-" in this sample: "Unknown" logs in observability overview - #2 by felixbarny. That's where I got it from.
All the fields below are the same as in the screenshot above (if you prefer them in a different format for readability, please say so):
"data_stream": {
"namespace": "jboss",
"type": "logs",
"dataset": "info"
},
...
"event.dataset": "jboss.fat"
I don't think we are facing timezone issues:
{
  "_index": ".ds-logs-info-jboss-2024.10.16-000001",
  "_id": "GbtLlJIBP4HuDLbvLht9",
  "_version": 1,
  "_score": 0,
  "_source": {
    ...
    "@timestamp": "2024-09-28T22:00:05.852Z",
To rule out timezone issues I also started loading bigger chunks of consecutive days, so that even if the hours were shifted there would still be enough data available (as you can see below): I loaded 28/9 to 1/10 and we are looking at 29/9. (The 2024-09-28T22:00:05Z timestamp above corresponds to 2024-09-29 00:00:05 in Europe/Brussels.)
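For the record, the window can also be restricted directly in the elasticsearch input with a range query; roughly like this (a sketch, not the exact config I ran, with the index pattern widened to cover both months):
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "jboss-fat-2024.*"
    query => '{ "query": { "range": { "@timestamp": { "gte": "2024-09-28", "lt": "2024-10-02" } } } }'
    size => 200
    scroll => "5m"
    docinfo => true
  }
}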
Migration pipeline
Logstash file 1:
#Only pipeline size 500 & scroll 5m
#Other running pipeline size 200 & scroll 5m
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "jboss-fat-2024.09*"
    query => '{ }'
    size => 200
    scroll => "5m"
    docinfo => true
  }
}
filter {
  #Parse data via new logic (remove previously derived fields)
  mutate {
    remove_field => [ "loglevel", "thread", "logtime", "class", "logmessage", "context" ]
  }
  #ID is generated below, old tags are removed first
  mutate {
    remove_tag => [ "idParsed", "idParsingFailed", "dateparsed" ]
  }
  #key is required for bug: https://github.com/logstash-plugins/logstash-filter-fingerprint/issues/46
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "MD5"
    key => "XXX"
  }
  #Epoch milliseconds of @timestamp, used as prefix for the document ID
  ruby {
    code => "event.set('[@metadata][tsEpochMilliPrefix]', (1000*event.get('@timestamp').to_f).round(0))"
  }
  if [@metadata][tsEpochMilliPrefix] and [@metadata][fingerprint] {
    mutate {
      #Document ID is set in the elasticsearch output plugin
      # add_field => { document_id => "%{[@metadata][tsEpochMilliPrefix]}%{[@metadata][fingerprint]}"}
      add_tag => [ "idParsed" ]
    }
  } else {
    mutate {
      add_tag => [ "idParsingFailed" ]
    }
  }
}
output {
  if [fields][type] == "jboss" {
    pipeline { send_to => "jboss-input" }
  } else if [fields][type] == "cassandra" {
    pipeline { send_to => "cassandra-input" }
  } else if [fields][type] == "kpi" {
    pipeline { send_to => "kpi" }
  } else if [fields][type] == "monitoring" {
    pipeline { send_to => "monitoring" }
  }
}
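For completeness: this first file and the jboss file below run as separate pipelines connected via pipeline-to-pipeline communication, so pipelines.yml contains entries roughly like these (paths are placeholders, plus similar entries for the cassandra, kpi and monitoring pipelines):
- pipeline.id: migration
  path.config: "/etc/logstash/conf.d/migration.conf"
- pipeline.id: jboss
  path.config: "/etc/logstash/conf.d/jboss.conf"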
Logstash file 2:
input { pipeline { address => "jboss-input" } }
filter {
  grok {
    patterns_dir => ["/etc/logstash/patterns"]
    match => [ "message", "^%{TIMESTAMP_ISO8601:[log][time]}%{SPACE}%{SLOGLEVEL:[log][level]}%{SPACE}\[%{ENDCONTEXT:[log][context]}\]%{SPACE}\(%{NOTBRACKET:[log][thread]}\)%{SPACE}%{GREEDYDATA:[log][content]}$"]
  }
  mutate {
    convert => [ "pid", "integer"]
    remove_field => ["offset", "[prospector][type]"]
  }
  date {
    match => [ "[log][time]" , "yyyy-MM-dd HH:mm:ss,SSS" ]
    timezone => "Europe/Brussels"
    add_tag => [ "dateparsed" ]
  }
  #https://www.elastic.co/guide/en/observability/current/logs-app-fields.html
  #https://discuss.elastic.co/t/log-source-unknown-in-observability-overview/262568
  #Required to have the source in Observability - Logs view
  mutate {
    add_field => { "event.dataset" => "%{[fields][type]}.%{[fields][env]}" }
    add_field => { "service.name" => "jboss" }
    add_field => { "host.hostname" => "%{[host][name]}" }
    add_field => { "container.id" => "jboss-%{[host][name]}" }
    add_field => { "log.file.path" => "%{[source]}" }
    #rename => { "[host][name]" => "[host][hostname]" }
  }
}
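For comparison, the sample linked above used a dash instead of a dot as the separator; that variant of the last mutate would be:
mutate {
  add_field => { "event.dataset" => "%{[fields][type]}-%{[fields][env]}" }
}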
I hope I have answered all your requests.
Best regards
Christof
PS:
I assume you now advise upgrading to the latest version. I'm still in the development phase, so that would be perfectly feasible.