I ran the migration again for this specific data stream (Logstash pipeline below: it fetches the data from Elasticsearch 7.16 and loads it into 8.14.3):

Same result:

`event.dataset` now contains a `.` instead of a `-`. Below is a sample document. As before, all of the fields below are also visible in the screenshot above (if you would prefer a different format for readability, just let me know):

    "data_stream": {
      "namespace": "jboss",
      "type": "logs",
      "dataset": "info"
    },
    "event.dataset": "jboss.fat"
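For reference, data streams follow the `<type>-<dataset>-<namespace>` naming scheme. The ECS `data_stream` field reference states that `dataset` and `namespace` cannot contain `-`, which is consistent with the dataset values now using `.` instead. The minimal Python sketch below maps the `data_stream` fields above to a data stream name. The helper names are hypothetical, and only the `-` and lowercase rules are checked, not the full list of constraints.

```python
# Sketch: build a data stream name from the data_stream.* fields of a document.
# Assumption: only the "no '-'" and "lowercase" rules from the ECS data_stream
# docs are checked here; the real naming rules contain more restrictions.

def check_part(name: str, value: str) -> None:
    """Raise ValueError if a dataset/namespace value breaks the checked rules."""
    if "-" in value:
        raise ValueError(f"data_stream.{name} cannot contain '-': {value!r}")
    if value != value.lower():
        raise ValueError(f"data_stream.{name} must be lowercase: {value!r}")


def data_stream_name(doc: dict) -> str:
    """Return '<type>-<dataset>-<namespace>' for a document's data_stream fields."""
    ds = doc["data_stream"]
    check_part("dataset", ds["dataset"])
    check_part("namespace", ds["namespace"])
    return f'{ds["type"]}-{ds["dataset"]}-{ds["namespace"]}'


sample = {
    "data_stream": {"namespace": "jboss", "type": "logs", "dataset": "info"},
    "event.dataset": "jboss.fat",
}

print(data_stream_name(sample))  # logs-info-jboss
```

The printed name matches the `.ds-logs-info-jboss-...` backing index of the sample document. A dataset value of `jboss-fat` would be rejected by `check_part`, while `jboss.fat` passes.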

I don't think we are facing timezone issues:

    "_index": ".ds-logs-info-jboss-2024.10.16-000001",
    "_id": "GbtLlJIBP4HuDLbvLht9",
    "_version": 1,
    "_score": 0,
    "_source": {
      "@timestamp": "2024-09-28T22:00:05.852Z",
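Two facts help check the timezone reasoning. First, `@timestamp` is stored in UTC. Second, the date in a backing index name (`2024.10.16` here) is the date the backing index was created, not the date of the events it holds. The small Python sketch below converts the sample `@timestamp` back to Brussels local time and splits the index name. It assumes Europe/Brussels is on CEST (UTC+2) in late September and uses a fixed offset so it does not depend on the system timezone database.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: Europe/Brussels is on CEST (UTC+2) for these dates.
CEST = timezone(timedelta(hours=2), "CEST")

# Backing index names look like .ds-<data-stream>-<yyyy.MM.dd>-<generation>
BACKING_INDEX = re.compile(
    r"^\.ds-(?P<stream>.+)-(?P<created>\d{4}\.\d{2}\.\d{2})-(?P<gen>\d{6})$"
)


def to_local(ts: str) -> datetime:
    """Convert an ISO-8601 UTC timestamp ending in 'Z' to CEST."""
    utc = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return utc.astimezone(CEST)


def split_backing_index(index: str) -> dict:
    """Split a backing index name into data stream, creation date and generation."""
    m = BACKING_INDEX.match(index)
    if m is None:
        raise ValueError(f"not a backing index name: {index!r}")
    return m.groupdict()


local = to_local("2024-09-28T22:00:05.852Z")
print(local.isoformat(timespec="milliseconds"))  # 2024-09-29T00:00:05.852+02:00
print(split_backing_index(".ds-logs-info-jboss-2024.10.16-000001"))
# {'stream': 'logs-info-jboss', 'created': '2024.10.16', 'gen': '000001'}
```

So in local time the sample document belongs to 29/9. The `2024.10.16` in the index name only reflects when the backing index was created, presumably the day the migration ran.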

To rule out timezone issues, I also started loading larger chunks of consecutive days (28/9 to 1/10, while we are looking at 29/9). That way, even if timestamps were shifted by a few hours, there would still be enough data available (as you can see below):

Migration pipeline

Logstash file 1:

    # Only this pipeline running: size 500, scroll 5m
    # Other pipelines running as well: size 200, scroll 5m
    input {
      elasticsearch {
        hosts => "localhost:9200"
        index => "jboss-fat-2024.09*"
        query => '{ }'
        size => 200
        scroll => "5m"
        docinfo => true
      }
    }

    filter {
      # Parse data via the new logic (remove the derived fields)
      mutate {
        remove_field => [ "loglevel", "thread", "logtime", "class", "logmessage", "context" ]
      }

      # The ID is generated below; old tags are removed first
      mutate {
        remove_tag => [ "idParsed", "idParsingFailed", "dateparsed" ]
      }

      # key is required because of this bug: https://github.com/logstash-plugins/logstash-filter-fingerprint/issues/46
      fingerprint {
        source => "message"
        target => "[@metadata][fingerprint]"
        method => "MD5"
        key => "XXX"
      }

      ruby {
        code => "event.set('[@metadata][tsEpochMilliPrefix]', (1000*event.get('@timestamp').to_f).round(0))"
      }

      if [@metadata][tsEpochMilliPrefix] and [@metadata][fingerprint] {
        mutate {
          # The document ID is set in the elasticsearch output plugin
          # add_field => { document_id => "%{[@metadata][tsEpochMilliPrefix]}%{[@metadata][fingerprint]}" }
          add_tag => [ "idParsed" ]
        }
      } else {
        mutate {
          add_tag => [ "idParsingFailed" ]
        }
      }
    }

    output {
      if [fields][type] == "jboss" {
        pipeline { send_to => "jboss-input" }
      } else if [fields][type] == "cassandra" {
        pipeline { send_to => "cassandra-input" }
      } else if [fields][type] == "kpi" {
        pipeline { send_to => "kpi" }
      } else if [fields][type] == "monitoring" {
        pipeline { send_to => "monitoring" }
      }
    }
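File 1 prepares a document ID: the `@timestamp` in epoch milliseconds, followed by the fingerprint of `message`. When `key` is set, the fingerprint filter computes a keyed HMAC with the chosen digest rather than a plain hash, so the value should match an HMAC-MD5 of the message. The Python sketch below reproduces the ID outside Logstash, for example to look a document up by `_id`. It assumes the filter's default hex output (no `base64encode`) and uses the placeholder key `XXX` from the config. The message text is hypothetical.

```python
import hashlib
import hmac
from datetime import datetime


def epoch_millis(ts: str) -> int:
    """Mirror of the ruby filter: (1000 * @timestamp.to_f).round(0).

    Python's round() breaks exact .5 ties to even while Ruby rounds them away
    from zero; this only matters for sub-millisecond timestamps.
    """
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return round(dt.timestamp() * 1000)


def fingerprint(message: str, key: str) -> str:
    """HMAC-MD5 hex digest, assumed to match the fingerprint filter with `key` set."""
    return hmac.new(key.encode("utf-8"), message.encode("utf-8"), hashlib.md5).hexdigest()


def document_id(ts: str, message: str, key: str = "XXX") -> str:
    # Same layout as "%{[@metadata][tsEpochMilliPrefix]}%{[@metadata][fingerprint]}"
    return f"{epoch_millis(ts)}{fingerprint(message, key)}"


# Hypothetical message text, for illustration only.
print(document_id("2024-09-28T22:00:05.852Z", "example log line"))
```

An identical event (same `@timestamp` and `message`) therefore always gets the same `_id`.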

Logstash file 2:

    input { pipeline { address => "jboss-input" } }

    filter {
      grok {
        patterns_dir => ["/etc/logstash/patterns"]
        match => [ "message", "^%{TIMESTAMP_ISO8601:[log][time]}%{SPACE}%{SLOGLEVEL:[log][level]}%{SPACE}\[%{ENDCONTEXT:[log][context]}\]%{SPACE}\(%{NOTBRACKET:[log][thread]}\)%{SPACE}%{GREEDYDATA:[log][content]}$" ]
      }
      mutate {
        convert => [ "pid", "integer" ]
        remove_field => [ "offset", "[prospector][type]" ]
      }
      date {
        match => [ "[log][time]", "yyyy-MM-dd HH:mm:ss,SSS" ]
        timezone => "Europe/Brussels"
        add_tag => [ "dateparsed" ]
      }
      # Required to have the source in the Observability Logs view
      mutate {
        add_field => {
          "event.dataset" => "%{[fields][type]}.%{[fields][env]}"
          "service.name" => "jboss"
          "host.hostname" => "%{[host][name]}"
          "container.id" => "jboss-%{[host][name]}"
          "log.file.path" => "%{[source]}"
        }
        # rename => { "[host][name]" => "[host][hostname]" }
      }
    }
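To make the file 2 steps concrete, here is a Python approximation of what happens to one JBoss line: the grok match, the `date` filter (local Brussels time stored as UTC) and the `event.dataset` value. Several parts are assumptions:

- The regex equivalents of the custom patterns `SLOGLEVEL`, `ENDCONTEXT` and `NOTBRACKET` are guesses, since `/etc/logstash/patterns` is not shown.
- The timestamp part is narrowed to the format used by the `date` filter.
- The sample line is hypothetical.
- Europe/Brussels is again taken as CEST (UTC+2).

This is a sketch, not a replacement for running the pipeline.

```python
import re
from datetime import datetime, timedelta, timezone

# Rough regex version of the grok pattern (%{SPACE} is \s* in grok).
# The character classes for level, context and thread are assumptions.
JBOSS_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s*"
    r"(?P<level>[A-Z]+)\s*"
    r"\[(?P<context>[^\]]+)\]\s*"
    r"\((?P<thread>[^)]+)\)\s*"
    r"(?P<content>.*)$"
)

# Assumption: Europe/Brussels is on CEST (UTC+2) for the dates in question.
CEST = timezone(timedelta(hours=2))


def parse_line(line: str, fields: dict) -> dict:
    """Approximate the grok, date and mutate steps for one log line."""
    m = JBOSS_LINE.match(line)
    if m is None:
        return {"tags": ["_grokparsefailure"]}
    log = m.groupdict()
    # date filter: "yyyy-MM-dd HH:mm:ss,SSS" read as local time, stored as UTC
    local = datetime.strptime(log["time"], "%Y-%m-%d %H:%M:%S,%f").replace(tzinfo=CEST)
    ts = local.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "@timestamp": ts.replace("+00:00", "Z"),
        "log": log,
        # "%{[fields][type]}.%{[fields][env]}"
        "event.dataset": f'{fields["type"]}.{fields["env"]}',
        "tags": ["dateparsed"],
    }


# Hypothetical JBoss line, for illustration only.
line = "2024-09-29 00:00:05,852 INFO  [org.example.Service] (default task-1) Request handled"
event = parse_line(line, {"type": "jboss", "env": "fat"})
print(event["@timestamp"], event["event.dataset"])  # 2024-09-28T22:00:05.852Z jboss.fat
```

A line logged at 00:00:05,852 local time on 29/9 ends up with the `@timestamp` of the sample document, 2024-09-28T22:00:05.852Z.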

I assume you would now advise upgrading to the latest version? I'm still in the development phase, so that would be perfectly feasible.