How to re-ingest logs after upgrade to Elastic Stack 6.3.0

I've re-installed the ELK stack from scratch on a server that previously ran an older version. I thought I had it all cleaned up before installing the latest version of ELK. When I go to the Kibana console, I don't see the old logs being re-ingested; only the newest logs are being parsed. How can I force Logstash to reparse all the logs?

I am a newbie and need some assistance.

Thanks.

What are you using to ingest files, and how is it configured?

Sorry for the late response. I don't know; I only have Elasticsearch, Logstash, and Kibana installed. Do I need something else to re-read all the log files?

Are you using a file input to ingest the files? If so, show the configuration. If not, what are you using?

Here you go...

input {

  file {
      path => ["/var/log/adc/2018/*/*/adc.log",
               "/var/log/adc/2018/*/*/asdi.log",
               "/var/log/adc/2018/*/*/edct_cdm_flight_data.log",
               "/var/log/adc/2018/*/*/flightaware.log",
               "/var/log/adc/2018/*/*/flight_manager.log",
               "/var/log/adc/2018/*/*/fp.log",
               "/var/log/adc/2018/*/*/invalid_outgoing.log",
               "/var/log/adc/2018/*/*/iridium.log",
               "/var/log/adc/2018/*/*/met_error.log",
               "/var/log/adc/2018/*/*/microservice.log",
               "/var/log/adc/2018/*/*/mq_output.log",
               "/var/log/adc/2018/*/*/performance.log",
               "/var/log/adc/2018/*/*/position_data.log",
               "/var/log/adc/2018/*/*/rmqapps.log",
               "/var/log/adc/2018/*/*/sbbtraffic.log",
               "/var/log/adc/2018/*/*/schneider.log",
               "/var/log/adc/2018/*/*/skyguide_notams.log",
               "/var/log/adc/2018/*/*/sql.log",
               "/var/log/adc/2018/*/*/unparsed.log",
               "/var/log/adc/2018/*/*/wx.log"
              ]
      tags => [ "standard_adc_format" ]

      # default discover_interval is 15 sec
      discover_interval => 60

      # file where indexes into the current log file positions are stored
      sincedb_path => "/tmp/logstash-sincedb.db"

      # when a new log is first found, begin reading from the first line
      start_position => "beginning"
  }

  file {
      path => ["/var/log/adc/2018/*/*/api.log",
               "/var/log/adc/2018/*/*/dashboard.log"
              ]
      tags => [ "alt_adc_format" ]

      # default discover_interval is 15 sec
      discover_interval => 60

      # file where indexes into the current log file positions are stored
      sincedb_path => "/tmp/logstash-sincedb2.db"

      # when a new log is first found, begin reading from the first line
      start_position => "beginning"
  }

  file {
      path => ["/var/log/sys/2018/*/*/maillog"
              ]
      tags => [ "syslog_format" ]

      # default discover_interval is 15 sec
      discover_interval => 60

      # file where indexes into the current log file positions are stored
      sincedb_path => "/tmp/logstash-sincedb3.db"

      # when a new log is first found, begin reading from the first line
      start_position => "beginning"
  }
}

filter {

    if "standard_adc_format" in [tags] {
        if ".py" in [message] {
            # it's a log line from a python app with extra info
            grok {
                match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} %{NOTSPACE:appname}\[%{USERNAME:process_id}\]  %{NOTSPACE:serverdate} %{NOTSPACE:servertime} %{WORD:loglevel} %{NUMBER:thread_id} %{NOTSPACE:source_file} %{POSINT:source_line} %{GREEDYDATA:message}" ]

                overwrite => [ "message" ]
            }
        } else {
            # it's a standard syslog format not generated by our python logging libs
            grok {
                match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} %{NOTSPACE:appname}\[%{USERNAME:process_id}\] %{GREEDYDATA:message}" ]
            }
        }
        mutate  {
            gsub => [ "message", "<nl>", "
" ]
        }
    }

    if "alt_adc_format" in [tags] {
        grok {
            match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} #\|%{NOTSPACE:date2}  %{NOTSPACE:time2} %{WORD:loglevel} %{NUMBER:thread_id} %{NOTSPACE:source_file} %{POSINT:source_line} %{GREEDYDATA:message}" ]

            overwrite => [ "message" ]
        }
        mutate  {
            gsub => [ "message", "<nl>", "
" ]
        }
    }

    if "syslog_format" in [tags] {
        grok {
            match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} %{NOTSPACE:appname} %{GREEDYDATA:message}" ]
            overwrite => [ "message" ]
        }
    }
}

output {
  if "_grokparsefailure" in [tags] {
       # write events that didn't match to a file
       file { "path" => "/tmp/grok_failures.txt" }
  } else {
     elasticsearch { hosts => ["localhost:9200"] }
  }
  # for debugging:
  # stdout { codec => rubydebug }
}

Can you edit your post, select the configuration, and click on </> in the toolbar above the composition window? It will make it a lot easier to read.

The problem is most likely the sincedb files (one per input).

sincedb_path => "/tmp/logstash-sincedb3.db"

These keep track of how far into each file the input has read. If you blew away the Elastic stack but kept the sincedb files, then none of the old data will get re-ingested.
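
For example, to force a full re-read, one option is to stop Logstash, remove the sincedb files named in the inputs, and start it again. A minimal sketch, assuming Logstash runs as a systemd service; the sincedb paths are the ones from your configuration:

  # stop Logstash so it does not rewrite the sincedb files on shutdown
  sudo systemctl stop logstash

  # remove the per-input sincedb files so the file inputs forget their read positions
  rm -f /tmp/logstash-sincedb.db /tmp/logstash-sincedb2.db /tmp/logstash-sincedb3.db

  # restart; with start_position => "beginning" the inputs re-read each file from line 1
  sudo systemctl start logstash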

input {

  file {
      path => ["/var/log/adc/2018/*/*/adc.log",
               "/var/log/adc/2018/*/*/asdi.log",
               "/var/log/adc/2018/*/*/edct_cdm_flight_data.log",
               "/var/log/adc/2018/*/*/flightaware.log",
               "/var/log/adc/2018/*/*/flight_manager.log",
               "/var/log/adc/2018/*/*/fp.log",
               "/var/log/adc/2018/*/*/invalid_outgoing.log",
               "/var/log/adc/2018/*/*/iridium.log",
               "/var/log/adc/2018/*/*/met_error.log",
               "/var/log/adc/2018/*/*/microservice.log",
               "/var/log/adc/2018/*/*/mq_output.log",
               "/var/log/adc/2018/*/*/performance.log",
               "/var/log/adc/2018/*/*/position_data.log",
               "/var/log/adc/2018/*/*/rmqapps.log",
               "/var/log/adc/2018/*/*/sbbtraffic.log",
               "/var/log/adc/2018/*/*/schneider.log",
               "/var/log/adc/2018/*/*/skyguide_notams.log",
               "/var/log/adc/2018/*/*/sql.log",
               "/var/log/adc/2018/*/*/unparsed.log",
               "/var/log/adc/2018/*/*/wx.log"
              ]
      tags => [ "standard_adc_format" ]

      # default discover_interval is 15 sec
      discover_interval => 60

      # file where indexes into the current log file positions are stored
      sincedb_path => "/tmp/logstash-sincedb.db"

      # when a new log is first found, begin reading from the first line
      start_position => "beginning"
  }

  file {
      path => ["/var/log/adc/2018/*/*/api.log",
               "/var/log/adc/2018/*/*/dashboard.log"
              ]
      tags => [ "alt_adc_format" ]

      # default discover_interval is 15 sec
      discover_interval => 60

      # file where indexes into the current log file positions are stored
      sincedb_path => "/tmp/logstash-sincedb2.db"

      # when a new log is first found, begin reading from the first line
      start_position => "beginning"
  }

  file {
      path => ["/var/log/sys/2018/*/*/maillog"
              ]
      tags => [ "syslog_format" ]

      # default discover_interval is 15 sec
      discover_interval => 60

      # file where indexes into the current log file positions are stored
      sincedb_path => "/tmp/logstash-sincedb3.db"

      # when a new log is first found, begin reading from the first line
      start_position => "beginning"
  }
}

filter {

    if "standard_adc_format" in [tags] {
        if ".py" in [message] {
            # it's a log line from a python app with extra info
            grok {
                match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} %{NOTSPACE:appname}\[%{USERNAME:process_id}\]  %{NOTSPACE:serverdate} %{NOTSPACE:servertime} %{WORD:loglevel} %{NUMBER:thread_id} %{NOTSPACE:source_file} %{POSINT:source_line} %{GREEDYDATA:message}" ]

                overwrite => [ "message" ]
            }
        } else {
            # it's a standard syslog format not generated by our python logging libs
            grok {
                match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} %{NOTSPACE:appname}\[%{USERNAME:process_id}\] %{GREEDYDATA:message}" ]
            }
        }
        mutate  {
            gsub => [ "message", "<nl>", "
" ]
        }
    }

    if "alt_adc_format" in [tags] {
        grok {
            match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} #\|%{NOTSPACE:date2}  %{NOTSPACE:time2} %{WORD:loglevel} %{NUMBER:thread_id} %{NOTSPACE:source_file} %{POSINT:source_line} %{GREEDYDATA:message}" ] 

            overwrite => [ "message" ]
        }
        mutate  {
            gsub => [ "message", "<nl>", "
" ]
        }
    }

    if "syslog_format" in [tags] {
        grok {
            match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} %{NOTSPACE:appname} %{GREEDYDATA:message}" ]
            overwrite => [ "message" ]
        }
    }
}

output {
  if "_grokparsefailure" in [tags] {
       # write events that didn't match to a file
       file { "path" => "/tmp/grok_failures.txt" }
  } else {
     elasticsearch { hosts => ["localhost:9200"] }
  }
  # for debugging:
  # stdout { codec => rubydebug }
}

I reposted the configuration; I wasn't sure how to do the "select the configuration and click on </> in the toolbar above the composition window" part.

Also, the file you pointed out (sincedb_path => "/tmp/logstash-sincedb3.db") is no longer there.

Hello Badger,

I've been trying many recommendations from a variety of posts and am still not able to re-ingest the older files.

I'm hoping and waiting to see if you have any other ideas.

thanks!

Click on the pencil underneath your post. Click the left mouse button at the beginning of the text of the configuration. Scroll down to the end of the configuration. Shift click the left mouse button at the end of the configuration. Click on the </> icon above the edit window.

input {

  file {
      path => ["/var/log/adc/2018/*/*/adc.log",
               "/var/log/adc/2018/*/*/asdi.log",
               "/var/log/adc/2018/*/*/edct_cdm_flight_data.log",
               "/var/log/adc/2018/*/*/flightaware.log",
               "/var/log/adc/2018/*/*/flight_manager.log",
               "/var/log/adc/2018/*/*/fp.log",
               "/var/log/adc/2018/*/*/invalid_outgoing.log",
               "/var/log/adc/2018/*/*/iridium.log",
               "/var/log/adc/2018/*/*/met_error.log",
               "/var/log/adc/2018/*/*/microservice.log",
               "/var/log/adc/2018/*/*/mq_output.log",
               "/var/log/adc/2018/*/*/performance.log",
               "/var/log/adc/2018/*/*/position_data.log",
               "/var/log/adc/2018/*/*/rmqapps.log",
               "/var/log/adc/2018/*/*/sbbtraffic.log",
               "/var/log/adc/2018/*/*/schneider.log",
               "/var/log/adc/2018/*/*/skyguide_notams.log",
               "/var/log/adc/2018/*/*/sql.log",
               "/var/log/adc/2018/*/*/unparsed.log",
               "/var/log/adc/2018/*/*/wx.log"
              ]
      tags => [ "standard_adc_format" ]

      # default discover_interval is 15 sec
      discover_interval => 60

      # file where indexes into the current log file positions are stored
      # sincedb_path => "/tmp/logstash-sincedb.db"
      sincedb_path => "/dev/null"

      # when a new log is first found, begin reading from the first line
      start_position => "beginning"
  }

  file {
      path => ["/var/log/adc/2018/*/*/api.log",
               "/var/log/adc/2018/*/*/dashboard.log"
              ]
      tags => [ "alt_adc_format" ]

      # default discover_interval is 15 sec
      discover_interval => 60

      # file where indexes into the current log file positions are stored
      sincedb_path => "/dev/null"

      # when a new log is first found, begin reading from the first line
      start_position => "beginning"
  }

  file {
      path => ["/var/log/sys/2018/*/*/maillog"
              ]
      tags => [ "syslog_format" ]

      # default discover_interval is 15 sec
      discover_interval => 60

      # file where indexes into the current log file positions are stored
      #sincedb_path => "/tmp/logstash-sincedb3.db"
      sincedb_path => "/dev/null"

      # when a new log is first found, begin reading from the first line
      start_position => "beginning"
  }
}

filter {

    if "standard_adc_format" in [tags] {
        if ".py" in [message] {
            # it's a log line from a python app with extra info
            grok {
                match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} %{NOTSPACE:appname}\[%{USERNAME:process_id}\]  %{NOTSPACE:serverdate} %{NOTSPACE:servertime} %{WORD:loglevel} %{NUMBER:thread_id} %{NOTSPACE:source_file} %{POSINT:source_line} %{GREEDYDATA:message}" ]

                overwrite => [ "message" ]
            }
        } else {
            # it's a standard syslog format not generated by our python logging libs
            grok {
                match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} %{NOTSPACE:appname}\[%{USERNAME:process_id}\] %{GREEDYDATA:message}" ]
            }
        }
        mutate  {
            gsub => [ "message", "<nl>", "
" ]
        }
    }

    if "alt_adc_format" in [tags] {
        grok {
            match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} #\|%{NOTSPACE:date2}  %{NOTSPACE:time2} %{WORD:loglevel} %{NUMBER:thread_id} %{NOTSPACE:source_file} %{POSINT:source_line} %{GREEDYDATA:message}" ]

            overwrite => [ "message" ]
        }
        mutate  {
            gsub => [ "message", "<nl>", "
" ]
        }
    }

    if "syslog_format" in [tags] {
        grok {
            match => [ "message", "^%{TIMESTAMP_ISO8601:logdate} <%{NOTSPACE:syslog}> %{NOTSPACE:hostname} %{NOTSPACE:appname} %{GREEDYDATA:message}" ]
            overwrite => [ "message" ]
        }
    }
}

output {
  if "_grokparsefailure" in [tags] {
       # write events that didn't match to a file
       file { "path" => "/tmp/grok_failures.txt" }
  } else {
     elasticsearch { hosts => ["localhost:9200"] }
  }
  # for debugging:
  # stdout { codec => rubydebug }
}

My latest input configuration looks like this now. I also touched every log file that I need to re-ingest, and deleted all the sincedb files.

Set

log.level: debug

in your logstash.yml. The output is very verbose. Check the expansions on the _globbed_files entries. Make sure it is finding the files you expect. Look for errors.
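
For example, once log.level: debug is set and Logstash has been restarted, the glob expansions can be pulled out of the Logstash log with something like this (a sketch; the log path assumes a standard package install):

  # look for the file-input glob expansions to confirm the expected files are found
  grep "_globbed_files" /var/log/logstash/logstash-plain.log | tail -n 20

  # and check for errors around the same time
  grep -i "error" /var/log/logstash/logstash-plain.log | tail -n 20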

Will do

So, after looking at the logs directory: every 10 days the old logs are gzipped, with names like "sbbtraffic.log.gz". And I guess the file input is not set to read these logs. Correct?

Can it be that easy?

Correct.

What would your recommended next steps be?

Do I need to make sure the sincedb path is set to a real file in the input instead of /dev/null?

What do you want to do? Do you want to ingest all those .gz files? Do you just want to ingest new files?
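
For reference, the globs in the inputs only match the plain *.log files (and maillog), so the .gz archives are never picked up as-is. One possible way to backfill them is a sketch only, and you should check for name collisions with the live logs first: decompress the archives so the existing globs match them.

  # list the rotated archives under the 2018 tree
  find /var/log/adc/2018 -name '*.log.gz'

  # decompress them in place so the existing *.log globs pick them up
  # (-k keeps the .gz originals; it requires gzip 1.6 or later)
  find /var/log/adc/2018 -name '*.log.gz' -exec gunzip -k {} \;

Also note that with sincedb_path => "/dev/null" the read position is never persisted, so every Logstash restart re-reads all matched files from the beginning; that is handy for a one-off backfill, but switch back to a real sincedb file once you only want new data.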

The files are huge, and I was able to start ingesting the 2017 and 2018 .gz files; I'm waiting on them to finish. It may take the weekend to complete.

I will update here when it's done, and think about how to plan for ingesting new files.

Thanks for your help.

Reena

Good morning Badger,

So, no success over the weekend. I discussed it with my team and we decided to start from scratch on the same server: clean up the ELK stack completely, re-install ELK, and re-ingest the logs by rsyncing them from another server.

I want to make sure that there are no pieces of ELK remaining; any advice on how to ensure this?

Thanks,

Reena
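
For reference, on a typical package-based install a full cleanup usually means stopping the services, removing the packages, and then deleting the leftover data, config, and log directories. A rough sketch (yum shown; the paths are default package locations and should be adjusted to match your install):

  # stop the services
  sudo systemctl stop kibana logstash elasticsearch

  # remove the packages (use apt on Debian/Ubuntu)
  sudo yum remove elasticsearch logstash kibana

  # delete leftover data, config, and logs (default package locations)
  sudo rm -rf /var/lib/elasticsearch /etc/elasticsearch /var/log/elasticsearch
  sudo rm -rf /etc/logstash /var/log/logstash /var/lib/logstash
  sudo rm -rf /etc/kibana /var/lib/kibana

  # and remove any leftover sincedb files from the file inputs
  rm -f /tmp/logstash-sincedb*.db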