Logstash tutorial: index_not_found_exception [SOLVED]

Hello,

I am following the Logstash tutorial. This command

curl -XGET '192.168.77.200:9200/logstash-2016.03.08/_search?q=response=200'

returns the following error, instead of the data shown in the tutorial:

{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","index":"logstash-2016.03.08","resource.type":"index_or_alias","resource.id":"logstash-2016.03.08"}],"type":"index_not_found_exception","reason":"no such index","index":"logstash-2016.03.08","resource.type":"index_or_alias","resource.id":"logstash-2016.03.08"},"status":404}

Elasticsearch is running correctly on the host 192.168.77.200. This is the content of first-pipeline.conf as indicated by the tutorial:

input {
    file {
        path => "/opt/logstash/logstash-tutorial.log"
        start_position => beginning
    }
}

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
    geoip {
        source => "clientip"
    }
}

output {
    elasticsearch {
        hosts => ["192.168.77.200:9200"]
    }
    stdout {}
}

and the Logstash process is started, as the tutorial indicates, with

bin/logstash -f first-pipeline.conf

What is the reason for this error? Is it necessary to create an index beforehand?

Thanks in advance for any tip.

Logstash is probably ignoring logstash-tutorial.log, perhaps because it thinks it has already processed the file (in which case you can delete the sincedb file or set the file input's sincedb_path parameter to /dev/null), or because you're running Logstash 2.2 and logstash-tutorial.log is older than 24 h (see the file input's ignore_older option).
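
Roughly, the input section would then look something like this (a sketch only; the path is the one from your config, ignore_older is in seconds, and 864000 is just an arbitrarily large value so an old sample file is still picked up):

input {
    file {
        path => "/opt/logstash/logstash-tutorial.log"
        start_position => beginning
        sincedb_path => "/dev/null"   # don't persist the read position, so the file is re-read on every run
        ignore_older => 864000        # raise well above the 24 h default in case the sample log is old
    }
}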

1 Like

Thanks. In fact there were a few things to fix:

  • In the input section of the pipeline, we must add sincedb_path => "/dev/null" so that Logstash does not remember how far it has already read the file and re-processes it on every run, as you said (the complete config is shown after this list).

  • The source data file must not be older than 24 h (the ignore_older default), as you said, so (on Linux) a touch logstash-tutorial.log fixes the problem.

  • The filter section should include a date filter that parses the timestamp extracted from the log, otherwise Logstash will use the timestamp of the file when it reads it (quite useless):

      filter {
         grok {
            match => { "message" => "%{COMBINEDAPACHELOG}" }
         }
         date {
            match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
         }
         geoip {
            source => "clientip"
         }
      }
    
  • Another useful edit, in the output section, is to set the index name to something more meaningful:

      output {
         elasticsearch {
            hosts => ["192.168.77.200:9200"]
            index => "apachelogs-%{+YYYY.MM.dd}"
         }
         stdout {}
      }
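
Putting it all together, the whole first-pipeline.conf now looks roughly like this (the index name is just the one I chose above):

input {
    file {
        path => "/opt/logstash/logstash-tutorial.log"
        start_position => beginning
        sincedb_path => "/dev/null"
    }
}

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
        match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    geoip {
        source => "clientip"
    }
}

output {
    elasticsearch {
        hosts => ["192.168.77.200:9200"]
        index => "apachelogs-%{+YYYY.MM.dd}"
    }
    stdout {}
}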

otherwise Logstash will use the timestamp of the file when it reads it (quite useless)

I'm pretty sure it defaults to the current timestamp when the file is read.

1 Like