Logstash 404 errors when dealing with UPDATES after INSERTS


(Brian Gruber) #1

I have a logstash config like this:

input {
  file {
    path => ["/home/csdata/*.data"]
    codec => json {}
    start_position => "beginning"
    discover_interval => 5
  }
}
output {
  if [_up] == 1 {
    elasticsearch {
      protocol => "http"
      host => "[myelasticsearchip]"
      cluster => "clustername"
      flush_size => 50
      index => "%{_index}"
      action => "update"
      document_id => "%{_id}"
      index_type => "%{_type}"
    }
  } else if [_id] != "" {
    elasticsearch {
      protocol => "http"
      host => "[myelasticsearchip]"
      cluster => "clustername"
      flush_size => 50
      index => "%{_index}"
      document_id => "%{_id}"
      index_type => "%{_type}"
    }
  } else {
    elasticsearch {
      protocol => "http"
      host => "[myelasticsearchip]"
      cluster => "clustername"
      flush_size => 50
      index => "%{_index}"
      index_type => "%{_type}"
    }
  }
}

I have a ton of

failed action with response of 404, dropping action:

The data all comes into the same file, in order, so documents should be created before they are updated. This doesn't happen with ALL items, but with plenty of them, and I would expect to have none of these errors.

Is this because the outputs have separate flush_size buffers? Even though the items are in order in the original file, an INSERT always comes before an UPDATE.

Any ideas would be greatly appreciated!


(Mark Walkom) #2

A 404 usually means it cannot find the document to update. Maybe check your ES logs to see if there is anything corresponding there as well.


(Brian Gruber) #3

Yes, it definitely can't find it. I'll even check directly in Elasticsearch and the item isn't there, though eventually it does show up. I'm more confused about the ordering: I thought that if I put everything into one log file, in order, it would reach Elasticsearch in that same order. For example, if this is my log file (simplified):

Insert { 'id' : 1, 'content': 'whatever' }
Insert { 'id' : 2, 'content': 'whatever' }
Insert { 'id' : 3, 'content': 'whatever' }
Update { 'id' : 2, 'content': 'whatever is changed' }

Since id 2 came before, I thought it would be inserted and guaranteed to be there for the UPDATE, but it isn't always there. That's why I'm wondering if it has to do with having multiple outputs with different flush buffers. Do they become independent of each other? And if so, what's the correct way to batch the updates and make sure the inserts have occurred first?
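One idea I've been toying with (just a sketch — it assumes Logstash 1.5+ for the [@metadata] field and a plugin version where action accepts a sprintf reference, which I haven't verified) is to pick the action in a filter and use a single elasticsearch output, so inserts and updates share one buffer and flush in order:

filter {
  # Route the action per event instead of per output (es_action is a name I made up)
  if [_up] == 1 {
    mutate { add_field => { "[@metadata][es_action]" => "update" } }
  } else {
    mutate { add_field => { "[@metadata][es_action]" => "index" } }
  }
}
output {
  elasticsearch {
    protocol => "http"
    host => "[myelasticsearchip]"
    cluster => "clustername"
    flush_size => 50
    index => "%{_index}"
    document_id => "%{_id}"
    index_type => "%{_type}"
    action => "%{[@metadata][es_action]}"
  }
}

With one output there is only one flush_size buffer, so an update can no longer be flushed ahead of the insert it depends on.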


(Mark Walkom) #4

It could be flush size, especially if you are working with changes that are closer together than 50 events.

I'm not really sure what can be done here but I'll give it some thought :slight_smile:


(Brian Gruber) #5

I wonder if being able to use upsert would help. I could group both the inserts and updates into a single update output, but use doc_as_upsert from this pull request: https://github.com/logstash-plugins/logstash-output-elasticsearch/pull/116

I can't figure out how to get that doc_as_upsert running in logstash though :frowning:
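Something like this is what I'd imagine, once that PR is available (assuming the option ships under the name doc_as_upsert — I haven't been able to test it):

output {
  elasticsearch {
    protocol => "http"
    host => "[myelasticsearchip]"
    cluster => "clustername"
    flush_size => 50
    index => "%{_index}"
    action => "update"
    document_id => "%{_id}"
    index_type => "%{_type}"
    # Treat every event as an upsert: update if the doc exists, insert it otherwise
    doc_as_upsert => true
  }
}

That would sidestep the ordering problem entirely, since an UPDATE arriving before its INSERT would just create the document instead of returning a 404.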


(system) #6