Morning all.
I'm trying to use the elasticsearch input plugin to pull data from an existing Elasticsearch instance. Both instances are running older versions of Elasticsearch, and thus I'm running an older version of Logstash (1.5). I'm having two issues so far:
1. The Logstash instance runs to a point and then shuts down (this isn't a major problem if problem 2 can be solved).
2. When I start the Logstash instance up again, it copies over data which has already been copied, creating duplicate entries in my new Elasticsearch instance.
If the id, type and destination index of the documents are the same, then by default it should not create another instance of the same document, but rather just bump the version number of the already-indexed document.
Keep in mind, though, that you will potentially be flooded with 40x error responses from ES. Nothing to worry about since it's intended, but they may take up log space quickly depending on how many there are.
If you look at the example given under the docinfo section of the documentation, it shows how to assign the document id from the metadata fields, which is the default location for this information.
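Something along these lines (the hosts are placeholders, and option names such as `host`/`hosts` and `index_type` vary between plugin versions, so check the docs for your 1.5 install):

```
input {
  elasticsearch {
    hosts => "old-es.example.org"   # placeholder for your existing cluster
    index => "logstash-*"
    docinfo => true                 # exposes _index, _type, _id under [@metadata]
  }
}

output {
  elasticsearch {
    host => "new-es.example.org"    # placeholder for your new cluster
    protocol => "http"
    index => "%{[@metadata][_index]}"
    index_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"   # reuse the original id so re-runs update rather than duplicate
  }
}
```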
I had set docinfo => true, but didn't see any improvement (unless I need to dump the indexes first and start copying again).
@paz
I tried setting the default action to create_unless_exists earlier, but no joy. I'll try 'create' now. You're right about the logs, though: they're flooded with 40x error responses. Am I right in thinking that Logstash is trying to copy over data that it already has, and is erroring out because it already exists? This would explain why my document counts aren't increasing (yet). Eventually I'd expect Logstash to find an index that wasn't copied over and start increasing my document count.
That is correct: Elasticsearch refuses to create the document since it already exists, and the error propagates back to Logstash.
Once the scroll goes beyond the documents already indexed, you should see those errors stop and the document count increase.
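That is, with an output along these lines (host is a hypothetical placeholder; `document_id` taken from the docinfo metadata as in the example above):

```
output {
  elasticsearch {
    host => "new-es.example.org"         # placeholder for your new cluster
    action => "create"                   # ES answers 409 for ids that already exist
    index => "%{[@metadata][_index]}"
    document_id => "%{[@metadata][_id]}"
  }
}
```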
I deleted all my indexes and restarted Logstash with the above output config, but I'm getting a constant stream of warnings in the logs and no documents being indexed: :message=>"failed action with response of 400, dropping action"
Ah, I wonder if it was the index naming that was breaking it:
{:timestamp=>"2017-06-01T13:35:22.377000+0000", :message=>"failed action with response of 400, dropping action: ["index", {:_id=>"AVxUpH8GqqYqcknYGhjV", :_index=>"logstash-%{YYYY.MM.dd}", ...
logstash-%{YYYY.MM.dd} wasn't getting translated. I'll let it index away for now and circle back here when I restart Logstash. Hopefully it won't duplicate the data this time.
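For anyone else hitting this: the sprintf date format in index names needs a leading `+`, otherwise Logstash treats `YYYY.MM.dd` as a plain field reference and leaves it literal. Something like this (host is a placeholder) should expand correctly:

```
output {
  elasticsearch {
    host => "new-es.example.org"           # placeholder for your new cluster
    index => "logstash-%{+YYYY.MM.dd}"     # note the leading '+' for date interpolation
    document_id => "%{[@metadata][_id]}"
  }
}
```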