failed action with response of 404, dropping action:
The data all comes in through the same file, in order, so documents should be created before they are updated. This doesn't happen with ALL items, but with plenty of them; I would expect none of these errors.
Is this because of the different flush_size settings? Even though the items are in order in the original file, an INSERT always comes before its corresponding UPDATE.
Yes, it definitely can't find it. I've even checked directly in Elasticsearch, and the item isn't there, although it does show up eventually. I'm more confused about the ordering, because I thought that if I put everything in one log file in order, it should reach Elasticsearch in that order. For example, if this is my log file (simplified):
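Something like the following hypothetical pair of entries (the field names are made up for illustration):

```json
{"action": "insert", "id": 2, "name": "foo"}
{"action": "update", "id": 2, "name": "bar"}
```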
Since the insert for id: 2 came first, I thought it would be guaranteed to be there for the update, but it isn't always. That's why I'm wondering whether this has to do with having two outputs with different flush sizes. Do they become independent of each other? And if so, what's the correct way to batch the updates and make sure the inserts have occurred first?
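For reference, this is a sketch of the kind of output section I mean (hosts, index name, and flush_size values are illustrative, not my exact config). Each elasticsearch block has its own flush_size, which is what I suspect breaks the ordering between inserts and updates:

```
output {
  if [action] == "insert" {
    elasticsearch {
      hosts      => ["localhost:9200"]
      index      => "myindex"
      action     => "index"
      document_id => "%{id}"
      flush_size => 500      # inserts batched in larger groups
    }
  } else if [action] == "update" {
    elasticsearch {
      hosts      => ["localhost:9200"]
      index      => "myindex"
      action     => "update"
      document_id => "%{id}"
      flush_size => 100      # updates flushed in smaller batches
    }
  }
}
```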