There isn't a settings-side way. The only way to prevent duplication is to manage the _id in your ingesting application. Your point number 1 is valid, but once you have multiple clients you can't trust, you have bigger problems than _id management. I don't buy points 2 or 3, because any application that can build the document in the first place has the data readily at hand to build the _id.
You can use the MD5 hash (or something equivalent) of the document as the _id; this will prevent the duplicates. The way it works: when ES sees an incoming document whose _id already exists in the index, it replaces the existing document with the incoming one (kind of like an update to an existing document). Solr works the same way.
Note: some may argue that an MD5 hash can collide, but the probability is low. If you are not comfortable with the hash, then look at your document: if it contains something you can use as a unique value for _id, use that value.
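For what it's worth, the hashing idea above could look roughly like this in Python. This is only a sketch under some assumptions: the documents are JSON-serializable dicts, and `doc_id`, `ingest` and `fake_index` are hypothetical names made up for illustration. The plain dict stands in for the index and is not a real Elasticsearch client; in a real ingester you would pass the computed value as the document ID when indexing.

```python
import hashlib
import json


def doc_id(doc: dict) -> str:
    """Derive a deterministic _id from the document's content (hypothetical helper)."""
    # Serialize with sorted keys and fixed separators so that two dicts with
    # the same content always produce the same bytes, whatever their key order.
    canonical = json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Stand-in for an index: a plain dict keyed by _id (assumption, not an ES client).
fake_index = {}


def ingest(doc: dict) -> str:
    """Store the document under its content hash and return that _id."""
    _id = doc_id(doc)
    fake_index[_id] = doc  # an existing _id is overwritten, so no duplicate appears
    return _id


first = ingest({"user": "alice", "msg": "hello"})
second = ingest({"msg": "hello", "user": "alice"})  # same content, different key order
third = ingest({"user": "bob", "msg": "hello"})     # different content
```

The canonical serialization matters: without sorted keys, the same document built by two clients in a different field order would hash to different _ids and slip through as a duplicate. If part of the document is volatile (an ingest timestamp, say), you would probably want to leave that field out of the hash.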