I'm planning to link the documents I write to ES to records in my relational database by simply using the record id as the document id. This means the IDs for my documents will be serial (i.e., 1, 2, 3, 4, 5) instead of random.
Any idea whether this could cause long-term performance issues? Perhaps serial, non-random document ids would keep documents from being randomly distributed through the storage engine, making reads more expensive.
@terell
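If the ids are serial, I think distribution should not be an issue. But providing your own _id (serial or not) will degrade ingestion speed as the number of documents in the index grows, because ES has to check whether a document with that _id already exists before it can index the new one.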
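To make the two options concrete, here is a minimal sketch of building a bulk request body with and without an explicit _id. It only builds the newline-delimited JSON that the _bulk endpoint expects and does not send anything. The helper name `bulk_body`, the index name `records`, and the sample rows are assumptions for illustration.

```python
import json


def bulk_body(rows, index_name, use_record_id=True):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Hypothetical helper: each row yields an action line followed by the
    document source. With use_record_id=False the _id is left out, so
    Elasticsearch would generate one itself.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index_name}}
        if use_record_id:
            # Reuse the relational record id as the document _id.
            action["index"]["_id"] = str(row["id"])
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Assumed sample rows standing in for database records.
rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

with_ids = bulk_body(rows, "records")
without_ids = bulk_body(rows, "records", use_record_id=False)
```

The first body ties each document to its database record, at the cost of the existence check on every write. The second lets ES pick the ids, which avoids that check but means you have to find documents by the ordinary "id" field instead.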
The _id meta-field is the document id. ES ensures it is unique within an index. If you ingest two documents with the same _id, the later one overwrites the earlier one.
A field named "id" is an ordinary field. ES will allow any number of documents with the same value in it.
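The difference between the two can be illustrated with a toy in-memory model. This is not the Elasticsearch client or its API, just a dictionary that mimics the behaviour described above; the class `ToyIndex` and its methods are made up for the example.

```python
class ToyIndex:
    """Toy stand-in for an index: documents are keyed by _id only."""

    def __init__(self):
        self._docs = {}
        self._next_auto = 0

    def index(self, source, doc_id=None):
        # No _id supplied: generate a fresh one, so nothing is overwritten.
        if doc_id is None:
            doc_id = f"auto-{self._next_auto}"
            self._next_auto += 1
        created = doc_id not in self._docs
        # Same _id: the later document replaces the earlier one.
        self._docs[doc_id] = source
        return "created" if created else "updated"

    def get(self, doc_id):
        return self._docs[doc_id]

    def count(self):
        return len(self._docs)


idx = ToyIndex()

# Two writes with the same _id: the second overwrites the first.
first = idx.index({"id": 7, "note": "first"}, doc_id="7")
second = idx.index({"id": 7, "note": "second"}, doc_id="7")

# Two writes with the same ordinary "id" field but no _id: both are kept.
idx.index({"id": 7, "note": "third"})
idx.index({"id": 7, "note": "fourth"})

total = idx.count()  # one document under _id "7" plus two auto-id documents
```

After these four writes the toy index holds three documents: one under _id "7" (the "second" version) and two with generated ids that happen to share the "id" field value 7.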